forecast

Forecast univariate ARIMA or ARIMAX model responses or conditional variances

Syntax

``[Y,YMSE] = forecast(Mdl,numperiods,Y0)``
``[Y,YMSE,V] = forecast(Mdl,numperiods,Y0)``
``Tbl2 = forecast(Mdl,numperiods,Tbl1)``
``[___] = forecast(___,Name=Value)``

Description


`[Y,YMSE] = forecast(Mdl,numperiods,Y0)` returns the `numperiods`-by-1 numeric vector of consecutive forecasted responses `Y` and the corresponding numeric vector of forecast mean square errors (MSE) `YMSE` of the fully specified, univariate ARIMA model `Mdl`. The presample response data in the numeric vector `Y0` initializes the model to generate forecasts.


`[Y,YMSE,V] = forecast(Mdl,numperiods,Y0)` also forecasts a `numperiods`-by-1 numeric vector of conditional variances `V` from a composite conditional mean and variance model (for example, an ARIMA and GARCH composite model).


`Tbl2 = forecast(Mdl,numperiods,Tbl1)` returns the table or timetable `Tbl2` containing a variable for each of the paths of response, forecast MSE, and conditional variance series resulting from forecasting the ARIMA model `Mdl` over a `numperiods` forecast horizon. `Tbl1` is a table or timetable containing a variable for required presample response data to initialize the model for forecasting. `Tbl1` can optionally contain variables of presample data for innovations, conditional variances, and predictors. (since R2023b)

`forecast` selects the response variable named in `Mdl.SeriesName` or the sole variable in `Tbl1`. To select a different response variable in `Tbl1` to initialize the model, use the `PresampleResponseVariable` name-value argument.


`[___] = forecast(___,Name=Value)` specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. `forecast` returns the output argument combination for the corresponding input arguments. For example, `forecast(Mdl,10,Y0,X0=Exo0,XF=Exo)` specifies the presample and forecast sample exogenous predictor data as `Exo0` and `Exo`, respectively, to forecast a model with a regression component (an ARIMAX model).

Examples


Forecast the conditional mean response of simulated data over a 30-period horizon. Supply a vector of presample response data and return a vector of forecasts.

Simulate 130 observations from a multiplicative seasonal moving average (MA) model with known parameter values.

```
Mdl = arima(MA={0.5 -0.3},SMA=0.4,SMALags=12,Constant=0.04, ...
    Variance=0.2);
rng(200,"twister")
Y = simulate(Mdl,130);
```

Fit a seasonal MA model to the first 100 observations, and reserve the remaining 30 observations to evaluate forecast performance.

```MdlTemplate = arima(MALags=1:2,SMALags=12); EstMdl = estimate(MdlTemplate,Y(1:100));```
```
    ARIMA(0,0,2) Model with Seasonal MA(12) (Gaussian Distribution):

                 Value      StandardError    TStatistic      PValue
                ________    _____________    __________    __________

    Constant     0.20403      0.069064         2.9542       0.0031344
    MA{1}        0.50212      0.097298         5.1606      2.4619e-07
    MA{2}       -0.20174       0.10447        -1.9312        0.053464
    SMA{12}      0.27028       0.10907          2.478        0.013211
    Variance     0.18681      0.032732         5.7073       1.148e-08
```

`EstMdl` is a new `arima` model that contains estimated parameters (that is, a fully specified model).

Forecast the fitted model into a 30-period horizon. Specify the estimation period data as a presample.

```[YF,YMSE] = forecast(EstMdl,30,Y(1:100)); YF(15)```
```ans = 0.2040 ```
`YMSE(15)`
```ans = 0.2592 ```

`YF` is a 30-by-1 vector of forecasted responses, and `YMSE` is a 30-by-1 vector of corresponding MSEs. The 15-period-ahead forecast is 0.2040 and its MSE is 0.2592.
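The reported MSE can be cross-checked by hand. For a pure MA model, the h-period-ahead forecast MSE is the innovation variance times the sum of the first h squared MA weights, and beyond the longest MA lag (here 2 + 12 = 14) the forecast reverts to the model constant. The following is a minimal sketch in base MATLAB that uses the rounded estimates from the display above; the variable names are illustrative and not part of the example.

```matlab
% Rounded estimates copied from the estimation display (assumed inputs)
theta  = [0.50212 -0.20174];   % nonseasonal MA coefficients
Theta  = 0.27028;              % seasonal MA coefficient at lag 12
sigma2 = 0.18681;              % innovation variance

% Multiply the nonseasonal and seasonal MA polynomials to get the weights
psi = conv([1 theta],[1 zeros(1,11) Theta]);   % lags 0 through 14

% 15-period-ahead forecast MSE: variance times the sum of squared weights
mse15 = sigma2*sum(psi.^2)
```

The result should agree with `YMSE(15)` up to rounding of the displayed estimates.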

Visually compare the forecasts to the holdout data.

```
figure
h1 = plot(Y,Color=[.7,.7,.7]);
hold on
h2 = plot(101:130,YF,"b",LineWidth=2);
h3 = plot(101:130,YF + 1.96*sqrt(YMSE),"r:",LineWidth=2);
plot(101:130,YF - 1.96*sqrt(YMSE),"r:",LineWidth=2);
legend([h1 h2 h3],"Observed","Forecast","95% confidence interval", ...
    Location="NorthWest")
title("30-Period Forecasts and 95% Confidence Intervals")
hold off
```

Since R2023b

Forecast the weekly average NYSE closing prices over a 15-week horizon. Supply presample data in a timetable and return a timetable of forecasts.

Load the US equity index data set `Data_EquityIdx`.

```
load Data_EquityIdx
T = height(DataTimeTable)
```
```
T = 3028
```

The timetable `DataTimeTable` includes the time series variable `NYSE`, which contains daily NYSE composite closing prices from January 1990 through December 2001.

Plot the daily NYSE price series.

```
figure
plot(DataTimeTable.Time,DataTimeTable.NYSE)
title("NYSE Daily Closing Prices: 1990 - 2001")
```

Prepare Timetable for Estimation

When you plan to supply a timetable, you must ensure it has all the following characteristics:

• The selected response variable is numeric and does not contain any missing values.

• The timestamps in the `Time` variable are regular, and they are ascending or descending.

Remove all missing values from the timetable, relative to the NYSE price series.

```DTT = rmmissing(DataTimeTable,DataVariables="NYSE"); T_DTT = height(DTT)```
```T_DTT = 3028 ```

Because all sample times have observed NYSE prices, `rmmissing` does not remove any observations.

Determine whether the sampling timestamps have a regular frequency and are sorted.

`areTimestampsRegular = isregular(DTT,"days")`
```areTimestampsRegular = logical 0 ```
`areTimestampsSorted = issorted(DTT.Time)`
```areTimestampsSorted = logical 1 ```

`areTimestampsRegular = 0` indicates that the timestamps of `DTT` are irregular. `areTimestampsSorted = 1` indicates that the timestamps are sorted. Business day rules make daily macroeconomic measurements irregular.

Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

```DTTW = convert2weekly(DTT,Aggregation="mean"); areTimestampsRegular = isregular(DTTW,"weeks")```
```areTimestampsRegular = logical 1 ```
`T_DTTW = height(DTTW)`
```T_DTTW = 627 ```

`DTTW` is regular.

```
figure
plot(DTTW.Time,DTTW.NYSE)
title("NYSE Weekly Average Closing Prices: 1990 - 2001")
```

Create Model Template for Estimation

Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period.

Create an ARIMA(1,1,1) model template for estimation. Set the response series name to `NYSE`.

```Mdl = arima(1,1,1); Mdl.SeriesName = "NYSE";```

`Mdl` is a partially specified `arima` model object.

Partition Data

`estimate` and `forecast` require `Mdl.P` presample observations to initialize the model for estimation and forecasting.

Partition the data into three sets:

• A presample set for estimation

• An in-sample set, to which you fit the model and initialize the model for forecasting

• A holdout sample of length 15 to measure the model's predictive performance

```
numpreobs = Mdl.P;                               % Required presample length
numperiods = 15;                                 % Forecast horizon
DTTW0 = DTTW(1:numpreobs,:);                     % Estimation presample
DTTW1 = DTTW((numpreobs+1):(end-numperiods),:);  % In-sample for estimation and presample for forecasting
DTTW2 = DTTW((end-numperiods+1):end,:);          % Holdout sample
```

Fit Model to Data

Fit an ARIMA(1,1,1) model to the in-sample weekly average NYSE closing prices. Specify the presample timetable and the presample response variable name.

`EstMdl = estimate(Mdl,DTTW1,Presample=DTTW0,PresampleResponseVariable="NYSE");`
```
    ARIMA(1,1,1) Model (Gaussian Distribution):

                 Value      StandardError    TStatistic      PValue
                ________    _____________    __________    ___________

    Constant     0.31873       0.23754         1.3418          0.17965
    AR{1}        0.41132        0.2371         1.7348         0.082779
    MA{1}       -0.31232       0.24486        -1.2755          0.20212
    Variance      55.472        1.8496         29.992      1.2638e-197
```

`EstMdl` is a fully specified, estimated `arima` model object.

Forecast Conditional Mean

Forecast the weekly average NYSE closing prices 15 weeks beyond the estimation sample using the fitted model. Use the estimation sample data as a presample to initialize the forecast. Because `EstMdl.SeriesName` is `NYSE`, `forecast` selects that variable from the presample timetable by default.

`Tbl2 = forecast(EstMdl,numperiods,DTTW1)`
```
Tbl2=15×3 timetable
       Time        NYSE_Response    NYSE_MSE    NYSE_Variance
    ___________    _____________    ________    _____________

    28-Sep-2001       521.34         55.472        55.472
    05-Oct-2001       519.89         122.47        55.472
    12-Oct-2001       519.62         194.53        55.472
    19-Oct-2001       519.82         268.72        55.472
    26-Oct-2001       520.23          343.8        55.472
    02-Nov-2001       520.71         419.24        55.472
    09-Nov-2001       521.23         494.83        55.472
    16-Nov-2001       521.76         570.49        55.472
    23-Nov-2001        522.3         646.17        55.472
    30-Nov-2001       522.84         721.86        55.472
    07-Dec-2001       523.38         797.56        55.472
    14-Dec-2001       523.92         873.26        55.472
    21-Dec-2001       524.46         948.96        55.472
    28-Dec-2001          525         1024.7        55.472
    04-Jan-2002       525.55         1100.4        55.472
```

`Tbl2` is a 15-by-3 timetable containing the forecasted weekly average closing prices `NYSE_Response`, the corresponding forecast MSEs `NYSE_MSE`, and the model's constant variance `NYSE_Variance` (`EstMdl.Variance = 55.472`).

Plot the forecasts and approximate 95% forecast intervals.

```
Tbl2.NYSE_Lower = Tbl2.NYSE_Response - 1.96*sqrt(Tbl2.NYSE_MSE);
Tbl2.NYSE_Upper = Tbl2.NYSE_Response + 1.96*sqrt(Tbl2.NYSE_MSE);

figure
h1 = plot([DTTW1.Time((end-75):end); DTTW2.Time], ...
    [DTTW1.NYSE((end-75):end); DTTW2.NYSE],Color=[.7,.7,.7]);
hold on
h2 = plot(Tbl2.Time,Tbl2.NYSE_Response,"k",LineWidth=2);
h3 = plot(Tbl2.Time,Tbl2{:,["NYSE_Lower" "NYSE_Upper"]},"r:",LineWidth=2);
legend([h1 h2 h3(1)],"Observations","Forecasts","95% forecast intervals", ...
    Location="NorthWest")
title("NYSE Weekly Average Closing Price")
hold off
```

The process is nonstationary, so the width of each forecast interval grows with time. The model tends to underestimate the weekly average closing prices.
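The growth of the forecast MSE can be reproduced approximately from the estimates alone. The h-period-ahead MSE is the innovation variance times the running sum of squared impulse-response weights of the integrated process. The following is a minimal sketch in base MATLAB, assuming the rounded estimates from the estimation display; the variable names are illustrative.

```matlab
% Rounded estimates copied from the estimation display (assumed inputs)
phi    = 0.41132;    % AR{1}
theta  = -0.31232;   % MA{1}
sigma2 = 55.472;     % innovation variance

% AR polynomial including the difference operator: (1 - phi*L)(1 - L)
arPoly = conv([1 -phi],[1 -1]);
maPoly = [1 theta];

% Impulse response of the integrated process over 15 periods
psi = filter(maPoly,arPoly,[1 zeros(1,14)]);

% h-period-ahead forecast MSE: variance times the running sum of squares
mse = sigma2*cumsum(psi.^2);
mse(1:3)
```

Because the weights do not decay for an integrated process, the running sum keeps growing, which is why the intervals widen. The first few values should land close to the `NYSE_MSE` column; small differences at longer horizons can arise from rounding of the displayed estimates.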

Forecast the following known autoregressive model with one lag and an exogenous predictor (an ARX(1) model) into a 10-period forecast horizon:

`${y}_{t}=1+0.3{y}_{t-1}+2{x}_{t}+{\epsilon }_{t},$`

where ${\epsilon }_{\mathit{t}}$ is a standard Gaussian random variable, and ${\mathit{x}}_{\mathit{t}}$ is an exogenous Gaussian random variable with a mean of 1 and a standard deviation of 0.5.

Create an `arima` model object that represents the ARX(1) model.

`Mdl = arima(Constant=1,AR=0.3,Beta=2,Variance=1);`

To forecast responses from the ARX(1) model, the `forecast` function requires:

• One presample response ${\mathit{y}}_{0}$ to initialize the autoregressive term

• Future exogenous data to include the effects of the exogenous variable on the forecasted responses

Set the presample response to the unconditional mean of the stationary process:

`$E\left({y}_{t}\right)=\frac{1+2\left(1\right)}{1-0.3}.$`
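As a brief sketch of where this value comes from (assuming the process is stationary, so that the mean of ${y}_{t}$ equals the mean of ${y}_{t-1}$, and using the stated mean of 1 for ${x}_{t}$ and mean of 0 for ${\epsilon }_{t}$), take expectations of both sides of the model equation and solve:

`$\begin{array}{l}E\left({y}_{t}\right)=1+0.3E\left({y}_{t}\right)+2\left(1\right)\\ \left(1-0.3\right)E\left({y}_{t}\right)=3\\ E\left({y}_{t}\right)=3/0.7\approx 4.2857.\end{array}$`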

For the future exogenous data, draw 10 values from the distribution of the exogenous variable.

```rng(1,"twister"); y0 = (1 + 2)/(1 - 0.3); xf = 1 + 0.5*randn(10,1);```

Forecast the ARX(1) model into a 10-period forecast horizon. Specify the presample response and future exogenous data.

```fh = 10; yf = forecast(Mdl,fh,y0,XF=xf)```
```
yf = 10×1

    3.6367
    5.2722
    3.8232
    3.0373
    3.0657
    3.3470
    3.4454
    4.2120
    4.0667
    4.8065
```

`yf(3)` = `3.8232` is the 3-period-ahead forecast of the ARX(1) model.
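To see what the forecast computes for this model, the recursion can be written out by hand: future innovations have mean zero, so each forecast is the constant plus 0.3 times the previous forecast (or the presample response for the first period) plus 2 times the future predictor value. The following is a minimal sketch in base MATLAB that regenerates the same inputs; `yManual`, `yPrev`, `c`, `phi`, and `b` are illustrative names.

```matlab
% Reproduce the inputs from the example above
rng(1,"twister");
y0 = (1 + 2)/(1 - 0.3);      % presample response
xf = 1 + 0.5*randn(10,1);    % future exogenous data

% Model coefficients: y(t) = c + phi*y(t-1) + b*x(t) + e(t)
c = 1; phi = 0.3; b = 2;

% Future innovations have mean zero, so they drop out of the forecast
yManual = zeros(10,1);
yPrev = y0;
for t = 1:10
    yManual(t) = c + phi*yPrev + b*xf(t);
    yPrev = yManual(t);
end
yManual
```

Under these assumptions, `yManual` should agree with `yf` up to display rounding.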

Since R2023b

Consider the following AR(1) conditional mean model with a GARCH(1,1) conditional variance model for the weekly average NASDAQ rate series (as a percent) from January 2, 1990 through December 31, 2001.

`$\begin{array}{l}{y}_{t}=0.073+0.138{y}_{t-1}+{\epsilon }_{t}\\ {\sigma }_{t}^{2}=0.022+0.873{\sigma }_{t-1}^{2}+0.119{\epsilon }_{t-1}^{2},\end{array}$`

where ${\epsilon }_{\mathit{t}}$ is a series of independent random Gaussian variables with a mean of 0.
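As a sketch of how conditional variance forecasts behave under this specification (using the standard GARCH property that the expected squared innovation equals the conditional variance, and writing $T$ for the last presample period), the one-period-ahead forecast uses the latest presample values directly, and each later forecast depends only on the previous one:

`$\begin{array}{l}{\hat{\sigma }}_{T+1}^{2}=0.022+0.873{\sigma }_{T}^{2}+0.119{\epsilon }_{T}^{2}\\ {\hat{\sigma }}_{T+k}^{2}=0.022+\left(0.873+0.119\right){\hat{\sigma }}_{T+k-1}^{2},\text{ }k\ge 2.\end{array}$`

Because 0.873 + 0.119 = 0.992 is less than 1, the forecasts converge slowly toward the unconditional variance 0.022/(1 - 0.992) = 2.75.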

Create the model. Name the response series `NASDAQ`.

```CondVarMdl = garch(Constant=0.022,GARCH=0.873,ARCH=0.119); Mdl = arima(Constant=0.073,AR=0.138,Variance=CondVarMdl); Mdl.SeriesName = "NASDAQ";```

Load the equity index data set. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

```
load Data_EquityIdx
DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
```

Convert the weekly average NASDAQ closing price series to a percent return series.

```RetTT = price2ret(DTTW); RetTT.NASDAQ = RetTT.NASDAQ*100;```

Infer residuals and conditional variances from the model.

```RetTT2 = infer(Mdl,RetTT); T = numel(RetTT);```

Forecast the model over a 25-week horizon. Supply the entire data set as a presample (`forecast` uses only the latest required observations to initialize the conditional mean and variance models). Supply variable names for the presample innovations and conditional variances. By default, `forecast` uses the variable name `Mdl.SeriesName` as the presample response variable.

```
fh = 25;
ForecastTT = forecast(Mdl,fh,RetTT2,PresampleInnovationVariable="NASDAQ_Residual", ...
    PresampleVarianceVariable="NASDAQ_Variance");
```

Plot the forecasted responses and conditional variances with the observed series from June 2000.

```
pdates = RetTT2.Time > datetime(2000,6,1);

figure
plot(RetTT2.Time(pdates),RetTT2.NASDAQ(pdates))
hold on
plot([RetTT2.Time(end); ForecastTT.Time], ...
    [RetTT2.NASDAQ(end); ForecastTT.NASDAQ_Response])
title("NASDAQ Weekly Average Percent Return Series")
legend("Observed","Forecasted")
axis tight
grid on
hold off
```

```
figure
plot(RetTT2.Time(pdates),RetTT2.NASDAQ_Variance(pdates))
hold on
plot([RetTT2.Time(end); ForecastTT.Time], ...
    [RetTT2.NASDAQ_Variance(end); ForecastTT.NASDAQ_Variance])
title("Conditional Variance Series")
legend("Observed","Forecasted")
axis tight
grid on
hold off
```

Forecast multiple response and conditional variance paths from a known composite conditional mean and variance model: a SAR$\left(1,0,0\right){\left(1,1,0\right)}_{4}$ conditional mean model with an ARCH(1) conditional variance model. Specify multiple presample response paths.

Create a `garch` model object that represents this ARCH(1) model:

`${\sigma }_{t}^{2}=0.1+0.2{\epsilon }_{t-1}^{2}.$`

Create an `arima` model object that represents this quarterly SAR$\left(1,0,0\right){\left(1,1,0\right)}_{4}$ model:

`$\left(1-0.5L\right)\left(1-0.2{L}^{4}\right)\left(1-{L}^{4}\right){y}_{t}=1+{\epsilon }_{t},$`

where ${\epsilon }_{\mathit{t}}$ is a standard Gaussian random variable.

`CVMdl = garch(ARCH=0.2,Constant=0.1)`
```
CVMdl = 
  garch with properties:

     Description: "GARCH(0,1) Conditional Variance Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 0
               Q: 1
        Constant: 0.1
           GARCH: {}
            ARCH: {0.2} at lag [1]
          Offset: 0
```
```
Mdl = arima(Constant=1,AR=0.5,Variance=CVMdl,Seasonality=4, ...
    SARLags=4,SAR=0.2)
```
```
Mdl = 
  arima with properties:

     Description: "ARIMA(1,0,0) Model Seasonally Integrated with Seasonal AR(4) (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 9
               D: 0
               Q: 0
        Constant: 1
              AR: {0.5} at lag [1]
             SAR: {0.2} at lag [4]
              MA: {}
             SMA: {}
     Seasonality: 4
            Beta: [1×0]
        Variance: [GARCH(0,1) Model]
```

Because `Mdl` contains 9 autoregressive terms and 1 ARCH term, `forecast` requires `Mdl.P = 9` responses and `CVMdl.Q` = 1 conditional variance to generate each $\mathit{t}$-period-ahead forecast.
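The value `Mdl.P = 9` follows from multiplying out the lag-operator polynomials on the left side of the model equation. The following is a minimal sketch in base MATLAB that expands the product and reads off its degree; the variable names are illustrative.

```matlab
% Lag-operator polynomials, coefficients ordered by increasing power of L
nonseasonalAR = [1 -0.5];          % (1 - 0.5L)
seasonalAR    = [1 0 0 0 -0.2];    % (1 - 0.2L^4)
seasonalDiff  = [1 0 0 0 -1];      % (1 - L^4)

% Multiplying polynomials is a convolution of their coefficient vectors
arPoly = conv(conv(nonseasonalAR,seasonalAR),seasonalDiff);

% Degree of the product is the number of presample responses required
degree = numel(arPoly) - 1
```

The highest power of L in the expanded polynomial is 1 + 4 + 4 = 9, so nine lagged responses are needed to start the recursion.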

Generate 10 random paths of length 9 from the model.

```
rng(1,"twister")
numpreobs = Mdl.P;
numpaths = 10;
[Y0,~,V0] = simulate(Mdl,numpreobs,NumPaths=numpaths);
```

Forecast 10 paths of responses and conditional variances from the model into a 12-quarter forecast horizon. Specify the presample response paths `Y0` and conditional variance paths `V0`.

```fh = 12; [YF,~,VF] = forecast(Mdl,fh,Y0,V0=V0);```

`YF` and `VF` are 12-by-10 matrices of independent forecasted response and conditional variance paths, respectively. `YF(j,k)` is the `j`-period-ahead forecast of path `k`. Path `YF(:,k)` represents the continuation of the presample path `Y0(:,k)`. `forecast` structures `VF` similarly.

Plot the presample and forecasted responses.

```
Y = [Y0; YF];

figure
plot(Y)
hold on
h = gca;
px = [numpreobs+0.5 h.XLim([2 2]) numpreobs+0.5];
py = h.YLim([1 1 2 2]);
hp = patch(px,py,[0.9 0.9 0.9]);
uistack(hp,"bottom");
axis tight
legend("Forecast period")
xlabel("Time (quarters)")
title("Response paths")
hold off
```

```
V = [V0; VF];

figure
plot(V)
hold on
h = gca;
px = [numpreobs+0.5 h.XLim([2 2]) numpreobs+0.5];
py = h.YLim([1 1 2 2]);
hp = patch(px,py,[0.9 0.9 0.9]);
uistack(hp,"bottom");
legend("Forecast period")
axis tight
xlabel("Time (quarters)")
title("Conditional Variance Paths")
hold off
```

Input Arguments


Fully specified ARIMA model, specified as an `arima` model object created by `arima` or `estimate`.

The properties of `Mdl` cannot contain `NaN` values.

Forecast horizon, or the number of time points in the forecast period, specified as a positive integer.

Data Types: `double`

Presample response data $y_t$ used to initialize the model for forecasting, specified as a `numpreobs`-by-1 numeric column vector or a `numpreobs`-by-`numpaths` numeric matrix. When you supply `Y0`, supply all optional data as numeric arrays, and `forecast` returns results in numeric arrays.

`numpreobs` is the number of presample observations. `numpaths` is the number of independent presample paths, from which `forecast` initializes the resulting `numpaths` forecasts (see Algorithms).

Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. `numpreobs` must be at least `Mdl.P` to initialize the model. If `numpreobs` > `Mdl.P`, `forecast` uses only the latest `Mdl.P` rows. For more details, see Time Base Partitions for Forecasting.

Columns of `Y0` correspond to separate, independent presample paths.

• If `Y0` is a column vector, it represents a single path of the response series. `forecast` applies it to each forecasted path. In this case, all forecast paths `Y` derive from the same initial responses.

• If `Y0` is a matrix, each column represents a presample path of the response series. `numpaths` is the maximum among the second dimensions of the specified presample observation matrices `Y0`, `E0`, and `V0`.

Data Types: `double`
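As an illustration of the rule that `forecast` uses only the latest `Mdl.P` rows of `Y0`, the following minimal sketch (assuming Econometrics Toolbox is available) forecasts a known AR(1) model from a longer presample path and from its final row only. The presample values in `Y0long` are hypothetical. The model has no MA terms, so inferred presample innovations play no role in the comparison.

```matlab
% Known AR(1) model (requires Econometrics Toolbox)
Mdl = arima(Constant=1,AR=0.3,Variance=1);

% Hypothetical presample path with more rows than Mdl.P = 1
Y0long = [4.0; 4.5; 4.2; 4.3];

% Forecast from the full path and from only the latest Mdl.P rows
[Y1,YMSE1] = forecast(Mdl,5,Y0long);
[Y2,YMSE2] = forecast(Mdl,5,Y0long((end-Mdl.P+1):end));

% Largest discrepancy between the two sets of forecasts
maxDiff = max(abs(Y1 - Y2))
```

Under these assumptions, the two calls should return matching forecasts and MSEs.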

Since R2023b

Presample data containing required presample responses $y_t$ and, optionally, innovations $\epsilon_t$, conditional variances $\sigma_t^2$, or predictors $x_t$, to initialize the model, specified as a table or timetable with `numprevars` variables and `numpreobs` rows. You can select a response, innovation, conditional variance, or multiple predictor variables from `Tbl1` by using the `PresampleResponseVariable`, `PresampleInnovationVariable`, `PresampleVarianceVariable`, or `PresamplePredictorVariables` name-value argument, respectively.

`numpreobs` is the number of presample observations. `numpaths` is the number of independent presample paths, from which `forecast` initializes the resulting `numpaths` forecasts (see Algorithms).

For all selected variables except predictor variables, each variable contains a single path (`numpreobs`-by-1 vector) or multiple paths (`numpreobs`-by-`numpaths` matrix) of presample response, innovations, or conditional variance data.

Each selected predictor variable contains a single path of observations. `forecast` applies all selected predictor variables to each forecasted path. When you do not specify presample innovation data for forecasting an ARIMAX model, `forecast` uses the presample predictor data to infer presample innovations.

Each row is a presample observation, and measurements in each row occur simultaneously. `numpreobs` must be one of the following values:

• At least `Mdl.P` when `Tbl1` provides only presample responses

• At least `max([Mdl.P Mdl.Q])` otherwise

When `Mdl.Variance` is a conditional variance model, `forecast` can require more than the minimum required number of presample values. If `numpreobs` exceeds the minimum number, `forecast` uses the latest required number of observations only.

If `Tbl1` is a timetable, all the following conditions must be true:

• `Tbl1` must represent a sample with a regular datetime time step (see `isregular`).

• The datetime vector of sample timestamps `Tbl1.Time` must be ascending or descending.

If `Tbl1` is a table, the last row contains the latest presample observation.

Although `forecast` requires presample response data, `forecast` sets default presample innovation and conditional variance data as follows:

• To infer necessary presample innovations from presample responses, `numpreobs` must be at least `Mdl.P + Mdl.Q` (see `infer`). Additionally, for ARIMAX models, `forecast` requires enough presample predictor data. If `numpreobs` is less than `Mdl.P + Mdl.Q` or you do not specify presample predictor data for ARIMAX forecasting, `forecast` sets all necessary presample innovations to zero.

• To infer necessary presample variances from presample innovations, `forecast` requires a sufficient number of presample innovations to initialize the specified conditional variance model (see `infer`). If you do not specify enough presample innovations to initialize the conditional variance model, `forecast` sets the necessary presample variances to the unconditional variance of the specified variance process.

Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `forecast(Mdl,10,Y0,X0=Exo0,XF=Exo)` specifies the presample and forecast sample exogenous predictor data to `Exo0` and `Exo`, respectively, to forecast a model with a regression component.

Presample innovations $\epsilon_t$ used to initialize either the moving average (MA) component of the ARIMA model or the conditional variance model, specified as a `numpreobs`-by-1 column vector or `numpreobs`-by-`numpaths` numeric matrix. Use `E0` only when you supply the numeric array of presample response data `Y0`. `forecast` assumes that the presample innovations have a mean of zero.

Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. `numpreobs` must be at least `Mdl.Q` to initialize the model. If `Mdl.Variance` is a conditional variance model (for example, a `garch` model object), `E0` might require more than `Mdl.Q` rows. If `numpreobs` is greater than required, `forecast` uses only the latest required rows.

Columns of `E0` correspond to separate, independent presample paths.

• If `E0` is a column vector, it represents a single path of the innovation series. `forecast` applies it to each forecasted path. In this case, all forecast paths `Y` derive from the same initial innovations.

• If `E0` is a matrix, each column represents a presample path of the innovation series. `numpaths` is the maximum among the second dimensions of the specified presample observation matrices `Y0`, `E0`, and `V0`.

By default:

• If you provide enough presample responses and, for ARIMAX models, presample predictor data (`X0`), `forecast` infers necessary presample innovations from the presample data. In this case, `numpreobs` must be at least `Mdl.P + Mdl.Q` (see `infer`).

• Otherwise, `forecast` sets all necessary presample innovations to zero.

Data Types: `double`

Presample conditional variances $\sigma_t^2$ used to initialize the conditional variance model, specified as a `numpreobs`-by-1 positive column vector or `numpreobs`-by-`numpaths` positive matrix. Use `V0` only when you supply the numeric array of presample response data `Y0`. If the model variance `Mdl.Variance` is constant, `forecast` ignores `V0`.

Rows of `V0` correspond to periods in the presample, and the last row contains the latest presample conditional variance. `numpreobs` must be enough to initialize the conditional variance model (see `forecast`). If `numpreobs` exceeds the minimum number, `forecast` uses only the latest observations.

Columns of `V0` correspond to separate, independent paths.

• If `V0` is a column vector, `forecast` applies it to each forecasted path. In this case, the conditional variance model of all forecast paths `Y` derives from the same initial conditional variances.

• If `V0` is a matrix, each column represents a presample path of the conditional variance series. `numpaths` is the maximum among the second dimensions of the specified presample observation matrices `Y0`, `E0`, and `V0`.

By default:

• If you specify enough presample innovations `E0` to initialize the conditional variance model `Mdl.Variance`, `forecast` infers any necessary presample conditional variances by passing the conditional variance model and `E0` to the `infer` function.

• If you do not specify `E0`, but you specify enough presample responses `Y0` and, for ARIMAX models, presample predictor data `X0` to infer enough presample innovations, `forecast` infers any necessary presample conditional variances from the inferred presample innovations.

• If you do not specify enough presample data, `forecast` sets all necessary presample conditional variances to the unconditional variance of the variance process.

Data Types: `double`

Since R2023b

Response variable $y_t$ to select from `Tbl1` containing the presample response data, specified as one of the following data types:

• String scalar or character vector containing a variable name in `Tbl1.Properties.VariableNames`

• Variable index (positive integer) to select from `Tbl1.Properties.VariableNames`

• A logical vector, where ```PresampleResponseVariable(j) = true``` selects variable `j` from `Tbl1.Properties.VariableNames`

The selected variable must be a numeric vector and cannot contain missing values (`NaN`s).

If `Tbl1` has one variable, the default specifies that variable. Otherwise, the default matches the variable to names in `Mdl.SeriesName`.

Example: `PresampleResponseVariable="StockRate"`

Example: `PresampleResponseVariable=[false false true false]` or `PresampleResponseVariable=3` selects the third table variable as the response variable.

Data Types: `double` | `logical` | `char` | `cell` | `string`
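The following minimal sketch illustrates selecting the presample response by name and by index. It assumes Econometrics Toolbox R2023b or later; the table `Tbl1`, its variable names `StockRate` and `Other`, and its values are hypothetical and chosen only for illustration.

```matlab
% Known AR(1) model (requires Econometrics Toolbox, R2023b or later)
Mdl = arima(Constant=1,AR=0.3,Variance=1);

% Hypothetical presample table; variable names and values are made up
Tbl1 = table([4.1; 4.4; 4.2],[0.9; 1.1; 1.0], ...
    VariableNames=["StockRate" "Other"]);

% Select the response by name, then equivalently by index
TblByName  = forecast(Mdl,5,Tbl1,PresampleResponseVariable="StockRate");
TblByIndex = forecast(Mdl,5,Tbl1,PresampleResponseVariable=1);

% Both calls should yield a table with one row per forecast period
height(TblByName)
```

Because `Tbl1` has two variables and neither matches the default `Mdl.SeriesName`, the selection must be explicit in this sketch.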

Since R2023b

Presample innovation variable of $\epsilon_t$ to select from `Tbl1` containing presample innovation data, specified as one of the following data types:

• String scalar or character vector containing a variable name in `Tbl1.Properties.VariableNames`

• Variable index (positive integer) to select from `Tbl1.Properties.VariableNames`

• A logical vector, where ```PresampleInnovationVariable(j) = true``` selects variable `j` from `Tbl1.Properties.VariableNames`

The selected variable must be a numeric matrix and cannot contain missing values (`NaN`s).

If you specify presample innovation data in `Tbl1`, you must specify `PresampleInnovationVariable`.

Example: `PresampleInnovationVariable="StockRateDist0"`

Example: `PresampleInnovationVariable=[false false true false]` or `PresampleInnovationVariable=3` selects the third table variable as the presample innovation variable.

Data Types: `double` | `logical` | `char` | `cell` | `string`

Presample conditional variance variable $\sigma_t^2$ to select from `Tbl1` containing presample conditional variance data, specified as one of the following data types:

• String scalar or character vector containing a variable name in `Tbl1.Properties.VariableNames`

• Variable index (positive integer) to select from `Tbl1.Properties.VariableNames`

• A logical vector, where ```PresampleVarianceVariable(j) = true``` selects variable `j` from `Tbl1.Properties.VariableNames`

The selected variable must be a numeric vector and cannot contain missing values (`NaN`s).

If you specify presample conditional variance data in `Tbl1`, you must specify `PresampleVarianceVariable`.

Example: `PresampleVarianceVariable="StockRateVar0"`

Example: `PresampleVarianceVariable=[false false true false]` or `PresampleVarianceVariable=3` selects the third table variable as the presample conditional variance variable.

Data Types: `double` | `logical` | `char` | `cell` | `string`

Presample predictor data used to infer the presample innovations `E0`, specified as a `numpreobs`-by-`numpreds` numeric matrix. Use `X0` only when you supply the numeric array of presample response data `Y0` and your model contains a regression component. `numpreds` = `numel(Mdl.Beta)`.

Rows of `X0` correspond to periods in the presample, and the last row contains the latest set of presample predictor observations. Columns of `X0` represent separate time series variables, and they correspond to the columns of `XF` and `Mdl.Beta`.

If you do not specify `E0`, `X0` must have at least `numpreobs` - `Mdl.P` rows so that `forecast` can infer presample innovations. If the number of rows exceeds the minimum number required to infer presample innovations, `forecast` uses only the latest required presample predictor observations. A best practice is to set `X0` to the same predictor data matrix used in the estimation, simulation, or inference of `Mdl`. This setting ensures that `forecast` infers presample innovations `E0` correctly.

If you specify `E0`, `forecast` ignores `X0`.

If you specify `X0` but you do not specify forecasted predictor data `XF`, `forecast` issues an error.

By default, `forecast` drops the regression component from the model when it infers presample innovations, regardless of the value of the regression coefficient `Mdl.Beta`.

Data Types: `double`

Since R2023b

Presample exogenous predictor variables $x_t$ to select from `Tbl1` containing presample exogenous predictor data, specified as one of the following data types:

• String vector or cell vector of character vectors containing `numpreds` variable names in `Tbl1.Properties.VariableNames`

• A vector of unique indices (positive integers) of variables to select from `Tbl1.Properties.VariableNames`

• A logical vector, where ```PresamplePredictorVariables(j) = true ``` selects variable `j` from `Tbl1.Properties.VariableNames`

The selected variables must be numeric vectors and cannot contain missing values (`NaN`s).

If you specify presample predictor data, you must also specify in-sample predictor data by using the `InSample` and `PredictorVariables` name-value arguments.

By default, `forecast` excludes the regression component, regardless of its presence in `Mdl`.

Example: ```PresamplePredictorVariables=["M1SL" "TB3MS" "UNRATE"]```

Example: `PresamplePredictorVariables=[true false true false]` or `PresamplePredictorVariables=[1 3]` selects the first and third table variables to supply the predictor data.

Data Types: `double` | `logical` | `char` | `cell` | `string`

Forecasted (or future) predictor data, specified as a numeric matrix with `numpreds` columns. `XF` represents the evolution of specified presample predictor data `X0` forecasted into the future (the forecast period). Use `XF` only when you supply the numeric array of presample response data `Y0`.

Rows of `XF` correspond to time points in the future; `XF(t,:)` contains the `t`-period-ahead predictor forecasts. `XF` must have at least `numperiods` rows. If the number of rows exceeds `numperiods`, `forecast` uses only the first (earliest) `numperiods` forecasts. For more details, see Time Base Partitions for Forecasting.

Columns of `XF` are separate time series variables, and they correspond to the columns of `X0` and `Mdl.Beta`.

By default, the `forecast` function generates forecasts from `Mdl` without a regression component, regardless of the value of the regression coefficient `Mdl.Beta`.

Since R2023b

Forecasted (future) predictor data for the exogenous regression component of the model, specified as a table or timetable. `InSample` contains `numvars` variables, including `numpreds` predictor variables $x_t$.

`forecast` returns the forecasted variables in the output table or timetable `Tbl2`, which is commensurate with `InSample`.

Each row corresponds to an observation in the forecast horizon, the first row is the earliest observation, and measurements in each row, among all paths, occur simultaneously. `InSample` must have at least `numperiods` rows to cover the forecast horizon. If you supply more rows than necessary, `forecast` uses only the first `numperiods` rows.

Each selected predictor variable is a numeric vector without missing values (`NaN`s). `forecast` applies the specified predictor variables to all forecasted paths.

If `InSample` is a timetable, the following conditions apply:

• `InSample` must represent a sample with a regular datetime time step (see `isregular`).

• The datetime vector `InSample.Time` must be ascending or descending.

• `Tbl1` must immediately precede `InSample`, with respect to the sampling frequency.

If `InSample` is a table, the last row contains the latest observation.

By default, `forecast` does not include the regression component in the model, regardless of the value of `Mdl.Beta`.

Since R2023b

Exogenous predictor variables $x_t$ to select from `InSample`, which contains the exogenous predictor data in the forecast horizon, specified as one of the following data types:

• String vector or cell vector of character vectors containing `numpreds` variable names in `InSample.Properties.VariableNames`

• A vector of unique indices (positive integers) of variables to select from `InSample.Properties.VariableNames`

• A logical vector, where `PredictorVariables(j) = true` selects variable `j` from `InSample.Properties.VariableNames`

The selected variables must be numeric vectors and cannot contain missing values (`NaN`s).

By default, `forecast` excludes the regression component, regardless of its presence in `Mdl`.

Example: ```PredictorVariables=["M1SL" "TB3MS" "UNRATE"]```

Example: `PredictorVariables=[true false true false]` or `PredictorVariables=[1 3]` selects the first and third table variables to supply the predictor data.

Data Types: `double` | `logical` | `char` | `cell` | `string`

Note

For numeric array inputs, `forecast` assumes that you synchronize all specified presample data sets so that the latest observation of each presample series occurs simultaneously. Similarly, `forecast` assumes that the first observation in the forecasted predictor data `XF` occurs in the time point immediately after the last observation in the presample predictor data `X0`.

Output Arguments

Minimum mean square error (MMSE) conditional mean forecasts $y_t$, returned as a `numperiods`-by-1 column vector or a `numperiods`-by-`numpaths` numeric matrix. `Y` represents a continuation of `Y0` (`Y(1,:)` occurs in the time point immediately after `Y0(end,:)`). `forecast` returns `Y` only when you supply numeric presample data `Y0`.

`Y(t,:)` contains the `t`-period-ahead forecasts, or the conditional mean forecast of all paths for time point `t` in the forecast period.

`forecast` determines `numpaths` from the number of columns in the presample data sets `Y0`, `E0`, and `V0`. For details, see Algorithms. If each presample data set has one column, `Y` is a column vector.

Data Types: `double`
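As an illustration of the output dimensions, the sketch below forecasts several paths at once. The AR(1) coefficients and presample values are hypothetical.

```matlab
% Hypothetical, fully specified AR(1) model.
Mdl = arima(Constant=0.2,AR={0.6},Variance=1);

% Two presample periods (rows) for three independent paths (columns).
Y0 = [1.0 0.5 -0.3;
      1.2 0.4 -0.1];

numperiods = 5;
Y = forecast(Mdl,numperiods,Y0);

% Y has one row per forecast period and one column per presample path.
sz = size(Y);
```

Here `Y(t,k)` is the `t`-period-ahead forecast that continues the presample path in `Y0(:,k)`.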

MSE of the forecasted responses `Y` (forecast error variances), returned as a `numperiods`-by-1 column vector or a `numperiods`-by-`numpaths` numeric matrix. `forecast` returns `YMSE` only when you supply numeric presample data `Y0`.

`YMSE(t,:)` contains the forecast error variances of all paths for time point `t` in the forecast period.

`forecast` determines `numpaths` from the number of columns in the presample data sets `Y0`, `E0`, and `V0`. For details, see Algorithms. If you do not specify any presample data sets, or if each data set is a column vector, `YMSE` is a column vector.

The square roots of `YMSE` are the standard errors of the forecasts `Y`.

Data Types: `double`
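A common use of `YMSE` is to build approximate forecast intervals. The sketch below assumes Gaussian innovations and uses the constant 1.96 as the approximate 97.5th percentile of the standard normal distribution; the model and data are hypothetical.

```matlab
% Hypothetical, fully specified AR(1) model and presample responses.
Mdl = arima(Constant=0.2,AR={0.6},Variance=1);
Y0 = [1.0; 1.2; 0.9];

[Y,YMSE] = forecast(Mdl,5,Y0);

se = sqrt(YMSE);       % Standard errors of the forecasts
z = 1.96;              % Approximate normal quantile for 95% coverage
lower = Y - z*se;      % Lower bounds of approximate 95% intervals
upper = Y + z*se;      % Upper bounds of approximate 95% intervals
```

For a stationary model such as this one, the intervals widen with the horizon because `YMSE` is nondecreasing in `t`.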

MMSE forecasts of the conditional variances of future model innovations, returned as a `numperiods`-by-1 numeric column vector or a `numperiods`-by-`numpaths` numeric matrix. `forecast` returns `V` only when you supply numeric presample data `Y0`.

When `Mdl.Variance` is a conditional variance model, row `j` contains the conditional variance forecasts of period `j`. Otherwise, `V` is a matrix composed of the constant `Mdl.Variance`.

`forecast` determines `numpaths` from the number of columns in the presample data sets `Y0`, `E0`, and `V0`. For details, see Algorithms. If you do not specify any presample data sets, or if each data set is a column vector, `V` is a column vector.

Data Types: `double`
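The two cases for `V` can be compared side by side. This sketch uses hypothetical coefficients for an AR(1) model with a GARCH(1,1) variance and for an AR(1) model with a constant variance; the presample responses are simulated noise, not real data.

```matlab
% Hypothetical composite model: AR(1) mean with GARCH(1,1) variance.
CVMdl = garch(Constant=0.01,GARCH={0.7},ARCH={0.2});
MdlGarch = arima(Constant=0,AR={0.5},Variance=CVMdl);

% Hypothetical AR(1) model with a constant innovation variance.
MdlConst = arima(Constant=0,AR={0.5},Variance=0.1);

rng(1)                          % Reproducible made-up presample
Y0 = 0.3*randn(20,1);

[~,~,VGarch] = forecast(MdlGarch,5,Y0);   % Time-varying variance forecasts
[~,~,VConst] = forecast(MdlConst,5,Y0);   % Constant Mdl.Variance in each row
```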

Since R2023b

Paths of MMSE forecasts of responses $y_t$, corresponding forecast MSEs, and MMSE forecasts of conditional variances $\sigma_t^2$ of future model innovations $\varepsilon_t$, returned as a table or timetable, the same data type as `Tbl1`. `forecast` returns `Tbl2` only when you supply the input `Tbl1`.

`Tbl2` contains the following variables:

• The forecasted response paths, which are in a `numperiods`-by-`numpaths` numeric matrix, with rows representing periods in the forecast horizon and columns representing independent paths, each corresponding to the input presample response paths in `Tbl1`. `forecast` names the forecasted response variable `responseName_Response`, where `responseName` is `Mdl.SeriesName`. For example, if `Mdl.SeriesName` is `GDP`, `Tbl2` contains a variable for the corresponding forecasted response paths with the name `GDP_Response`.

Each path in `Tbl2.responseName_Response` represents the continuation of the corresponding presample response path in `Tbl1` (`Tbl2.responseName_Response(1,:)` occurs in the next time point, with respect to the periodicity of `Tbl1`, after the last presample response). `Tbl2.responseName_Response(j,k)` contains the `j`-period-ahead forecasted response of path `k`.

• The forecast MSE paths, which are in a `numperiods`-by-`numpaths` numeric matrix, with rows representing periods in the forecast horizon and columns representing independent paths, each corresponding to the forecasted responses in `Tbl2.responseName_Response`. `forecast` names the forecast MSEs `responseName_MSE`, where `responseName` is `Mdl.SeriesName`. For example, if `Mdl.SeriesName` is `GDP`, `Tbl2` contains a variable for the corresponding forecast MSE with the name `GDP_MSE`.

• The forecasted conditional variance paths, which are in a `numperiods`-by-`numpaths` numeric matrix, with rows representing periods in the forecast horizon and columns representing independent paths. `forecast` names the forecasted conditional variance variable `responseName_Variance`, where `responseName` is `Mdl.SeriesName`. For example, if `Mdl.SeriesName` is `StockReturns`, `Tbl2` contains a variable for the corresponding forecasted conditional variance paths with the name `StockReturns_Variance`.

Each path in `Tbl2.responseName_Variance` represents a continuation of the presample conditional variance process, either supplied by `Tbl1` or set by default (`Tbl2.responseName_Variance(1,:)` occurs in the next time point, with respect to the periodicity of `Tbl1`, after the last presample conditional variance). `Tbl2.responseName_Variance(j,k)` contains the `j`-period-ahead forecasted conditional variance of path `k`.

• When you supply `InSample`, `Tbl2` contains all variables in `InSample`.

If `Tbl1` is a timetable, the following conditions hold:

• The row order of `Tbl2`, either ascending or descending, matches the row order of `Tbl1`.

• `Tbl2.Time(1)` is the next time after `Tbl1.Time(end)` relative to the sampling frequency, and `Tbl2.Time(2:numobs)` are the following times relative to the sampling frequency.
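The naming convention for the variables in `Tbl2` can be inspected with a small table input. This is a sketch only: the model, the series name `GDP`, and the presample values are hypothetical, and it assumes that `Mdl.SeriesName` can be set by using dot notation. The table syntax requires R2023b or later.

```matlab
% Hypothetical, fully specified AR(1) model with a named response series.
Mdl = arima(Constant=0.2,AR={0.6},Variance=1);
Mdl.SeriesName = "GDP";

% Presample table whose sole variable matches Mdl.SeriesName.
Tbl1 = table([1.0; 1.2; 0.9],VariableNames="GDP");

numperiods = 4;
Tbl2 = forecast(Mdl,numperiods,Tbl1);

% Following the description above, the names are expected to be built
% from Mdl.SeriesName (for example, GDP_Response and GDP_MSE).
names = string(Tbl2.Properties.VariableNames);
```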

Time Base Partitions for Forecasting

Time base partitions for forecasting are two disjoint, contiguous intervals of the time base; each interval contains time series data for forecasting a dynamic model. The forecast period (forecast horizon) is a `numperiods` length partition at the end of the time base during which the `forecast` function generates the forecasts `Y` from the dynamic model `Mdl`. The presample period is the entire partition occurring before the forecast period. The `forecast` function can require observed responses, innovations, or conditional variances in the presample period (`Y0`, `E0`, and `V0`, or `Tbl1`) to initialize the dynamic model for forecasting. The model structure determines the types and amounts of required presample observations.

A common practice is to fit a dynamic model to a portion of the data set, and then validate the predictability of the model by comparing its forecasts to observed responses. During forecasting, the presample period contains the data to which the model is fit, and the forecast period contains the holdout sample for validation. Suppose that $y_t$ is an observed response series; $x_{1,t}$, $x_{2,t}$, and $x_{3,t}$ are observed exogenous series; and time $t = 1,\ldots,T$. Consider forecasting responses from a dynamic model of $y_t$ containing a regression component with `numperiods` = $K$ periods. Suppose that the dynamic model is fit to the data in the interval $[1, T-K]$ (for more details, see `estimate`). This figure shows the time base partitions for forecasting.

For example, to generate the forecasts `Y` from an ARX(2) model, `forecast` requires:

• Presample responses `Y0` = ${\left[\begin{array}{cc}{y}_{T-K-1}& {y}_{T-K}\end{array}\right]}^{\prime }$ to initialize the model. The 1-period-ahead forecast requires both observations, whereas the 2-periods-ahead forecast requires $y_{T-K}$ and the 1-period-ahead forecast `Y(1)`. The `forecast` function generates all other forecasts by substituting previous forecasts for lagged responses in the model.

• Future exogenous data `XF` = $\left[\begin{array}{ccc}{x}_{1,\left(T-K+1\right):T}& {x}_{2,\left(T-K+1\right):T}& {x}_{3,\left(T-K+1\right):T}\end{array}\right]$ for the model regression component. Without specified future exogenous data, the `forecast` function ignores the model regression component, which can yield unrealistic forecasts.
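The substitution of earlier forecasts for lagged responses can be sketched by hand in base MATLAB. This is an illustration of the recursion for an ARX(2) model, not the internal implementation of `forecast`; all coefficients and data values are hypothetical.

```matlab
% Hypothetical ARX(2) model:
% y_t = c + phi(1)*y_{t-1} + phi(2)*y_{t-2} + x_t*beta + e_t
c = 0.5;
phi = [0.6 -0.2];
beta = [0.3; -0.1; 0.2];

Y0 = [1.0; 1.4];            % Presample responses [y_{T-K-1}; y_{T-K}]
XF = [0.1 0.2 0.3;          % Future exogenous data, one row per period
      0.0 0.1 0.2;
      0.2 0.0 0.1];
K = size(XF,1);             % Forecast horizon

% Buffer holding the presample responses followed by the forecasts.
ybuf = [Y0; zeros(K,1)];
for t = 1:K
    idx = numel(Y0) + t;
    % Future innovations have zero conditional mean, so they drop out.
    % For t >= 2, ybuf(idx-1) is itself a forecast.
    ybuf(idx) = c + phi(1)*ybuf(idx-1) + phi(2)*ybuf(idx-2) + XF(t,:)*beta;
end
Y = ybuf(numel(Y0)+1:end);  % Forecasts for periods T-K+1,...,T
```

The first iteration uses both presample observations, the second uses $y_{T-K}$ and `Y(1)`, and every later iteration uses forecasts only.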

Dynamic models containing either a moving average component or a conditional variance model can require presample innovations or conditional variances. Given enough presample responses, `forecast` infers the required presample innovations and conditional variances. If such a model also contains a regression component, then `forecast` must have enough presample responses and exogenous data to infer the required presample innovations and conditional variances. This figure shows the arrays of required observations for this case, with corresponding input and output arguments.

Algorithms

• The `forecast` function sets the number of sample paths (`numpaths`) to the maximum number of columns among the specified presample data sets `Y0`, `E0`, and `V0`.

All specified presample data sets must have either one column or `numpaths` > 1 columns. Otherwise, `forecast` issues an error. For example, if you supply `Y0` and `E0`, and `Y0` has five columns representing five paths, then `E0` can have one column or five columns. If `E0` has one column, `forecast` applies `E0` to each path.
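The column rule can be sketched in base MATLAB as follows. This is an illustration of the rule as stated, with hypothetical random presample arrays, and is not the code that `forecast` runs.

```matlab
% Hypothetical presample data sets with differing numbers of columns.
Y0 = rand(4,5);             % Five paths
E0 = rand(3,1);             % One column, to be applied to every path
V0 = rand(2,5);             % Five paths
presample = {Y0,E0,V0};

% numpaths is the maximum number of columns among the data sets.
ncols = cellfun(@(A) size(A,2),presample);
numpaths = max(ncols);

% Each data set must have either one column or numpaths columns.
if any(ncols ~= 1 & ncols ~= numpaths)
    error("Each presample data set must have 1 or numpaths columns.")
end

% Apply single-column data sets to every path.
for k = 1:numel(presample)
    if ncols(k) == 1
        presample{k} = repmat(presample{k},1,numpaths);
    end
end
```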

• `NaN` values in presample and future data sets indicate missing data. For input numeric arrays, `forecast` removes missing data from the presample data sets following this procedure:

1. `forecast` horizontally concatenates the specified presample data sets `Y0`, `E0`, `V0`, and `X0` so that the latest observations occur simultaneously. The result can be a jagged array because the presample data sets can have a different number of rows. In this case, `forecast` prepads variables with an appropriate number of zeros to form a matrix.

2. `forecast` applies listwise deletion to the combined presample matrix by removing all rows containing at least one `NaN`.

3. `forecast` extracts the processed presample data sets from the result of step 2, and removes all prepadded zeros.

`forecast` applies a similar procedure to the forecasted predictor data `XF`. After `forecast` applies listwise deletion to `XF`, the result must have at least `numperiods` rows. Otherwise, `forecast` issues an error.

Listwise deletion reduces the sample size and can create irregular time series.
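The three steps above can be sketched in base MATLAB for two presample data sets. The values are hypothetical, and the code illustrates the described procedure rather than reproducing the internals of `forecast`.

```matlab
% Hypothetical presample data; the latest observations are the last rows.
Y0 = [1; NaN; 3; 4];
E0 = [0.1; 0.2];
sets = {Y0,E0};
nrows = cellfun(@(A) size(A,1),sets);
maxRows = max(nrows);

% Step 1: prepad shorter data sets with zeros so that the latest
% observations align, then concatenate horizontally.
padded = cell(size(sets));
for k = 1:numel(sets)
    pad = maxRows - nrows(k);
    padded{k} = [zeros(pad,size(sets{k},2)); sets{k}];
end
combined = [padded{:}];

% Step 2: listwise deletion marks every row containing at least one NaN.
keep = ~any(isnan(combined),2);

% Step 3: extract each data set again, dropping deleted rows and the
% prepadded zeros.
rowIdx = (1:maxRows)';
cleaned = cell(size(sets));
for k = 1:numel(sets)
    own = rowIdx > (maxRows - nrows(k));   % Rows that held real data
    cleaned{k} = padded{k}(keep & own,:);
end
```

In this example, the `NaN` in the second row of `Y0` removes only that row: `cleaned{1}` keeps three observations and `cleaned{2}` keeps both, because `E0` has no observation aligned with the deleted row.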

• `forecast` issues an error when any table or timetable input contains missing values.

• When `forecast` computes the MSEs `YMSE` of the conditional mean forecasts `Y`, the function treats the specified predictor data sets as exogenous, nonstochastic, and statistically independent of the model innovations. Therefore, `YMSE` reflects only the variance associated with the ARIMA component of the input model `Mdl`.
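One way to see this is to compare `YMSE` with and without future predictor data. The sketch below uses a hypothetical ARX(1) model with a constant variance. For comparison, it also computes the standard forecast error variance of an AR(1) process from its infinite-order MA coefficients, $\sigma^2\sum_{j=0}^{t-1}\psi_j^2$ with $\psi_j = \phi^j$; that formula is a textbook result added here as a check.

```matlab
% Hypothetical, fully specified ARX(1) model with one predictor.
Mdl = arima(Constant=1,AR={0.5},Beta=2,Variance=0.1);
Y0 = [1; 2; 1.5];
XF = [0.3; -0.1; 0.4; 0.2];     % Made-up future predictor data

[~,YMSEwith] = forecast(Mdl,4,Y0,XF=XF);   % Regression component included
[~,YMSEwithout] = forecast(Mdl,4,Y0);      % Regression component excluded

% Textbook AR(1) forecast error variances: 0.1*cumsum(psi.^2).
psi = 0.5.^(0:3)';
expected = 0.1*cumsum(psi.^2);

% Largest differences; both should be negligible.
gapPredictors = max(abs(YMSEwith - YMSEwithout));
gapFormula = max(abs(YMSEwith - expected));
```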

References

[1] Baillie, Richard T., and Tim Bollerslev. “Prediction in Dynamic Models with Time-Dependent Conditional Variances.” Journal of Econometrics 52 (April 1992): 91–113. https://doi.org/10.1016/0304-4076(92)90066-Z.

[2] Bollerslev, Tim. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics 31 (April 1986): 307–27. https://doi.org/10.1016/0304-4076(86)90063-1.

[3] Bollerslev, Tim. “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return.” The Review of Economics and Statistics 69 (August 1987): 542–47. https://doi.org/10.2307/1925546.

[4] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

[5] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995.

[6] Engle, Robert F. “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica 50 (July 1982): 987–1007. https://doi.org/10.2307/1912773.

[7] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.

Version History

Introduced in R2012a
