# forecast

**Class: **regARIMA

Forecast responses of regression model with ARIMA errors

## Syntax

```
[Y,YMSE] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value)
```

## Description

`[Y,YMSE] = forecast(Mdl,numperiods)` forecasts responses (`Y`) for a regression model with ARIMA time series errors and generates corresponding mean square errors (`YMSE`).

`[Y,YMSE,U] = forecast(Mdl,numperiods)` additionally forecasts unconditional disturbances for a regression model with ARIMA errors.

`[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value)` forecasts with additional options specified by one or more `Name,Value` pair arguments.

## Input Arguments

`numperiods`

— Forecast horizon

positive integer

Forecast horizon, or the number of time points in the forecast period, specified as a positive integer.

**Data Types: **`double`
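As a minimal sketch of the simplest call, the following uses a fully specified model and no presample data. The coefficient values are borrowed from the first example on this page and are otherwise arbitrary. Without presample data, `forecast` sets the required presample disturbances to 0, so this call mainly illustrates the output sizes.

```matlab
% Fully specified regression model with ARMA(2,1) errors
% (illustrative coefficients, borrowed from the first example on this page).
Mdl = regARIMA('Intercept',0,'AR',{0.5 -0.8},'MA',-0.5, ...
    'Beta',[0.1 -0.2],'Variance',0.1);

numperiods = 5;                          % forecast horizon
[Y,YMSE,U] = forecast(Mdl,numperiods);   % no presample data, no XF

size(Y)     % numperiods rows, one column
YMSE(1)     % one-step-ahead MSE, which equals the innovation variance
```

Because `XF` is not specified here, the forecasts contain no regression component.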

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

*Before R2021a, use commas to separate each name and value, and enclose* `Name` *in quotes.*

`E0`

— Presample innovations

numeric column vector | numeric matrix

Presample innovations that initialize the moving average (MA) component of the ARIMA error model, specified as the comma-separated pair consisting of `'E0'` and a numeric column vector or numeric matrix. `forecast` assumes that the presample innovations have a mean of 0.

- If `E0` is a column vector, then `forecast` applies it to each forecasted path.
- If `E0`, `Y0`, and `U0` are matrices with multiple paths, then they must have the same number of columns.
- `E0` requires at least `Mdl.Q` rows. If `E0` contains extra rows, then `forecast` uses the latest presample innovations. The last row contains the latest presample innovation.

By default, if `U0` contains at least `Mdl.P` + `Mdl.Q` rows, then `forecast` infers `E0` from `U0`. If `U0` has an insufficient number of rows, and `forecast` cannot infer sufficient observations of `U0` from the presample data (`Y0` and `X0`), then `E0` is 0.

**Data Types: **`double`

`U0`

— Presample unconditional disturbances

numeric column vector | numeric matrix

Presample unconditional disturbances that initialize the autoregressive (AR) component of the ARIMA error model, specified as the comma-separated pair consisting of `'U0'` and a numeric column vector or numeric matrix. If you do not specify presample innovations `E0`, `forecast` uses `U0` to infer them.

- If `U0` is a column vector, then `forecast` applies it to each forecasted path.
- If `U0`, `Y0`, and `E0` are matrices with multiple paths, then they must have the same number of columns.
- `U0` requires at least `Mdl.P` rows. If `U0` contains extra rows, then `forecast` uses the latest presample unconditional disturbances. The last row contains the latest presample unconditional disturbance.

By default, if the presample data (`Y0` and `X0`) contains at least `Mdl.P` rows, then `forecast` infers `U0` from the presample data. If you do not specify presample data, then all required presample unconditional disturbances are 0.

**Data Types: **`double`
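The multiple-path rules for `E0` and `U0` can be sketched as follows. The model coefficients, the random presample values, and the name `numPaths` are illustrative assumptions, not values taken from a real data set. Both presample matrices have the same number of columns, one per path.

```matlab
% Illustrative model with Mdl.P = 2 and Mdl.Q = 1.
Mdl = regARIMA('Intercept',0,'AR',{0.5 -0.8},'MA',-0.5, ...
    'Beta',[0.1 -0.2],'Variance',0.1);

numPaths = 3;                      % hypothetical number of paths
rng(1);                            % for reproducibility
U0 = randn(Mdl.P,numPaths);        % Mdl.P rows, one column per path
E0 = zeros(Mdl.P,numPaths);        % more rows than Mdl.Q; latest rows are used

[Y,YMSE,U] = forecast(Mdl,4,'U0',U0,'E0',E0);

size(Y)    % 4 rows (numperiods) and numPaths columns
```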

`X0`

— Presample predictor data

numeric matrix

Presample predictor data that initializes the model for forecasting, specified as the comma-separated pair consisting of `'X0'` and a numeric matrix. The columns of `X0` are separate time series variables. `forecast` uses `X0` to infer presample unconditional disturbances `U0`. Therefore, if you specify `U0`, `forecast` ignores `X0`.

- If you do not specify `U0`, then `X0` requires at least `Mdl.P` rows to infer `U0`. If `X0` contains extra rows, then `forecast` uses the latest observations. The last row contains the latest observation of each series.
- `X0` requires the same number of columns as the length of `Mdl.Beta`.
- If you specify `X0`, then you must also specify `XF`.
- `forecast` treats `X0` as a fixed (nonstochastic) matrix.

**Data Types: **`double`

`XF`

— Forecasted or future predictor data

numeric matrix

Forecasted or future predictor data, specified as the comma-separated pair consisting of `'XF'` and a numeric matrix.

The columns of `XF` are separate time series, each corresponding to forecasts of the series in `X0`. Row *t* of `XF` contains the *t*-period-ahead forecasts of `X0`.

If you specify `X0`, then you must also specify `XF`. `XF` and `X0` require the same number of columns. `XF` must have at least `numperiods` rows. If `XF` exceeds `numperiods` rows, then `forecast` uses the first `numperiods` forecasts.

`forecast` treats `XF` as a fixed (nonstochastic) matrix.

By default, `forecast` does not include a regression component in the model, regardless of the presence of regression coefficients in `Mdl`.

**Data Types: **`double`
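The default behavior described above can be checked with a short sketch. It assumes the model equation used in the examples on this page, where the response is the intercept plus the regression component plus the unconditional disturbance. The predictor values in `XF` and the presample disturbances in `U0` are hypothetical.

```matlab
Mdl = regARIMA('Intercept',0,'AR',{0.5 -0.8},'MA',-0.5, ...
    'Beta',[0.1 -0.2],'Variance',0.1);

rng(1);                          % for reproducibility
numperiods = 4;
XF = randn(numperiods,2);        % hypothetical future predictor data
U0 = [0.3; -0.2; 0.1];           % hypothetical presample disturbances

% Default: no regression component, even though Mdl.Beta is set.
[YnoReg,~,UnoReg] = forecast(Mdl,numperiods,'U0',U0);

% With XF, the response forecast includes the regression component.
[Yreg,~,Ureg] = forecast(Mdl,numperiods,'U0',U0,'XF',XF);

regPart = XF*Mdl.Beta(:);        % regression contribution per period
gapNoReg = max(abs(YnoReg - (Mdl.Intercept + UnoReg)));
gapReg   = max(abs(Yreg - (Mdl.Intercept + regPart + Ureg)));
```

Under these assumptions, both gaps should be zero up to rounding, and the disturbance forecasts should not depend on whether `XF` is supplied.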

`Y0`

— Presample response data

numeric column vector | numeric matrix

Presample response data that initializes the model for forecasting, specified as the comma-separated pair consisting of `'Y0'` and a numeric column vector or numeric matrix. `forecast` uses `Y0` to infer presample unconditional disturbances `U0`. Therefore, if you specify `U0`, `forecast` ignores `Y0`.

- If `Y0` is a column vector, `forecast` applies it to each forecasted path.
- If `Y0`, `E0`, and `U0` are matrices with multiple paths, then they must have the same number of columns.
- If you do not specify `U0`, then `Y0` requires at least `Mdl.P` rows to infer `U0`. If `Y0` contains extra rows, then `forecast` uses the latest observations. The last row contains the latest observation.

**Data Types: **`double`
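To illustrate how `Y0` and `X0` stand in for `U0`, the sketch below computes the presample disturbances by hand as the response minus the intercept and regression component, then compares the two ways of calling `forecast`. It uses a regression model with AR(2) errors and no MA terms so that presample innovations play no role. All coefficients and sample sizes are illustrative assumptions.

```matlab
% AR-only error model, so no presample innovations are needed.
Mdl = regARIMA('Intercept',1,'AR',{0.5 -0.3}, ...
    'Beta',[0.1 -0.2],'Variance',0.1);

rng(1);                               % for reproducibility
X = randn(25,2);                      % hypothetical predictor data
y = simulate(Mdl,25,'X',X);

Y0 = y(1:20);  X0 = X(1:20,:);  XF = X(21:25,:);

% Let forecast infer U0 from the presample responses and predictors.
Y1 = forecast(Mdl,5,'Y0',Y0,'X0',X0,'XF',XF);

% Supply the same information directly as presample disturbances.
U0 = Y0 - Mdl.Intercept - X0*Mdl.Beta(:);
Y2 = forecast(Mdl,5,'U0',U0,'XF',XF);

maxDiff = max(abs(Y1 - Y2));          % expected to be zero up to rounding
```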

**Notes**

- `NaN`s in `E0`, `U0`, `X0`, `XF`, and `Y0` indicate missing values and `forecast` removes them. The software merges the presample data sets (`E0`, `U0`, `X0`, and `Y0`), then uses list-wise deletion to remove any `NaN`s. `forecast` similarly removes `NaN`s from `XF`. Removing `NaN`s in the data reduces the sample size. Such removal can also create irregular time series.
- `forecast` assumes that you synchronize presample data such that the latest observation of each presample series occurs simultaneously.
- Set `X0` to the same predictor matrix as `X` used in the estimation, simulation, or inference of `Mdl`. This assignment ensures correct inference of the unconditional disturbances, `U0`.
- To include a regression component in the response forecast, you must specify the forecasted predictor data `XF`. That is, you can specify `XF` without also specifying `X0`, but `forecast` issues an error when you specify `X0` without also specifying `XF`.

## Output Arguments

`Y`

— Minimum mean square error forecasts of response data

numeric matrix

Minimum mean square error (MMSE) forecasts of the response data, returned as a numeric matrix. `Y` has `numperiods` rows and `numPaths` columns.

- If you do not specify `Y0`, `E0`, and `U0`, then `Y` is a `numperiods`-by-1 column vector.
- If you specify `Y0`, `E0`, and `U0`, all having `numPaths` columns, then `Y` is a `numperiods`-by-`numPaths` matrix.
- Row *i* of `Y` contains the forecasts for the *i*th period.

**Data Types: **`double`

`YMSE`

— Mean square errors of forecasted responses

numeric matrix

Mean square errors (MSEs) of the forecasted responses, returned as a numeric matrix. `YMSE` has `numperiods` rows and `numPaths` columns.

- If you do not specify `Y0`, `E0`, and `U0`, then `YMSE` is a `numperiods`-by-1 column vector.
- If you specify `Y0`, `E0`, and `U0`, all having `numPaths` columns, then `YMSE` is a `numperiods`-by-`numPaths` matrix.
- Row *i* of `YMSE` contains the forecast error variances for the *i*th period.
- The predictor data does not contribute variability to `YMSE` because `forecast` treats `XF` as a nonstochastic matrix.
- The square roots of `YMSE` are the standard errors of the forecasts of `Y`.

**Data Types: **`double`
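The standard errors translate into approximate forecast intervals in the same way as in the examples on this page, which use the Gaussian multiplier 1.96. A minimal sketch, with illustrative coefficients and hypothetical presample disturbances:

```matlab
Mdl = regARIMA('Intercept',0,'AR',{0.5 -0.8},'MA',-0.5, ...
    'Beta',[0.1 -0.2],'Variance',0.1);

U0 = [0.3; -0.2; 0.1];                 % hypothetical presample disturbances
[Y,YMSE] = forecast(Mdl,10,'U0',U0);

se = sqrt(YMSE);                       % standard errors of the forecasts
lower = Y - 1.96*se;                   % approximate 95% lower bound
upper = Y + 1.96*se;                   % approximate 95% upper bound

intervalWidth = upper - lower;         % widens as the MSE accumulates
```

For a stationary error model such as this one, the MSEs do not decrease with the horizon, so the intervals never narrow.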

`U`

— Minimum mean square error forecasts of future ARIMA error model unconditional disturbances

numeric matrix

Minimum mean square error (MMSE) forecasts of future ARIMA error model unconditional disturbances, returned as a numeric matrix. `U` has `numperiods` rows and `numPaths` columns.

- If you do not specify `Y0`, `E0`, and `U0`, then `U` is a `numperiods`-by-1 column vector.
- If you specify `Y0`, `E0`, and `U0`, all having `numPaths` columns, then `U` is a `numperiods`-by-`numPaths` matrix.
- Row *i* of `U` contains the forecasted unconditional disturbances for the *i*th period.

**Data Types: **`double`

## Examples

### Forecast Responses of a Regression Model with ARIMA Errors

Forecast responses from the following regression model with ARMA(2,1) errors over a 30-period horizon:

$$\begin{array}{l}\begin{array}{c}{y}_{t}={X}_{t}\left[\begin{array}{c}0.1\\ -0.2\end{array}\right]+{u}_{t}\\ {u}_{t}=0.5{u}_{t-1}-0.8{u}_{t-2}+{\epsilon}_{t}-0.5{\epsilon}_{t-1},\end{array}\end{array}$$

where $${\epsilon}_{t}$$ is Gaussian with variance 0.1.

Specify the model. Simulate responses from the model and two predictor series.

    Mdl0 = regARIMA('Intercept',0,'AR',{0.5 -0.8},...
        'MA',-0.5,'Beta',[0.1 -0.2],'Variance',0.1);
    rng(1); % For reproducibility
    X = randn(130,2);
    y = simulate(Mdl0,130,'X',X);

Fit the model to the first 100 observations, and reserve the remaining 30 observations to evaluate forecast performance.

    Mdl = regARIMA('ARLags',1:2);
    EstMdl = estimate(Mdl,y(1:100),'X',X(1:100,:));

Output:

    Regression with ARMA(2,0) Error Model (Gaussian Distribution):

                  Value      StandardError    TStatistic      PValue
                 ________    _____________    __________    __________

    Intercept    0.004358      0.021314         0.20446        0.83799
    AR{1}         0.36833      0.067103          5.4891     4.0408e-08
    AR{2}        -0.75063      0.090865         -8.2609     1.4453e-16
    Beta(1)      0.076398      0.023008          3.3205     0.00089863
    Beta(2)       -0.1396      0.023298         -5.9919     2.0741e-09
    Variance     0.079876       0.01342          5.9522     2.6453e-09

`EstMdl` is a new `regARIMA` model containing the estimates. The estimates are close to their true values.

Use `EstMdl` to forecast a 30-period horizon. Visually compare the forecasts to the holdout data using a plot.

    [yF,yMSE] = forecast(EstMdl,30,'Y0',y(1:100),...
        'X0',X(1:100,:),'XF',X(101:end,:));

    figure
    plot(y,'Color',[.7,.7,.7]);
    hold on
    plot(101:130,yF,'b','LineWidth',2);
    plot(101:130,yF+1.96*sqrt(yMSE),'r:',...
        'LineWidth',2);
    plot(101:130,yF-1.96*sqrt(yMSE),'r:','LineWidth',2);
    h = gca;
    ph = patch([repmat(101,1,2) repmat(130,1,2)],...
        [h.YLim fliplr(h.YLim)],...
        [0 0 0 0],'b');
    ph.FaceAlpha = 0.1;
    legend('Observed','Forecast',...
        '95% Forecast Interval','Location','Best');
    title(['30-Period Forecasts and Approximate 95% '...
        'Forecast Intervals'])
    axis tight
    hold off

Many observations in the holdout sample fall beyond the 95% forecast intervals. Two reasons for this are:

- The predictors are randomly generated in this example. `estimate` treats the predictors as fixed. The 95% forecast intervals based on the estimates from `estimate` do not account for the variability in the predictors.
- By sheer chance, the estimation period seems less volatile than the forecast period. `estimate` uses the less volatile estimation period data to estimate the parameters. Therefore, forecast intervals based on the estimates should not be expected to cover observations that have an underlying innovations process with larger variability.
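The visual comparison can be complemented with two simple holdout summaries: the fraction of holdout observations inside the approximate 95% intervals and the root mean square forecast error. The sketch below repeats the steps of this example so that it runs on its own. The names `coverage` and `rmse` are introduced here for illustration.

```matlab
% Repeat the simulation, estimation, and forecast steps of this example.
Mdl0 = regARIMA('Intercept',0,'AR',{0.5 -0.8},...
    'MA',-0.5,'Beta',[0.1 -0.2],'Variance',0.1);
rng(1); % For reproducibility
X = randn(130,2);
y = simulate(Mdl0,130,'X',X);

EstMdl = estimate(regARIMA('ARLags',1:2),y(1:100),...
    'X',X(1:100,:),'Display','off');
[yF,yMSE] = forecast(EstMdl,30,'Y0',y(1:100),...
    'X0',X(1:100,:),'XF',X(101:end,:));

% Holdout summaries.
holdout  = y(101:end);
inside   = abs(holdout - yF) <= 1.96*sqrt(yMSE);  % within the 95% interval
coverage = mean(inside);                          % empirical coverage
rmse     = sqrt(mean((holdout - yF).^2));         % forecast RMSE
```

An empirical coverage well below 0.95 is consistent with the two reasons listed above.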

### Forecast the GDP Using Regression Model with ARMA Errors

Forecast stationary, log GDP using a regression model with ARMA(1,1) errors, including CPI as a predictor.

Load the U.S. macroeconomic data set and preprocess the data.

    load Data_USEconModel;
    logGDP = log(DataTimeTable.GDP);
    dlogGDP = diff(logGDP);               % For stationarity
    dCPI = diff(DataTimeTable.CPIAUCSL);  % For stationarity
    numObs = length(dlogGDP);
    gdp = dlogGDP(1:end-15);   % Estimation sample
    cpi = dCPI(1:end-15);
    T = length(gdp);           % Effective sample size
    frstHzn = T+1:numObs;      % Forecast horizon
    hoCPI = dCPI(frstHzn);     % Holdout sample
    dts = DataTimeTable.Time(2:end);

Fit a regression model with ARMA(1,1) errors.

    Mdl = regARIMA('ARLags',1,'MALags',1);
    EstMdl = estimate(Mdl,gdp,'X',cpi);

Output:

    Regression with ARMA(1,1) Error Model (Gaussian Distribution):

                   Value       StandardError    TStatistic      PValue
                 __________    _____________    __________    __________

    Intercept      0.014793      0.0016289        9.0818      1.0684e-19
    AR{1}           0.57601        0.10009        5.7548      8.6755e-09
    MA{1}          -0.15258        0.11978       -1.2738         0.20272
    Beta(1)       0.0028972      0.0013989         2.071        0.038355
    Variance     9.5734e-05     6.5562e-06        14.602       2.723e-48

Forecast the GDP rate over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.

    [gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',gdp,...
        'X0',cpi,'XF',hoCPI);

Plot the forecasts and 95% forecast intervals.

    figure
    h1 = plot(dts(end-65:end),dlogGDP(end-65:end),...
        'Color',[.7,.7,.7]);
    datetick
    hold on
    h2 = plot(dts(frstHzn),gdpF,'b','LineWidth',2);
    h3 = plot(dts(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
        'LineWidth',2);
    plot(dts(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:','LineWidth',2);
    ha = gca;
    title('{\bf GDP Rate Forecasts and Approximate 95% Intervals}')
    ph = patch([repmat(dts(frstHzn(1)),1,2) repmat(dts(frstHzn(end)),1,2)],...
        [ha.YLim fliplr(ha.YLim)],...
        [0 0 0 0],'b');
    ph.FaceAlpha = 0.1;
    legend([h1 h2 h3],{'Observed GDP rate','Forecasted GDP rate ',...
        '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
    axis tight
    hold off

### Forecast Regression Model with ARIMA Errors With Known Intercept

Forecast unit root nonstationary, log GDP using a regression model with ARIMA(1,1,1) errors, including CPI as a predictor and a known intercept.

Load the U.S. macroeconomic data set and preprocess the data.

    load Data_USEconModel;
    numObs = length(DataTimeTable.GDP);
    logGDP = log(DataTimeTable.GDP(1:end-15));
    cpi = DataTimeTable.CPIAUCSL(1:end-15);
    T = length(logGDP);                       % Effective sample size
    frstHzn = T+1:numObs;                     % Forecast horizon
    hoCPI = DataTimeTable.CPIAUCSL(frstHzn);  % Holdout sample
    dt = DataTimeTable.Time;

Specify the model for the estimation period.

    Mdl = regARIMA('ARLags',1,'MALags',1,'D',1);

The intercept is not identifiable in a model with integrated errors, so fix its value before estimation. One way to do this is to estimate the intercept using simple linear regression.

    Reg4Int = [ones(T,1), cpi]\logGDP;
    intercept = Reg4Int(1);

Consider performing a sensitivity analysis by using a grid of intercepts.
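One way such a sensitivity analysis might look is sketched below. The grid itself is an assumption: five intercept values within 0.5 of the simple linear regression estimate, chosen only for illustration. The sketch refits the model for each fixed intercept and records the last forecast on the 15-quarter horizon. The names `interceptGrid`, `finalForecast`, and `spread` are hypothetical.

```matlab
load Data_USEconModel
logGDP = log(DataTimeTable.GDP(1:end-15));
cpi    = DataTimeTable.CPIAUCSL(1:end-15);
hoCPI  = DataTimeTable.CPIAUCSL(end-14:end);   % last 15 quarters
T = length(logGDP);

% Center the grid on the simple linear regression intercept.
b = [ones(T,1), cpi]\logGDP;
interceptGrid = b(1) + linspace(-0.5,0.5,5);   % hypothetical grid

finalForecast = zeros(size(interceptGrid));
for k = 1:numel(interceptGrid)
    Mdl = regARIMA('ARLags',1,'MALags',1,'D',1);
    Mdl.Intercept = interceptGrid(k);          % fix the intercept
    EstMdl = estimate(Mdl,logGDP,'X',cpi,'Display','off');
    gdpF = forecast(EstMdl,15,'Y0',logGDP,'X0',cpi,'XF',hoCPI);
    finalForecast(k) = gdpF(end);              % 15-quarter-ahead forecast
end

spread = max(finalForecast) - min(finalForecast);
```

A small `spread` relative to the forecast standard errors would suggest that the forecasts are not sensitive to the fixed intercept over this grid.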

Set the intercept and fit the regression model with ARIMA(1,1,1) errors.

    Mdl.Intercept = intercept;
    EstMdl = estimate(Mdl,logGDP,'X',cpi,'Display','off')

Output:

    EstMdl =
      regARIMA with properties:

         Description: "ARIMA(1,1,1) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: 5.80142
                Beta: [0.00396698]
                   P: 2
                   D: 1
                   Q: 1
                  AR: {0.922709} at lag [1]
                 SAR: {}
                  MA: {-0.387843} at lag [1]
                 SMA: {}
            Variance: 0.000108943

       Regression with ARIMA(1,1,1) Error Model (Gaussian Distribution)

Forecast GDP over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.

    [gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',logGDP,...
        'X0',cpi,'XF',hoCPI);

Plot the forecasts and 95% forecast intervals.

    figure
    h1 = plot(dt(end-65:end),log(DataTimeTable.GDP(end-65:end)),...
        'Color',[.7,.7,.7]);
    hold on
    h2 = plot(dt(frstHzn),gdpF,'b','LineWidth',2);
    h3 = plot(dt(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
        'LineWidth',2);
    plot(dt(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:',...
        'LineWidth',2);
    ha = gca;
    title('{\bf Log GDP Forecasts and Approximate 95% Intervals}')
    ph = patch([repmat(dt(frstHzn(1)),1,2) repmat(dt(frstHzn(end)),1,2)],...
        [ha.YLim fliplr(ha.YLim)],...
        [0 0 0 0],'b');
    ph.FaceAlpha = 0.1;
    legend([h1 h2 h3],{'Observed GDP','Forecasted GDP',...
        '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
    axis tight
    hold off

The unconditional disturbances, $${u}_{t}$$, are nonstationary; therefore, the widths of the forecast intervals grow with time.

## More About

### Time Base Partitions for Forecasting

*Time base partitions for forecasting* are two disjoint, contiguous intervals of the time base; each interval contains time series data for forecasting a dynamic model. The *forecast period* (forecast horizon) is a `numperiods` length partition at the end of the time base during which `forecast` generates forecasts `Y` from the dynamic model `Mdl`. The *presample period* is the entire partition occurring before the forecast period. `forecast` can require observed responses `Y0`, regression data `X0`, unconditional disturbances `U0`, or innovations `E0` in the presample period to initialize the dynamic model for forecasting. The model structure determines the types and amounts of required presample observations.

A common practice is to fit a dynamic model to a portion of the data set, then validate the predictability of the model by comparing its forecasts to observed responses. During forecasting, the presample period contains the data to which the model is fit, and the forecast period contains the holdout sample for validation. Suppose that $${y}_{t}$$ is an observed response series; $${x}_{1,t}$$, $${x}_{2,t}$$, and $${x}_{3,t}$$ are observed exogenous series; and time $$t=1,\dots ,T$$. Consider forecasting responses from a dynamic model of $${y}_{t}$$ containing a regression component `numperiods` = $$K$$ periods. Suppose that the dynamic model is fit to the data in the interval $$[1,T-K]$$ (for more details, see `estimate`). This figure shows the time base partitions for forecasting.

For example, to generate forecasts `Y` from a regression model with AR(2) errors, `forecast` requires presample unconditional disturbances `U0` and future predictor data `XF`.

- `forecast` infers unconditional disturbances given enough readily available presample responses and predictor data. To initialize an AR(2) error model, `Y0` = $${\left[\begin{array}{cc}{y}_{T-K-1}& {y}_{T-K}\end{array}\right]}^{\prime}$$ and `X0` = $$\left[\begin{array}{ccc}{x}_{1,T-K-1}& {x}_{2,T-K-1}& {x}_{3,T-K-1}\\ {x}_{1,T-K}& {x}_{2,T-K}& {x}_{3,T-K}\end{array}\right]$$.
- To model the regression component, `forecast` requires future exogenous data `XF` = $$\left[\begin{array}{ccc}{x}_{1,\left(T-K+1\right):T}& {x}_{2,\left(T-K+1\right):T}& {x}_{3,\left(T-K+1\right):T}\end{array}\right]$$.

This figure shows the arrays of required observations for the general case, with corresponding input and output arguments.
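The index arithmetic behind these partitions can be sketched with placeholder data. The values of `T`, `K`, and `P`, and the arrays `y` and `X`, are hypothetical and chosen so that the selected rows are easy to read. No toolbox functions are needed for this step.

```matlab
T = 20;                      % length of the full time base
K = 5;                       % forecast horizon (numperiods)
P = 2;                       % presample rows needed by an AR(2) error model

y = (1:T)';                                   % placeholder response series
X = [(1:T)', 10 + (1:T)', 100 + (1:T)'];      % three placeholder predictors

presampleIdx = 1:(T-K);                       % presample period
forecastIdx  = (T-K+1):T;                     % forecast period

% Latest P presample rows: times T-K-1 and T-K.
Y0 = y(presampleIdx(end-P+1:end));
X0 = X(presampleIdx(end-P+1:end),:);

% Future predictor data over the forecast period.
XF = X(forecastIdx,:);
```

With these values, `Y0` holds the observations at times 14 and 15, and `XF` holds rows 16 through 20 of `X`.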

## Algorithms

- `forecast` computes the forecasted response MSEs, `YMSE`, by treating the predictor data matrices (`X0` and `XF`) as nonstochastic and statistically independent of the model innovations. Therefore, `YMSE` reflects the variance associated with the unconditional disturbances of the ARIMA error model alone.
- `forecast` uses `Y0` and `X0` to infer `U0`. Therefore, if you specify `U0`, `forecast` ignores `Y0` and `X0`.
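The statement about `YMSE` can be illustrated by forecasting twice with very different future predictor data. In this sketch, the model coefficients, the presample disturbances, and both `XF` matrices are illustrative assumptions.

```matlab
Mdl = regARIMA('Intercept',0,'AR',{0.5 -0.8},'MA',-0.5, ...
    'Beta',[0.1 -0.2],'Variance',0.1);

rng(1);                            % for reproducibility
U0  = [0.3; -0.2; 0.1];            % hypothetical presample disturbances
XF1 = randn(6,2);                  % one set of future predictors
XF2 = 10*randn(6,2);               % a much more variable set

[Y1,YMSE1] = forecast(Mdl,6,'U0',U0,'XF',XF1);
[Y2,YMSE2] = forecast(Mdl,6,'U0',U0,'XF',XF2);

mseGap      = max(abs(YMSE1 - YMSE2));   % expected to be zero up to rounding
forecastGap = max(abs(Y1 - Y2));         % the point forecasts do differ
```

The point forecasts change with `XF`, but the MSEs do not, because `forecast` treats the predictors as fixed.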

