
Regression diagnostics

`regstats(y,X,model)`

`stats = regstats(...)`

`stats = regstats(y,X,model,whichstats)`

`regstats(y,X,model)` performs a multilinear regression of the responses in `y` on the predictors in `X`. `X` is an *n*-by-*p* matrix of *p* predictors at each of *n* observations. `y` is an *n*-by-1 vector of observed responses.

**Note**

By default, `regstats` adds a first column of 1s to `X`, corresponding to a constant term in the model. Do not enter a column of 1s directly into `X`.
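As a minimal sketch of the note above, the following uses the `hald` sample data (the same data set used in the example later on this page) and assumes that `heat` is the 13-by-1 response and `ingredients` the 13-by-4 predictor matrix. It shows that the fit returns one more coefficient than there are columns in `X`, because the constant term is added automatically.

```matlab
load hald                                        % sample data: heat (response), ingredients (predictors)
stats = regstats(heat,ingredients,'linear',{'beta'});

nPredictors   = size(ingredients,2);             % columns supplied in X
nCoefficients = numel(stats.beta);               % coefficients estimated, constant term first

fprintf('%d predictors, %d coefficients\n',nPredictors,nCoefficients);
```

If `X` already contained a column of 1s, the design matrix would contain two identical constant columns, which is why the note advises against adding one by hand.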

The optional input `model` controls the regression model. By default, `regstats` uses a linear additive model with a constant term. `model` can be any one of the following:

- `'linear'`: Constant and linear terms (the default)
- `'interaction'`: Constant, linear, and interaction terms
- `'quadratic'`: Constant, linear, interaction, and squared terms
- `'purequadratic'`: Constant, linear, and squared terms
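To make the difference between the four named models concrete, the sketch below counts the terms each one generates for the four predictors in the `hald` data. It uses `x2fx` to build the design matrices, on the assumption that `regstats` expands the named models the same way `x2fx` does.

```matlab
load hald                                        % ingredients is 13-by-4
models = {'linear','interaction','quadratic','purequadratic'};
nTerms = zeros(1,numel(models));

for k = 1:numel(models)
    D = x2fx(ingredients,models{k});             % design matrix, constant column first
    nTerms(k) = size(D,2);
    fprintf('%-14s %2d terms\n',models{k},nTerms(k));
end
```

With four predictors, the counts are 5, 11, 15, and 9: one constant, four linear terms, six pairwise interactions, and four squared terms, combined as each model name indicates.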

Alternatively, `model` can be a matrix of model terms accepted by the `x2fx` function. See `x2fx` for a description of this matrix and for a description of the order in which terms appear. You can use this matrix to specify other models, including ones without a constant term.
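The following sketch shows one way to pass a terms matrix, here a linear model without a constant term fitted to the `hald` data. In the `x2fx` convention, each row of the matrix is one term and each column gives the power of the corresponding predictor, so the identity matrix requests each predictor to the first power and nothing else. The variable name `terms` is illustrative only.

```matlab
load hald
terms = eye(4);                                  % one row per term; no all-zero row, so no constant
stats = regstats(heat,ingredients,terms,{'beta','rsquare'});

numel(stats.beta)                                % one slope per predictor, no intercept
```

Because this model has no constant term, the *F* statistic and *R*² values for it should be interpreted with the caution described in the note on models without a constant.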

With the `regstats(y,X,model)` syntax, the function displays a graphical user interface (GUI) with a list of diagnostic statistics, as shown in the following figure.

When you select check boxes corresponding to the statistics you want to compute and click **OK**, `regstats` returns the selected statistics to the MATLAB® workspace. The names of the workspace variables are displayed on the right-hand side of the interface. You can change the name of the workspace variable to any valid MATLAB variable name.

`stats = regstats(...)` creates the structure `stats`, whose fields contain all of the diagnostic statistics for the regression. This syntax does not open the GUI. The fields of `stats` are listed in the following table.

Field | Description |
---|---|
`Q` | Q from the QR decomposition of the design matrix |
`R` | R from the QR decomposition of the design matrix |
`beta` | Regression coefficients |
`covb` | Covariance of regression coefficients |
`yhat` | Fitted values of the response data |
`r` | Residuals |
`mse` | Mean squared error |
`rsquare` | R² statistic |
`adjrsquare` | Adjusted R² statistic |
`leverage` | Leverage |
`hatmat` | Hat matrix |
`s2_i` | Delete-1 variance |
`beta_i` | Delete-1 coefficients |
`standres` | Standardized residuals |
`studres` | Studentized residuals |
`dfbetas` | Scaled change in regression coefficients |
`dffit` | Change in fitted values |
`dffits` | Scaled change in fitted values |
`covratio` | Change in covariance |
`cookd` | Cook's distance |
`tstat` | t statistics and p-values for coefficients |
`fstat` | F statistic and p-value |
`dwstat` | Durbin-Watson statistic and p-value |
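The relationships among several of these fields can be checked numerically. The sketch below fits the default linear model to the `hald` data and verifies a few standard least-squares identities. The leverage cutoff of `2*p/n` at the end is a common rule of thumb, not something `regstats` applies itself.

```matlab
load hald
stats = regstats(heat,ingredients,'linear','all');

n = numel(heat);                                 % number of observations
p = numel(stats.beta);                           % number of coefficients, constant included

% Fitted values plus residuals reproduce the response (up to round-off).
errFit = max(abs(stats.yhat + stats.r - heat));

% Leverage is the diagonal of the hat matrix, and it sums to p.
errLev = max(abs(diag(stats.hatmat) - stats.leverage));
sumLev = sum(stats.leverage);

% Mean squared error is the residual sum of squares over n - p.
errMse = abs(stats.mse - sum(stats.r.^2)/(n - p));

% Rule-of-thumb screen (an assumption of this sketch): leverage above 2*p/n.
highLeverage = find(stats.leverage > 2*p/n);
```

Checks like these are a quick way to confirm which quantity a field holds before building further diagnostics on it.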

Note that the field names of `stats` correspond to the names of the variables returned to the MATLAB workspace when you use the GUI. For example, `stats.beta` corresponds to the variable `beta` that is returned when you select **Coefficients** in the GUI and click **OK**.

`stats = regstats(y,X,model,whichstats)` returns only the statistics that you specify in `whichstats`. `whichstats` can be a single character vector such as `'leverage'`, a string array such as `["leverage","standres","studres"]`, or a cell array of character vectors such as `{'leverage','standres','studres'}`. Set `whichstats` to `'all'` to return all of the statistics.

**Note**

The *F* statistic is computed under the assumption that the model contains a constant term. It is not correct for models without a constant. The *R*² statistic can be negative for models without a constant, which indicates that the model is not appropriate for the data.
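The three accepted forms of `whichstats` can be compared side by side. This is a brief sketch using the `hald` data, and the variable names `sChar`, `sString`, `sCell`, and `sAll` are illustrative only.

```matlab
load hald
sChar   = regstats(heat,ingredients,'linear','leverage');             % single character vector
sString = regstats(heat,ingredients,'linear',["leverage","cookd"]);   % string array
sCell   = regstats(heat,ingredients,'linear',{'leverage','cookd'});   % cell array of character vectors
sAll    = regstats(heat,ingredients,'linear','all');                  % every statistic in the table

% The string-array and cell-array forms request the same statistics.
sameCookd = isequal(sString.cookd,sCell.cookd);
```

Requesting only the statistics you need avoids computing the delete-1 quantities, which can matter for large data sets.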

Open the `regstats` GUI using data from `hald.mat`:

    load hald
    regstats(heat,ingredients,'linear');

Select **Fitted Values** and **Residuals** in the GUI.

Click **OK** to export the fitted values and residuals to the MATLAB workspace in variables named `yhat` and `r`, respectively.

You can create the same variables using the `stats` output, without opening the GUI:

    whichstats = {'yhat','r'};
    stats = regstats(heat,ingredients,'linear',whichstats);
    yhat = stats.yhat;
    r = stats.r;

`regstats` treats `NaN` values in `X` or `y` as missing values. `regstats` omits observations with missing values from the regression fit.
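The effect of a missing value can be illustrated by comparing a fit on data containing a `NaN` with a fit on the same data after removing that observation by hand. This is a sketch on the `hald` data, and the choice of which entry to mark as missing is arbitrary.

```matlab
load hald
Xmiss = ingredients;
Xmiss(3,2) = NaN;                                % mark one predictor value as missing

% Fit with the NaN left in place; regstats drops observation 3.
sMiss = regstats(heat,Xmiss,'linear',{'beta'});

% Fit after removing the incomplete observation manually.
keep  = ~any(isnan(Xmiss),2) & ~isnan(heat);
sDrop = regstats(heat(keep),Xmiss(keep,:),'linear',{'beta'});

betaDiff = max(abs(sMiss.beta - sDrop.beta));    % zero up to round-off
```

The coefficient estimates agree, so there is no need to filter out incomplete rows before calling `regstats`.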
