# ecmmvnrmle

Multivariate normal regression with missing data

## Syntax

```
[Parameters,Covariance,Resid,Info] = ecmmvnrmle(Data,Design,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat)
```

## Arguments

• `Data`: `NUMSAMPLES`-by-`NUMSERIES` matrix with `NUMSAMPLES` samples of a `NUMSERIES`-dimensional random vector. Missing values are represented as `NaN`s. Only samples that are entirely `NaN`s are ignored. (To ignore samples with at least one `NaN`, use `mvnrmle`.)

• `Design`: A matrix or a cell array that handles two model structures. If `NUMSERIES = 1`, `Design` is a `NUMSAMPLES`-by-`NUMPARAMS` matrix with known values. This structure is the standard form for regression on a single series. If `NUMSERIES` ≥ `1`, `Design` is a cell array that contains either one or `NUMSAMPLES` cells, and each cell contains a `NUMSERIES`-by-`NUMPARAMS` matrix of known values. If `Design` has a single cell, the same `Design` matrix is assumed for every sample. If `Design` has more than one cell, each cell contains the `Design` matrix for the corresponding sample.

• `MaxIterations`: (Optional) Maximum number of iterations for the estimation algorithm. Default value is 100.

• `TolParam`: (Optional) Convergence tolerance for the estimation algorithm based on changes in model parameter estimates. Default value is `sqrt(eps)`, which is about 1.0e-8 for double precision. The convergence test for changes in model parameters is

$\|\text{Param}_k - \text{Param}_{k-1}\| < \text{TolParam} \times \left(1 + \|\text{Param}_k\|\right)$

where `Param` represents the output `Parameters`, for iteration k = 2, 3, ... .

• `TolObj`: (Optional) Convergence tolerance for the estimation algorithm based on changes in the objective function. Default value is `eps^(3/4)`, which is about 1.0e-12 for double precision. The convergence test for changes in the objective function is

$|\text{Obj}_k - \text{Obj}_{k-1}| < \text{TolObj} \times \left(1 + |\text{Obj}_k|\right)$

for iteration k = 2, 3, ... .

Convergence is assumed when both the `TolParam` and `TolObj` conditions are satisfied. If both `TolParam` ≤ `0` and `TolObj` ≤ `0`, the algorithm runs the maximum number of iterations (`MaxIterations`), whatever the results of the convergence tests.

• `Param0`: (Optional) `NUMPARAMS`-by-`1` column vector that contains a user-supplied initial estimate for the parameters of the regression model.

• `Covar0`: (Optional) `NUMSERIES`-by-`NUMSERIES` matrix that contains a user-supplied initial or known estimate for the covariance matrix of the regression residuals.

• `CovarFormat`: (Optional) Character vector that specifies the format for the covariance matrix. The choices are `'full'` (the default), which computes the full covariance matrix, and `'diagonal'`, which forces the covariance matrix to be a diagonal matrix.
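The two convergence tests can be sketched in plain MATLAB to show how the default tolerances behave. This is only an illustration, not the function's internal code: the iterate values below are hypothetical, and the parameter test is assumed to use the same relative form as the objective test.

```matlab
% Default tolerances as stated in the argument descriptions.
TolParam = sqrt(eps);      % parameter tolerance
TolObj   = eps^(3/4);      % objective tolerance

% Hypothetical values from two successive iterations (k-1 and k).
paramPrev = [1; 2];
paramCurr = paramPrev + 1e-10;
objPrev   = -123.4567890123;
objCurr   = -123.4567890123;

% Objective test: |Obj_k - Obj_{k-1}| < TolObj * (1 + |Obj_k|).
objOK = abs(objCurr - objPrev) < TolObj * (1 + abs(objCurr));

% Parameter test (assumed relative form, mirroring the objective test).
paramOK = norm(paramCurr - paramPrev) < TolParam * (1 + norm(paramCurr));

% Convergence requires both conditions to hold.
converged = paramOK && objOK;
```

Setting both tolerances to zero or a negative value would make neither test usable as a stopping rule, which matches the documented behavior of running all `MaxIterations` iterations.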

## Description

`[Parameters,Covariance,Resid,Info] = ecmmvnrmle(Data,Design,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat)` estimates a multivariate normal regression model with missing data. The model has the form

$\text{Data}_k \sim N\left(\text{Design}_k \times \text{Parameters},\ \text{Covariance}\right)$

for samples k = 1, ... , `NUMSAMPLES`.

`ecmmvnrmle` estimates a `NUMPARAMS`-by-`1` column vector of model parameters called `Parameters`, and a `NUMSERIES`-by-`NUMSERIES` matrix of covariance parameters called `Covariance`.
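As a minimal sketch of the model in use, the following simulates data from a known parameter vector and covariance, removes some entries, and calls `ecmmvnrmle` with only the two required inputs. The sample size, true values, and missing-data pattern are hypothetical choices made for illustration.

```matlab
rng(0);                                   % reproducible illustrative data
NumSamples = 200;
NumSeries  = 2;

% Single-cell Design: the same NUMSERIES-by-NUMPARAMS matrix for every sample.
% With the identity matrix, Parameters is simply the mean vector.
Design = {eye(NumSeries)};

TrueParams = [0.5; -1.0];                 % hypothetical true parameters
TrueCovar  = [1.0 0.3; 0.3 2.0];          % hypothetical true covariance

% Draw Data_k ~ N(Design_k * Parameters, Covariance).
Mean = (Design{1} * TrueParams)';
Data = repmat(Mean, NumSamples, 1) + randn(NumSamples, NumSeries) * chol(TrueCovar);

% Mark every tenth observation of the second series as missing.
Data(1:10:end, 2) = NaN;

% Estimate the regression parameters and residual covariance.
[Parameters, Covariance] = ecmmvnrmle(Data, Design);
```

Because no sample here is entirely `NaN`, every row contributes to the estimates, including the rows with one missing entry.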

`ecmmvnrmle(Data, Design)` with no output arguments plots the log-likelihood function for each iteration of the algorithm.

To summarize the outputs of `ecmmvnrmle`:

• `Parameters` is a `NUMPARAMS`-by-`1` column vector of estimates for the parameters of the regression model.

• `Covariance` is a `NUMSERIES`-by-`NUMSERIES` matrix of estimates for the covariance of the regression model's residuals.

• `Resid` is a `NUMSAMPLES`-by-`NUMSERIES` matrix of residuals from the regression. For any missing values in `Data`, the corresponding residual is the difference between the conditionally imputed value for `Data` and the model, that is, the imputed residual.

Note

The covariance estimate `Covariance` cannot be derived from the residuals.

Another output, `Info`, is a structure that contains additional information from the regression. The structure has these fields:

• `Info.Obj`: A variable-extent column vector, with no more than `MaxIterations` elements, that contains the value of the objective function at each iteration of the estimation algorithm. The last value in this vector, `Info.Obj(end)`, is the terminal estimate of the objective function. If you do maximum likelihood estimation, the objective function is the log-likelihood function.

• `Info.PrevParameters`: `NUMPARAMS`-by-`1` column vector of estimates for the model parameters from the iteration just prior to the terminal iteration.

• `Info.PrevCovariance`: `NUMSERIES`-by-`NUMSERIES` matrix of estimates for the covariance parameters from the iteration just prior to the terminal iteration.
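One possible way to use the `Info` fields is to inspect how far the estimates moved in the final iteration and whether the iteration limit was reached. The sketch below uses hypothetical random data and relies only on the fields described above; the variable names are illustrative.

```matlab
rng(1);                                   % reproducible illustrative data
NumSamples = 100;
Design = {eye(2)};                        % same 2-by-2 design for every sample
Data = randn(NumSamples, 2);
Data(5:5:end, 1) = NaN;                   % hypothetical missing entries

[Parameters, Covariance, Resid, Info] = ecmmvnrmle(Data, Design);

numIter  = numel(Info.Obj);               % number of recorded objective values
finalObj = Info.Obj(end);                 % terminal value of the objective

% Size of the last update to the parameter and covariance estimates.
lastParamStep = norm(Parameters - Info.PrevParameters);
lastCovarStep = norm(Covariance - Info.PrevCovariance, 'fro');

% With the default MaxIterations of 100, reaching 100 values suggests
% the algorithm stopped at the limit rather than on the tolerances.
hitIterationCap = (numIter >= 100);

fprintf('Iterations: %d, final objective: %g\n', numIter, finalObj);
```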

## Notes

`ecmmvnrmle` accepts an optional initial parameter vector, `Param0`, and an optional initial or known covariance matrix, `Covar0`, as described in Arguments.

You can configure `Design` as a matrix if `NUMSERIES = 1` or as a cell array if `NUMSERIES` ≥ `1`.

• If `Design` is a cell array and `NUMSERIES` = `1`, each cell contains a `NUMPARAMS` row vector.

• If `Design` is a cell array and `NUMSERIES` > `1`, each cell contains a `NUMSERIES`-by-`NUMPARAMS` matrix.
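The `Design` layouts described above can be built as follows. This is a sketch with hypothetical regressors (an intercept and a trend) chosen only to make the array shapes concrete.

```matlab
NumSamples = 5;
x = (1:NumSamples)';                      % hypothetical regressor

% NUMSERIES = 1, matrix form: NUMSAMPLES-by-NUMPARAMS.
DesignMatrix = [ones(NumSamples, 1), x];

% NUMSERIES = 1, cell form: each cell holds a NUMPARAMS row vector.
DesignCell1 = cell(NumSamples, 1);
for k = 1:NumSamples
    DesignCell1{k} = DesignMatrix(k, :);
end

% NUMSERIES > 1, cell form: each cell holds a NUMSERIES-by-NUMPARAMS matrix.
% Here all three series share the same intercept and slope parameters.
NumSeries = 3;
DesignCellN = cell(NumSamples, 1);
for k = 1:NumSamples
    DesignCellN{k} = [ones(NumSeries, 1), x(k) * ones(NumSeries, 1)];
end
```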

These points concern how `Design` handles missing data:

• Although `Design` should not have `NaN` values, ignored samples due to `NaN` values in `Data` are also ignored in the corresponding `Design` array.

• If `Design` is a `1`-by-`1` cell array, which supplies a single `Design` matrix for every sample, no `NaN` values are permitted in the array. A model with this structure must have `NUMSERIES` ≥ `NUMPARAMS` with `rank(Design{1}) = NUMPARAMS`.

• `ecmmvnrmle` is more strict than `mvnrmle` about the presence of `NaN` values in the `Design` array.
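Given these restrictions, it can help to validate a single-cell `Design` before calling `ecmmvnrmle`. The check below is a sketch using only base MATLAB functions; the design matrix is hypothetical, and the size condition is read as `NUMSERIES` ≥ `NUMPARAMS`, which a rank of `NUMPARAMS` requires.

```matlab
% Hypothetical 3-by-2 design shared by all samples.
Design = {[1 0; 0 1; 1 1]};

% No NaN values are permitted anywhere in the Design array.
hasNaN = any(cellfun(@(D) any(isnan(D(:))), Design));

% The single design matrix must have full column rank.
[NumSeries, NumParams] = size(Design{1});
fullRank = (rank(Design{1}) == NumParams);

okSingleCell = ~hasNaN && (NumSeries >= NumParams) && fullRank;

if ~okSingleCell
    error('Design fails the NaN or rank requirements for a single-cell model.');
end
```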

Use the estimates in the optional output structure `Info` for diagnostic purposes.

## References

Roderick J. A. Little and Donald B. Rubin. Statistical Analysis with Missing Data. 2nd Edition. John Wiley & Sons, Inc., 2002.

Xiao-Li Meng and Donald B. Rubin. “Maximum Likelihood Estimation via the ECM Algorithm.” Biometrika. Vol. 80, No. 2, 1993, pp. 267–278.

Joe Sexton and Anders Rygh Swensen. “ECM Algorithms that Converge at the Rate of EM.” Biometrika. Vol. 87, No. 3, 2000, pp. 651–662.

A. P. Dempster, N. M. Laird, and D. B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B, Vol. 39, No. 1, 1977, pp. 1–37.