# kfoldPredict

Predict responses for observations not used for training

## Description

`YHat = kfoldPredict(CVMdl)` returns the cross-validated responses predicted by the cross-validated linear regression model `CVMdl`. That is, for every fold, `kfoldPredict` predicts responses for the observations that it holds out when it trains using all other observations.

`YHat` contains predicted responses for each regularization strength in the linear regression models that compose `CVMdl`.
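The fold-by-fold procedure can be reproduced by hand. The following is a minimal sketch, assuming the predictors have observations in rows (the `fitrlinear` default) and that `CVMdl.Trained{k}` is the model trained without fold *k* of `CVMdl.Partition`. It uses the `cvpartition` method `test` to find each fold's held-out observations and calls `predict` on the matching trained model. The small simulated data set is illustrative only.

```matlab
% Sketch of what kfoldPredict computes, assuming observations are in rows of X
% and that CVMdl.Trained{k} was trained without the observations in fold k.
rng(1)                                   % For reproducibility
X = sprandn(1000,50,0.1);                % Small illustrative data set
Y = X(:,10) + 2*X(:,20) + 0.3*randn(1000,1);
CVMdl = fitrlinear(X,Y,'CrossVal','on'); % 10-fold cross-validation by default

L = numel(CVMdl.Trained{1}.Lambda);      % Number of regularization strengths
YHatManual = zeros(size(X,1),L);
for k = 1:CVMdl.KFold
    idx = test(CVMdl.Partition,k);       % Logical index of the observations in fold k
    % Predict the held-out fold with the model that never saw it
    YHatManual(idx,:) = predict(CVMdl.Trained{k},X(idx,:));
end

YHat = kfoldPredict(CVMdl);
maxDiff = max(abs(YHat - YHatManual),[],'all')  % Compare with the built-in result
```

If the assumption about fold ordering holds, `maxDiff` should be zero or at round-off level.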

## Input Arguments

`CVMdl`

— Cross-validated, linear regression model

`RegressionPartitionedLinear`

model object

Cross-validated, linear regression model, specified as a `RegressionPartitionedLinear` model object. You can create a `RegressionPartitionedLinear` model using `fitrlinear` and specifying any one of the cross-validation name-value pair arguments, for example, `CrossVal`.

To obtain estimates, `kfoldPredict` applies the same data used to cross-validate the linear regression model (`X` and `Y`).

## Output Arguments

`YHat`

— Cross-validated predicted responses

numeric array

Cross-validated predicted responses, returned as an *n*-by-*L* numeric array. *n* is the number of observations in the predictor data that created `CVMdl` (see `X`), and *L* is the number of regularization strengths in `CVMdl.Trained{1}.Lambda`. `YHat(i,j)` is the predicted response for observation *i* using the linear regression model that has regularization strength `CVMdl.Trained{1}.Lambda(j)`.

The predicted response using the model with regularization strength *j* is

$${\widehat{y}}_{j}=x{\beta}_{j}+{b}_{j}.$$

- *x* is an observation from the predictor data matrix `X`, and is a row vector.
- $${\beta}_{j}$$ is the estimated column vector of coefficients. The software stores this vector in `Mdl.Beta(:,j)`, where `Mdl` is the model in `CVMdl.Trained` that was trained without the fold containing *x*.
- $${b}_{j}$$ is the estimated, scalar bias, which the software stores in `Mdl.Bias(j)`.
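To see the formula in action, you can evaluate $${\widehat{y}}_{j}=x{\beta}_{j}+{b}_{j}$$ for every observation and every regularization strength at once, then compare the result with `predict`. This minimal sketch uses a small simulated data set and an arbitrary set of four `Lambda` values, both chosen only for illustration. It relies on implicit expansion (R2016b or later) to add the bias, reshaped to a 1-by-*L* row vector, to each row of the *n*-by-*L* product `X*Mdl.Beta`.

```matlab
% Sketch: evaluate yhat_j = x*beta_j + b_j for all observations and all j.
rng(1)                                    % For reproducibility
X = sprandn(1000,50,0.1);                 % Small illustrative data set
Y = X(:,10) + 2*X(:,20) + 0.3*randn(1000,1);
Lambda = logspace(-4,-1,4);               % Illustrative regularization strengths
Mdl = fitrlinear(X,Y,'Lambda',Lambda);    % One RegressionLinear object, 4 models

% Column j uses Beta(:,j) and Bias(j); the bias row vector expands down the rows.
YHatFormula = full(X*Mdl.Beta) + reshape(Mdl.Bias,1,[]);
YHatPredict = predict(Mdl,X);
maxDiff = max(abs(YHatFormula - YHatPredict),[],'all')
```

For a single fold, this is the computation that `kfoldPredict` applies to that fold's held-out rows of `X`.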

## Examples

### Predict Cross-Validated Responses

Simulate 10000 observations from this model:

$$y={x}_{100}+2{x}_{200}+e.$$

- $$X=\{{x}_{1},\ldots,{x}_{1000}\}$$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
- *e* is random normal error with mean 0 and standard deviation 0.3.

```
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```

Cross-validate a linear regression model.

```
CVMdl = fitrlinear(X,Y,'CrossVal','on')
```

```
CVMdl = 
  RegressionPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 10000
                  KFold: 10
              Partition: [1x1 cvpartition]
      ResponseTransform: 'none'
```

```
Mdl1 = CVMdl.Trained{1}
```

```
Mdl1 = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: 0.0107
               Lambda: 1.1111e-04
              Learner: 'svm'
```

By default, `fitrlinear` implements 10-fold cross-validation. `CVMdl` is a `RegressionPartitionedLinear` model. It contains the property `Trained`, which is a 10-by-1 cell array holding 10 `RegressionLinear` models. The software trains each model on the observations outside one fold.

Predict responses for the observations that `fitrlinear` held out when training each fold's model.

```
yHat = kfoldPredict(CVMdl);
```

Because there is one regularization strength in `CVMdl`, `yHat` is a numeric vector.
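Because each element of `yHat` comes from a model that never saw the corresponding observation, comparing `yHat` with `Y` gives an out-of-sample estimate of prediction error. This minimal sketch computes the mean squared error pooled over all observations; the choice of MSE as the error measure is an assumption for illustration. It repeats the simulation and fit from this example so that it runs on its own.

```matlab
% Sketch: estimate out-of-sample error from the cross-validated predictions.
rng(1)                                   % Same simulation as in this example
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
CVMdl = fitrlinear(X,Y,'CrossVal','on');
yHat = kfoldPredict(CVMdl);              % n-by-1: one column for the single Lambda

cvMSE = mean((Y - yHat).^2)              % Mean squared error pooled over all folds
```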

### Predict for Models Containing Several Regularization Strengths

Simulate 10000 observations as in the example Predict Cross-Validated Responses.

```
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```

Create a set of 15 logarithmically spaced regularization strengths from $$10^{-5}$$ through $$10^{-1}$$.

```
Lambda = logspace(-5,-1,15);
```

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Specify least-squares regression with a lasso penalty, and optimize the objective function using SpaRSA.

```
X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
```

`CVMdl` is a `RegressionPartitionedLinear` model. Its `Trained` property contains a 5-by-1 cell array of trained `RegressionLinear` models, each of which holds out a different fold during training. Because `fitrlinear` trained using 15 regularization strengths, you can think of each `RegressionLinear` model as 15 models.

Predict cross-validated responses.

```
YHat = kfoldPredict(CVMdl);
size(YHat)
```

```
ans = 1×2

       10000          15
```

```
YHat(2,:)
```

```
ans = 1×15

   -1.7338   -1.7332   -1.7319   -1.7299   -1.7266   -1.7239   -1.7135   -1.7210   -1.7324   -1.7063   -1.6397   -1.5112   -1.2631   -0.7841   -0.0096
```

`YHat` is a 10000-by-15 matrix. `YHat(2,:)` contains the cross-validated responses for observation 2, one for each of the 15 regularization values.
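Because column *j* of `YHat` corresponds to `Lambda(j)`, you can compare regularization strengths column by column. This minimal sketch computes the cross-validated mean squared error for each of the 15 values and reports the one with the smallest error; using MSE as the selection criterion is an assumption made for illustration. It repeats the simulation and fit from this example so that it runs on its own.

```matlab
% Sketch: compare regularization strengths using the columns of YHat.
rng(1)                                      % Same simulation as in this example
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Lambda = logspace(-5,-1,15);
CVMdl = fitrlinear(X',Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');

YHat = kfoldPredict(CVMdl);                 % n-by-15: column j uses Lambda(j)
mseByLambda = mean((Y - YHat).^2,1);        % Y (n-by-1) expands across the 15 columns
[~,bestIdx] = min(mseByLambda);
bestLambda = Lambda(bestIdx)
```

Note that `YHat` is *n*-by-*L* even though the predictors were passed with observations in columns.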

## Version History

**Introduced in R2016a**

## See Also

`RegressionPartitionedLinear` | `predict` | `RegressionLinear` | `fitrlinear`
