# kfoldPredict

Predict responses for observations in cross-validated regression model

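## Syntax

`yFit = kfoldPredict(CVMdl)`

`yFit = kfoldPredict(CVMdl,Name,Value)`

`[yFit,ySD,yInt] = kfoldPredict(___)`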

## Description

`yFit = kfoldPredict(CVMdl,Name,Value)` specifies options using one or more name-value arguments. For example, `'IncludeInteractions',true` specifies to include interaction terms in computations for generalized additive models.

`[yFit,ySD,yInt] = kfoldPredict(___)` also returns the standard deviations and prediction intervals of the response variable, evaluated at each observation in the predictor data `CVMdl.X`, using any of the input argument combinations in the previous syntaxes. This syntax applies only to generalized additive models (GAM) for which the `IsStandardDeviationFit` property of `CVMdl` is `true`.

## Examples

### Compute Cross-Validation Loss Manually

When you create a cross-validated regression model, you can compute the mean squared error (MSE) by using the `kfoldLoss` object function. Alternatively, you can predict responses for validation-fold observations using `kfoldPredict` and compute the MSE manually.

Load the `carsmall` data set. Specify the predictor data `X` and the response data `Y`.

```
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;
```

Train a cross-validated regression tree model. By default, the software implements 10-fold cross-validation.

```
rng('default') % For reproducibility
CVMdl = fitrtree(X,Y,'CrossVal','on');
```

Compute the 10-fold cross-validation MSE by using `kfoldLoss`.

```
L = kfoldLoss(CVMdl)
```

```
L = 29.4963
```

Predict the responses `yfit` by using the cross-validated regression model. Compute the mean squared error between `yfit` and the true responses `CVMdl.Y`. The computed MSE matches the loss value returned by `kfoldLoss`.

```
yfit = kfoldPredict(CVMdl);
mse = mean((yfit - CVMdl.Y).^2)
```

```
mse = 29.4963
```
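To make the fold-wise prediction explicit, the following sketch rebuilds the predictions one fold at a time and compares them with the output of `kfoldPredict`. It assumes that `CVMdl.Partition` indexes the rows of `CVMdl.X` and that each model in `CVMdl.Trained` was fit without the corresponding validation fold. The variable names `yManual` and `maxDiff` are illustrative only.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;

rng('default') % For reproducibility
CVMdl = fitrtree(X,Y,'CrossVal','on');

% Predict each validation fold with the model trained on the other folds
yManual = NaN(size(CVMdl.Y));
for k = 1:CVMdl.KFold
    idx = test(CVMdl.Partition,k);          % rows held out in fold k
    yManual(idx) = predict(CVMdl.Trained{k},CVMdl.X(idx,:));
end

% Compare with the predictions returned by kfoldPredict
yFit = kfoldPredict(CVMdl);
maxDiff = max(abs(yManual - yFit));
```

If the assumptions hold, `maxDiff` should be zero up to floating-point rounding, because every observation belongs to exactly one validation fold.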

## Input Arguments

`CVMdl` — Cross-validated partitioned regression model

`RegressionPartitionedModel` object | `RegressionPartitionedEnsemble` object | `RegressionPartitionedGAM` object | `RegressionPartitionedGP` object | `RegressionPartitionedNeuralNetwork` object | `RegressionPartitionedSVM` object

Cross-validated partitioned regression model, specified as a `RegressionPartitionedModel`, `RegressionPartitionedEnsemble`, `RegressionPartitionedGAM`, `RegressionPartitionedGP`, `RegressionPartitionedNeuralNetwork`, or `RegressionPartitionedSVM` object. You can create the object in two ways:

- Pass a trained regression model listed in the following table to its `crossval` object function.
- Train a regression model using a function listed in the following table and specify one of the cross-validation name-value arguments for the function.

Regression Model | Function |
---|---|
`RegressionEnsemble` | `fitrensemble` |
`RegressionGAM` | `fitrgam` |
`RegressionGP` | `fitrgp` |
`RegressionNeuralNetwork` | `fitrnet` |
`RegressionSVM` | `fitrsvm` |
`RegressionTree` | `fitrtree` |
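As a minimal sketch of the two creation routes, the code below builds a 5-fold cross-validated regression tree both ways. The choice of `fitrtree`, the `carsmall` predictors, and 5 folds are assumptions made for illustration; any model and function pair from the table should follow the same pattern.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;
rng('default') % For reproducibility

% Way 1: train a full model, then pass it to its crossval object function
Mdl = fitrtree(X,Y);
CVMdl1 = crossval(Mdl,'KFold',5);

% Way 2: request cross-validation directly in the fitting function
CVMdl2 = fitrtree(X,Y,'KFold',5);

% Either object can be passed to kfoldPredict
yFit1 = kfoldPredict(CVMdl1);
yFit2 = kfoldPredict(CVMdl2);
```

The two objects use different random partitions, so `yFit1` and `yFit2` generally differ even though both come from 5-fold cross-validation.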

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

*Before R2021a, use commas to separate each name and value, and enclose* `Name` *in quotes.*

**Example:** `'Alpha',0.01,'IncludeInteractions',false` specifies the confidence level as 99% and excludes interaction terms from computations for a generalized additive model.

`Alpha` — Significance level

0.05 (default) | numeric scalar in `[0,1]`

Significance level for the confidence level of the prediction intervals `yInt`, specified as a numeric scalar in the range `[0,1]`. The confidence level of `yInt` is equal to `100(1 – Alpha)%`.

This argument is valid only for a generalized additive model object that includes the standard deviation fit. That is, you can specify this argument only when `CVMdl` is `RegressionPartitionedGAM` and the `IsStandardDeviationFit` property of `CVMdl` is `true`.

**Example:** `'Alpha',0.01`

**Data Types:** `single` | `double`

`IncludeInteractions` — Flag to include interaction terms

`true` | `false`

Flag to include interaction terms of the model, specified as `true` or `false`. This argument is valid only for a generalized additive model (GAM). That is, you can specify this argument only when `CVMdl` is `RegressionPartitionedGAM`.

The default value is `true` if the models in `CVMdl` (`CVMdl.Trained`) contain interaction terms. The value must be `false` if the models do not contain interaction terms.

**Data Types:** `logical`
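The following sketch shows one way to compare predictions with and without interaction terms. It assumes that `fitrgam` accepts the `'Interactions'` and `'CrossVal'` name-value arguments as used here; the three `carsmall` predictors and the variable names are illustrative choices. Rows with missing values are dropped up front to keep the example simple.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
Y = MPG;

% Drop rows with missing predictor or response values
ok = ~any(isnan([X Y]),2);
X = X(ok,:);
Y = Y(ok);

rng('default') % For reproducibility
% Assumption: request up to 3 pairwise interaction terms
CVMdl = fitrgam(X,Y,'Interactions',3,'CrossVal','on');

% Default behavior, then predictions from the linear terms only
yWith = kfoldPredict(CVMdl);
yWithout = kfoldPredict(CVMdl,'IncludeInteractions',false);

% Compare the cross-validation MSE of the two sets of predictions
mseWith = mean((yWith - CVMdl.Y).^2);
mseWithout = mean((yWithout - CVMdl.Y).^2);
```

Passing `false` is valid whether or not the trained models contain interaction terms, so the second call is safe in either case.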

`PredictionForMissingValue` — Predicted response value to use for observations with missing predictor values

`"median"` | `"mean"` | numeric scalar

*Since R2023b*

Predicted response value to use for observations with missing predictor values, specified as `"median"`, `"mean"`, or a numeric scalar. This argument is valid only for a Gaussian process regression, neural network, or support vector machine model. That is, you can specify this argument only when `CVMdl` is a `RegressionPartitionedGP`, `RegressionPartitionedNeuralNetwork`, or `RegressionPartitionedSVM` object.

Value | Description |
---|---|
`"median"` | `kfoldPredict` uses the median of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. This value is the default. |
`"mean"` | `kfoldPredict` uses the mean of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
Numeric scalar | `kfoldPredict` uses this value as the predicted response value for observations with missing predictor values. |

**Example:** `"PredictionForMissingValue","mean"`

**Example:** `"PredictionForMissingValue",NaN`

**Data Types:** `single` | `double` | `char` | `string`
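As a hedged illustration, the sketch below calls `kfoldPredict` on a cross-validated SVM model with two settings of `PredictionForMissingValue`. It requires R2023b or later and assumes that the `Horsepower` column of `carsmall` contains missing values. The choice of `fitrsvm`, standardization, and 5 folds is illustrative, and how many rows are affected depends on how the fitting function treats missing predictor values.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight]; % Horsepower is assumed to contain NaN
Y = MPG;

rng('default') % For reproducibility
CVMdl = fitrsvm(X,Y,'Standardize',true,'KFold',5);

% Use the training-fold mean response for rows with missing predictors
yFitMean = kfoldPredict(CVMdl,'PredictionForMissingValue','mean');

% Use NaN instead, so the affected rows are easy to find
yFitNaN = kfoldPredict(CVMdl,'PredictionForMissingValue',NaN);
numFlagged = sum(isnan(yFitNaN));
```

Setting the value to `NaN` is a convenient way to count the affected observations before deciding on a fill-in value.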

## Output Arguments

`yFit` — Predicted responses

numeric vector

Predicted responses, returned as an *n*-by-1 numeric vector, where *n* is the number of observations. (*n* is `size(CVMdl.X,1)` when observations are in rows.) Each entry of `yFit` corresponds to the predicted response for the corresponding row of `CVMdl.X`.

If you use a holdout validation technique to create `CVMdl` (that is, if `CVMdl.KFold` is `1`), then `yFit` has `NaN` values for training-fold observations.
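The holdout behavior can be checked with a short sketch like the one below. The 30% holdout fraction and the use of `fitrtree` are assumptions for illustration, and the code assumes that `CVMdl.Partition` indexes the rows of `CVMdl.X`.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;

rng('default') % For reproducibility
CVMdl = fitrtree(X,Y,'Holdout',0.3);   % hold out about 30% for validation

yFit = kfoldPredict(CVMdl);

% Logical indices of the training and validation observations
isTrain = training(CVMdl.Partition);
isTest = test(CVMdl.Partition);

% Training-fold rows have NaN predictions, so use only the validation rows
holdoutMSE = mean((yFit(isTest) - CVMdl.Y(isTest)).^2);
```

Because only the held-out rows carry predictions, any summary statistic should be computed on `yFit(isTest)` rather than on all of `yFit`.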

`ySD` — Standard deviations of response variable

column vector

Standard deviations of the response variable, evaluated at each observation in the predictor data `CVMdl.X`, returned as a column vector of length *n*, where *n* is the number of observations in `CVMdl.X`. The `i`th element `ySD(i)` contains the standard deviation of the `i`th response for the `i`th observation `CVMdl.X(i,:)`, estimated using the trained standard deviation model in `CVMdl`.

This argument is valid only for a generalized additive model object that includes the standard deviation fit. That is, `kfoldPredict` can return this argument only when `CVMdl` is `RegressionPartitionedGAM` and the `IsStandardDeviationFit` property of `CVMdl` is `true`.

`yInt` — Prediction intervals of response variable

two-column matrix

Prediction intervals of the response variable, evaluated at each observation in the predictor data `CVMdl.X`, returned as an *n*-by-2 matrix, where *n* is the number of observations in `CVMdl.X`. The `i`th row `yInt(i,:)` contains the estimated `100(1 – Alpha)%` prediction interval of the `i`th response for the `i`th observation `CVMdl.X(i,:)` using `ySD(i)`. The `Alpha` value is the probability that the prediction interval does not contain the true response value `CVMdl.Y(i)`. The first column of `yInt` contains the lower limits of the prediction intervals, and the second column contains the upper limits.

This argument is valid only for a generalized additive model object that includes the standard deviation fit. That is, `kfoldPredict` can return this argument only when `CVMdl` is `RegressionPartitionedGAM` and the `IsStandardDeviationFit` property of `CVMdl` is `true`.
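A minimal sketch of requesting all three outputs follows. It assumes that `fitrgam` accepts `'FitStandardDeviation',true` together with `'CrossVal','on'`. The predictors, the 99% confidence level, and the `coverage` variable are illustrative choices. Rows with missing values are removed first to keep the example simple.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
Y = MPG;

% Drop rows with missing predictor or response values
ok = ~any(isnan([X Y]),2);
X = X(ok,:);
Y = Y(ok);

rng('default') % For reproducibility
% Fit a cross-validated GAM that also models the standard deviation
CVMdl = fitrgam(X,Y,'FitStandardDeviation',true,'CrossVal','on');

% Request 99% prediction intervals (Alpha = 0.01)
[yFit,ySD,yInt] = kfoldPredict(CVMdl,'Alpha',0.01);

% Fraction of true responses that fall inside their intervals
coverage = mean(CVMdl.Y >= yInt(:,1) & CVMdl.Y <= yInt(:,2));
```

The empirical `coverage` gives a rough check on the intervals, although with a small data set it can deviate noticeably from the nominal level.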

## Extended Capabilities

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

This function fully supports GPU arrays for the following models.

- `RegressionPartitionedModel` object fitted using `fitrtree`, or by passing a `RegressionTree` object to `crossval`

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

## Version History

**Introduced in R2011a**

### R2024b: Specify GPU arrays for neural network models (requires Parallel Computing Toolbox)

`kfoldPredict` fully supports GPU arrays for `RegressionPartitionedNeuralNetwork` models.

### R2023b: Specify predicted response value to use for observations with missing predictor values

Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the `PredictionForMissingValue` name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.

This table lists the object functions that support the `PredictionForMissingValue` name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.

Model Type | Model Objects | Object Functions |
---|---|---|
Gaussian process regression (GPR) model | `RegressionGP`, `CompactRegressionGP` | `loss`, `predict`, `resubLoss`, `resubPredict` |
Gaussian process regression (GPR) model | `RegressionPartitionedGP` | `kfoldLoss`, `kfoldPredict` |
Gaussian kernel regression model | `RegressionKernel` | `loss`, `predict` |
Gaussian kernel regression model | `RegressionPartitionedKernel` | `kfoldLoss`, `kfoldPredict` |
Linear regression model | `RegressionLinear` | `loss`, `predict` |
Linear regression model | `RegressionPartitionedLinear` | `kfoldLoss`, `kfoldPredict` |
Neural network regression model | `RegressionNeuralNetwork`, `CompactRegressionNeuralNetwork` | `loss`, `predict`, `resubLoss`, `resubPredict` |
Neural network regression model | `RegressionPartitionedNeuralNetwork` | `kfoldLoss`, `kfoldPredict` |
Support vector machine (SVM) regression model | `RegressionSVM`, `CompactRegressionSVM` | `loss`, `predict`, `resubLoss`, `resubPredict` |
Support vector machine (SVM) regression model | `RegressionPartitionedSVM` | `kfoldLoss`, `kfoldPredict` |

In previous releases, the regression model `loss` and `predict` functions listed above used `NaN` predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.

### R2023a: GPU support for `RegressionPartitionedSVM` models

Starting in R2023a, `kfoldPredict` fully supports GPU arrays for `RegressionPartitionedSVM` models.
