RegressionPartitionedEnsemble

Package: classreg.learning.partition
Superclasses: RegressionPartitionedModel

Cross-validated regression ensemble

Description

RegressionPartitionedEnsemble is a set of regression ensembles trained on cross-validated folds. Estimate the quality of regression by cross-validation using one or more “kfold” methods: kfoldfun, kfoldLoss, or kfoldPredict. Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations.

For example, suppose you cross-validate using five folds. In this case, every training fold contains roughly 4/5 of the data and every test fold contains roughly 1/5 of the data. The first model, stored in Trained{1}, was trained on X and Y with the first 1/5 excluded; the second model, stored in Trained{2}, was trained on X and Y with the second 1/5 excluded; and so on. When you call kfoldPredict, it computes predictions for the first 1/5 of the data using the first model, for the second 1/5 of the data using the second model, and so on. In short, kfoldPredict computes the response for every observation using the model trained without that observation.
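The out-of-fold scheme described above can be reproduced by hand. The following is a minimal sketch, not the internal implementation: it assumes the carsmall sample data set is available, drops rows with a missing response up front so that the stored data and the partition have the same length, and uses hypothetical variable names (yManual, maxDiff).

```matlab
load carsmall
ok = ~isnan(MPG);                         % keep rows with a known response
X = [Cylinders(ok) Displacement(ok) Horsepower(ok) Weight(ok)];
Y = MPG(ok);

rng(1) % For reproducibility of the fold assignment
cvens = fitrensemble(X,Y,'KFold',5);

% Predictions computed by the built-in method.
yhat = kfoldPredict(cvens);

% Rebuild the predictions manually, one fold at a time.
yManual = nan(size(yhat));
for k = 1:numel(cvens.Trained)
    idxTest = test(cvens.Partition,k);    % out-of-fold rows for fold k
    yManual(idxTest) = predict(cvens.Trained{k},cvens.X(idxTest,:));
end

% The two sets of predictions are expected to agree.
maxDiff = max(abs(yManual - yhat));
```

Each observation belongs to exactly one test fold, so the loop fills every entry of yManual exactly once.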

Construction

cvens = crossval(ens) creates a cross-validated ensemble from ens, a regression ensemble. For syntax details, see the crossval method reference page.

cvens = fitrensemble(X,Y,Name,Value) creates a cross-validated ensemble when Name is one of 'crossval', 'kfold', 'holdout', 'leaveout', or 'cvpartition'. For syntax details, see the fitrensemble function reference page.
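The two construction routes can be compared side by side. This is a small sketch that assumes the carsmall sample data set and a five-fold partition; the variable names cvens1 and cvens2 are illustrative only.

```matlab
load carsmall
ok = ~isnan(MPG);                         % keep rows with a known response
X = [Cylinders(ok) Displacement(ok) Horsepower(ok) Weight(ok)];
Y = MPG(ok);

% Route 1: train a full ensemble, then cross-validate it.
ens = fitrensemble(X,Y);
cvens1 = crossval(ens,'KFold',5);

% Route 2: request cross-validation directly when fitting.
cvens2 = fitrensemble(X,Y,'KFold',5);

% Both routes return a cross-validated ensemble with one model per fold.
class(cvens1)
class(cvens2)
numel(cvens1.Trained)
numel(cvens2.Trained)
```

The first route is convenient when you already have a trained ensemble; the second avoids training a model on the full data set that you do not need.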

Input Arguments

 ens A regression ensemble constructed with fitrensemble.

Properties

 BinEdges Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

 The software bins numeric predictors only if you specify the 'NumBins' name-value pair argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

 You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

 X = mdl.X; % Predictor data
 Xbinned = zeros(size(X));
 edges = mdl.BinEdges;
 % Find indices of binned predictors.
 idxNumeric = find(~cellfun(@isempty,edges));
 if iscolumn(idxNumeric)
     idxNumeric = idxNumeric';
 end
 for j = idxNumeric
     x = X(:,j);
     % Convert x to array if x is a table.
     if istable(x)
         x = table2array(x);
     end
     % Group x into bins by using the discretize function.
     xbinned = discretize(x,[-inf; edges{j}; inf]);
     Xbinned(:,j) = xbinned;
 end

 Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

 CategoricalPredictors Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

 CrossValidatedModel Name of the cross-validated model, a character vector.

 Kfold Number of folds used in the cross-validated ensemble, a positive integer.

 ModelParameters Object holding the parameters of the ensemble.

 NumObservations Numeric scalar containing the number of observations in the training data.

 NTrainedPerFold Vector of Kfold elements. Each entry contains the number of trained learners in the corresponding cross-validation fold.

 Partition The partition of class cvpartition used in creating the cross-validated ensemble.

 PredictorNames A cell array of names for the predictor variables, in the order in which they appear in X.

 ResponseName Name of the response variable Y, a character vector.

 ResponseTransform Function handle for transforming scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. Add or change a ResponseTransform function using dot notation: ens.ResponseTransform = @function

 Trainable Cell array of ensembles trained on cross-validation folds. Every ensemble is full, meaning it contains its training data and weights.

 Trained Cell array of compact ensembles trained on cross-validation folds.

 W The scaled weights, a vector with length n, the number of rows in X.

 X A matrix or table of predictor values. Each column of X represents one variable, and each row represents one observation.

 Y A numeric column vector with the same number of rows as X. Each entry in Y is the response to the data in the corresponding row of X.

Methods

 kfoldLoss Cross-validation loss of partitioned regression ensemble
 resume Resume training ensemble
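As an illustration of how these two methods work together, the sketch below trains a small cross-validated ensemble, adds more learners with resume, and inspects the loss as learners accumulate. It assumes the carsmall sample data set; the learner counts (50 and 25) and variable names are arbitrary choices for the example, not recommended settings.

```matlab
load carsmall
ok = ~isnan(MPG);                         % keep rows with a known response
X = [Cylinders(ok) Displacement(ok) Horsepower(ok) Weight(ok)];
Y = MPG(ok);

rng(1) % For reproducibility of the fold assignment
cvens = fitrensemble(X,Y,'KFold',5,'NumLearningCycles',50);
lossBefore = kfoldLoss(cvens);            % average loss over the folds

% Train 25 more learners in every fold.
cvens = resume(cvens,25);
lossAfter = kfoldLoss(cvens);

% Cumulative loss as a function of the number of learners used.
lossCurve = kfoldLoss(cvens,'mode','cumulative');
numel(lossCurve)
```

Plotting lossCurve against the learner index is one way to judge whether further calls to resume are likely to help.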

Inherited Methods

 kfoldLoss Cross-validation loss of partitioned regression model
 kfoldPredict Predict response for observations not used for training
 kfoldfun Cross validate function
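When the built-in loss is not the metric you need, kfoldfun can evaluate a custom function on each fold. The sketch below computes the mean absolute error per fold. It assumes the carsmall sample data set, and the handle name maeFun is hypothetical. The function passed to kfoldfun receives the compact model for the fold followed by the training and test data and weights.

```matlab
load carsmall
ok = ~isnan(MPG);                         % keep rows with a known response
X = [Cylinders(ok) Displacement(ok) Horsepower(ok) Weight(ok)];
Y = MPG(ok);

rng(1) % For reproducibility of the fold assignment
cvens = fitrensemble(X,Y,'KFold',5);

% Mean absolute error on the test part of a fold.
% CMP is the compact ensemble trained on the training part of that fold.
maeFun = @(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest) ...
    mean(abs(Ytest - predict(CMP,Xtest)));

maePerFold = kfoldfun(cvens,maeFun);      % one value per fold
```

The training arguments are unused here but must still appear in the signature, because kfoldfun always passes all seven inputs.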

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Construct a partitioned regression ensemble, and examine the cross-validation losses for the folds.

Load the carsmall data set.

load carsmall

Create a subset of variables.

XX = [Cylinders Displacement Horsepower Weight];
YY = MPG;

Construct the ensemble model.

rens = fitrensemble(XX,YY);

Create a cross-validated ensemble from rens.

rng(10,'twister') % For reproducibility
cvrens = crossval(rens);

Examine the cross-validation losses.

L = kfoldLoss(cvrens,'mode','individual')
L = 10×1

21.4489
48.4388
28.2223
17.5354
29.9441
49.5254
43.8872
31.0152
31.6388
8.9607

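L is a vector containing the cross-validation loss for each fold. With 'mode' set to 'individual', kfoldLoss returns one loss value per fold (10 folds by default) instead of a single average over all folds.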