ClassificationPartitionedModel

Cross-validated classification model

Description

ClassificationPartitionedModel is a set of classification models trained on cross-validated folds. Estimate the quality of classification by cross validation using one or more “kfold” methods: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun.

Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose you cross validate using five folds. In this case, the software randomly assigns each observation to one of five groups of roughly equal size. For each model, the training fold contains four of the groups (roughly 4/5 of the data) and the test fold contains the remaining group (roughly 1/5 of the data). Cross validation then proceeds as follows:

The software trains the first model (stored in CVMdl.Trained{1}) using the observations in the last four groups and reserves the observations in the first group for validation.
The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and last three groups, and reserves the observations in the second group for validation.
The software proceeds in a similar fashion for the third to fifth models.

If you validate by calling kfoldPredict, it computes predictions for the observations in group 1 using the first model, for the observations in group 2 using the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation.
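The procedure above can be sketched by hand. The following is a minimal illustration, not the internal implementation of kfoldPredict: it assumes Fisher's iris data and a five-fold cross-validated tree, and the variable names labels and idxTest are chosen here for illustration. Each fold's model predicts only the observations that were held out when that model was trained.

```matlab
load fisheriris
rng(1) % For reproducibility
CVMdl = fitctree(meas,species,'KFold',5);

% Preallocate one predicted label per observation.
labels = cell(CVMdl.NumObservations,1);
for k = 1:CVMdl.KFold
    % Logical index of the observations held out for fold k.
    idxTest = test(CVMdl.Partition,k);
    % Predict those observations with the model trained without them.
    labels(idxTest) = predict(CVMdl.Trained{k},CVMdl.X(idxTest,:));
end

% Compare the manual predictions with the kfoldPredict output.
sameLabels = isequal(labels,kfoldPredict(CVMdl))
```

In practice, call kfoldPredict directly; the loop is only meant to show which model is responsible for which observations.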

Creation

Description

CVMdl = crossval(Mdl) creates a cross-validated classification model from a classification model (Mdl).

Alternatively:

CVDiscrMdl = fitcdiscr(X,Y,Name,Value)
CVKNNMdl = fitcknn(X,Y,Name,Value)
CVNetMdl = fitcnet(X,Y,Name,Value)
CVNBMdl = fitcnb(X,Y,Name,Value)
CVSVMMdl = fitcsvm(X,Y,Name,Value)
CVTreeMdl = fitctree(X,Y,Name,Value)

create a cross-validated model when Name is either 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. For syntax details, see fitcdiscr, fitcknn, fitcnet, fitcnb, fitcsvm, and fitctree.
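As a brief sketch of these creation routes, assuming Fisher's iris data (the data set and the number of folds are illustrative choices, not requirements), each of the following calls returns a cross-validated model:

```matlab
load fisheriris
rng(1) % For reproducibility

% Five-fold cross-validation specified directly in the fitting function.
CVKNNMdl = fitcknn(meas,species,'KFold',5);

% Holdout validation that reserves 20% of the data for testing.
CVNBMdl = fitcnb(meas,species,'Holdout',0.2);

% A partition created in advance and reused across models.
c = cvpartition(species,'KFold',5);
CVTreeMdl = fitctree(meas,species,'CVPartition',c);

% Number of folds stored in each cross-validated model.
numFolds = [CVKNNMdl.KFold, CVNBMdl.KFold, CVTreeMdl.KFold]
```

Passing the same cvpartition object to several fitting functions is one way to compare classifiers on identical folds.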

Input Arguments

`Mdl` — Classification model
`ClassificationTree` object | `ClassificationDiscriminant` object | `ClassificationNeuralNetwork` object | `ClassificationNaiveBayes` object | `ClassificationKNN` object | `ClassificationSVM` object

A classification model, specified as one of the following:

A classification tree trained using fitctree
A discriminant analysis classifier trained using fitcdiscr
A neural network classifier trained using fitcnet
A naive Bayes classifier trained using fitcnb
A nearest neighbor classifier trained using fitcknn
A support vector machine classifier trained using fitcsvm

Properties

`BinEdges` — Bin edges for numeric predictors
cell array of p numeric vectors

This property is read-only.

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

`CategoricalPredictors` — Categorical predictor indices
vector of positive integers | `[]`

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

If Mdl is a trained discriminant analysis classifier, then CategoricalPredictors is always empty ([]).

Data Types: single | double

`ClassNames` — Unique class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order.

`Cost` — Misclassification costs
square numeric matrix

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

If CVModel is a cross-validated ClassificationDiscriminant, ClassificationKNN, ClassificationNaiveBayes, or ClassificationNeuralNetwork model, then you can change its cost matrix by using dot notation. For example, to set the cost matrix to CostMatrix:

CVModel.Cost = CostMatrix;
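
A slightly fuller sketch, assuming the ionosphere data and a five-fold k-nearest neighbor model (the cost values are arbitrary and chosen only for illustration):

```matlab
load ionosphere
rng(1) % For reproducibility
CVModel = fitcknn(X,Y,'KFold',5,'ClassNames',{'b','g'});

% Rows correspond to the true class and columns to the predicted class,
% in the order given by ClassNames. Here, misclassifying a true 'b'
% as 'g' costs twice as much as the opposite error.
CostMatrix = [0 2; 1 0];
CVModel.Cost = CostMatrix;

% Observed misclassification cost averaged over the folds.
L = kfoldLoss(CVModel,'LossFun','classifcost')
```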

Data Types: double

`CrossValidatedModel` — Name of cross-validated model
character vector

Name of the cross-validated model, returned as a character vector.

Data Types: char

`KFold` — Number of folds in model
positive integer

Number of folds in the cross-validated model, returned as a positive integer.

Data Types: double

`ModelParameters` — Parameters of cross-validated model
object

Parameters of the cross-validated model, returned as an object.

`NumObservations` — Number of observations in the training data
positive integer

This property is read-only.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double

`Partition` — Partition used in cross-validation
`CVPartition` object

Partition used in cross-validation, returned as a CVPartition object.

`PredictorNames` — Predictor names
cell array of character vectors

Predictor names in order of their appearance in the predictor data X, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns in X.

Data Types: cell

`Prior` — Prior probabilities for each class
numeric vector

Prior probabilities for each class, returned as a numeric vector. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

If CVModel is a cross-validated ClassificationDiscriminant or ClassificationNaiveBayes model, then you can change its vector of priors by using dot notation. For example, if priorVector is a vector whose length is the number of classes, enter:

CVModel.Prior = priorVector;
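
As a hedged illustration, assuming Fisher's iris data and a five-fold discriminant analysis model (the prior values are arbitrary), the following compares the cross-validation loss before and after changing the priors:

```matlab
load fisheriris
rng(1) % For reproducibility
CVModel = fitcdiscr(meas,species,'KFold',5);

% Loss under the default (empirical) priors.
lossDefault = kfoldLoss(CVModel);

% One prior per class, in the order given by CVModel.ClassNames.
priorVector = [0.5 0.25 0.25];
CVModel.Prior = priorVector;

% Loss after changing the priors.
lossNewPrior = kfoldLoss(CVModel);
```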

Data Types: double

`ResponseName` — Response variable name
character vector

Response variable name, specified as a character vector.

Data Types: char

`ScoreTransform` — Score transformation function
function name | function handle

Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores.

To change the score transformation function, use dot notation. In the following syntaxes, function is a placeholder for the name of the transformation.

For a built-in function, enter a character vector.

Mdl.ScoreTransform = 'function';

This table describes the available built-in functions.

Value	Description
`'doublelogit'`	1/(1 + e^(–2x))
`'invlogit'`	log(x / (1 – x))
`'ismax'`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
`'logit'`	1/(1 + e^(–x))
`'none'` or `'identity'`	x (no transformation)
`'sign'`	–1 for x < 0; 0 for x = 0; 1 for x > 0
`'symmetric'`	2x – 1
`'symmetricismax'`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
`'symmetriclogit'`	2/(1 + e^(–x)) – 1

For a MATLAB® function or a function that you define, enter its function handle.
```
Mdl.ScoreTransform = @function;
```
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
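
Both forms can be tried on a small model. The sketch below assumes Fisher's iris data and a five-fold classification tree, whose scores are posterior probabilities; the anonymous function is an illustrative stand-in that applies the same mapping as 'symmetric'.

```matlab
load fisheriris
rng(1) % For reproducibility
CVMdl = fitctree(meas,species,'KFold',5);

% Built-in transformation, specified as a character vector.
CVMdl.ScoreTransform = 'ismax';
[~,scoreIsMax] = kfoldPredict(CVMdl);

% Custom transformation, specified as a function handle. The function
% receives the matrix of scores and returns a matrix of the same size.
CVMdl.ScoreTransform = @(s) 2*s - 1;
[~,scoreSym] = kfoldPredict(CVMdl);
```

Each row of the score matrices corresponds to an observation and each column to a class in CVMdl.ClassNames.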

Data Types: char | string | function_handle

`Trained` — Trained learners
cell array of compact classification models

The trained learners, returned as a cell array of compact classification models trained on cross-validation folds.
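
For instance, assuming Fisher's iris data and a five-fold classification tree (illustrative choices), you can pull out an individual fold model and report the loss separately for each fold:

```matlab
load fisheriris
rng(1) % For reproducibility
CVMdl = fitctree(meas,species,'KFold',5);

% Compact model trained without the observations in the first fold.
firstModel = CVMdl.Trained{1};

% One loss value per fold instead of the average over all folds.
perFoldLoss = kfoldLoss(CVMdl,'Mode','individual')
```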

`W` — Scaled weights in model
numeric vector

This property is read-only.

Scaled weights in the model, returned as a numeric vector. W has length n, the number of rows in the training data.

Data Types: double

`X` — Predictor values
real matrix | table

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

`Y` — Row classifications
categorical array | cell array of character vectors | character array | logical vector | numeric vector

This property is read-only.

Row classifications corresponding to the rows of X, returned as a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. Each row of Y represents the classification of the corresponding row of X.

Object Functions

`gather`	Gather properties of Statistics and Machine Learning Toolbox object from GPU
`kfoldEdge`	Classification edge for cross-validated classification model
`kfoldLoss`	Classification loss for cross-validated classification model
`kfoldMargin`	Classification margins for cross-validated classification model
`kfoldPredict`	Classify observations in cross-validated classification model
`kfoldfun`	Cross-validate function for classification
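
A short sketch of several of these functions, assuming Fisher's iris data and a five-fold classification tree. The anonymous function passed to kfoldfun is a hypothetical example that simply counts the test observations in each fold; its argument names are illustrative.

```matlab
load fisheriris
rng(1) % For reproducibility
CVMdl = fitctree(meas,species,'KFold',5);

% Classification margin for each observation, computed out of fold.
m = kfoldMargin(CVMdl);

% Classification edge, a scalar summary of the margins.
e = kfoldEdge(CVMdl);

% kfoldfun calls the supplied function once per fold with the fold's
% compact model, training data, and test data (including weights).
countTest = @(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest) numel(Ytest);
nTest = kfoldfun(CVMdl,countTest);
```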

Examples

Evaluate the Classification Error of a Classification Tree Classifier

Evaluate the k-fold cross-validation error for a classification tree model.

Load Fisher's iris data set.

load fisheriris

Train a classification tree using default options.

Mdl = fitctree(meas,species);

Cross validate the classification tree model.

CVMdl = crossval(Mdl);

Estimate the 10-fold cross-validation loss.

L = kfoldLoss(CVMdl)

L = 0.0533

Estimate Posterior Probabilities for Test Samples

Estimate positive class posterior probabilities for the holdout (test) observations of a cross-validated SVM classifier.

Load the ionosphere data set.

load ionosphere

Train an SVM classifier. Specify a 20% holdout sample. It is good practice to standardize the predictors and specify the class order.

rng(1) % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Holdout',0.2,'Standardize',true,...
    'ClassNames',{'b','g'});

CVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier.

Estimate the optimal score function for mapping observation scores to posterior probabilities of an observation being classified as 'g'.

ScoreCVSVMModel = fitSVMPosterior(CVSVMModel);

ScoreCVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier containing the optimal score transformation function estimated from the training data.

Estimate the out-of-sample positive class posterior probabilities. Display the results for the first 10 out-of-sample observations.

[~,OOSPostProbs] = kfoldPredict(ScoreCVSVMModel);
indx = ~isnan(OOSPostProbs(:,2));
hoObs = find(indx); % Holdout observation numbers
OOSPostProbs = [hoObs, OOSPostProbs(indx,2)];
table(OOSPostProbs(1:10,1),OOSPostProbs(1:10,2),...
    'VariableNames',{'ObservationIndex','PosteriorProbability'})

ans=10×2 table
    ObservationIndex    PosteriorProbability
    ________________    ____________________

            6                   0.17379     
            7                   0.89639     
            8                 0.0076634     
            9                   0.91603     
           16                   0.02672     
           22                4.6091e-06     
           23                    0.9024     
           24                2.4127e-06     
           38                0.00042696     
           41                   0.86429

Tips

To estimate posterior probabilities of trained, cross-validated SVM classifiers, use fitSVMPosterior.

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

ClassificationPartitionedModel can be one of the following cross-validated model objects:
- k-nearest neighbor classifier trained with fitcknn
- Support vector machine classifier trained with fitcsvm
- Binary decision tree for multiclass classification trained with fitctree
The object functions of the ClassificationPartitionedModel model fully support GPU arrays.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2011a

R2023a: Neural network classifiers support misclassification costs and prior probabilities

fitcnet supports misclassification costs and prior probabilities for neural network classifiers. Specify the Cost and Prior name-value arguments when you create a model. Alternatively, you can specify misclassification costs after training a model by using dot notation to change the Cost property value of the model.

Mdl.Cost = [0 2; 1 0];

R2022a: `Cost` property stores the user-specified cost matrix

Starting in R2022a, the Cost property of a cross-validated SVM classification model stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) and observation weights (W) that do not reflect the penalties described in the cost matrix. Other cross-validated models already had this behavior. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the kfoldLoss function.
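
As a minimal sketch of this workflow, assuming R2022a or later and the ionosphere data (the cost values are arbitrary and only illustrative):

```matlab
load ionosphere
rng(1) % For reproducibility

% Train a cross-validated SVM model with a nondefault cost matrix.
CVSVMMdl = fitcsvm(X,Y,'KFold',5,'Standardize',true, ...
    'ClassNames',{'b','g'},'Cost',[0 2; 1 0]);

% The Cost property holds the matrix as specified.
storedCost = CVSVMMdl.Cost;

% Observed misclassification cost based on the stored cost matrix.
observedCost = kfoldLoss(CVSVMMdl,'LossFun','classifcost')
```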

Note that model training has not changed and, therefore, the decision boundaries between classes have not changed.

For training an SVM model, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities and observation weights used for training in the Prior and W properties, respectively. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities and observation weights that do not reflect the cost penalties. For more details, see Misclassification Cost Matrix, Prior Probabilities, and Observation Weights.

Some object functions use the Cost and W properties:

The kfoldLoss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost".
The kfoldLoss and kfoldEdge functions use the observation weights stored in the W property.

If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases.

If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
