ClassificationBaggedEnsemble

Classification ensemble grown by resampling

Description

ClassificationBaggedEnsemble combines a set of trained weak learner models and the data on which the learners were trained. Use the predict object function to predict the ensemble response for new data by aggregating predictions from the weak learners.

Creation

Create a bagged classification ensemble object by using fitcensemble. Set the Method name-value argument of fitcensemble to "Bag" to use bootstrap aggregation (bagging), for example, to grow a random forest.

For a description of bagged classification ensembles, see Bootstrap Aggregation (Bagging) and Random Forest.
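A minimal sketch of growing a random-forest-style bagged ensemble. It assumes the ionosphere sample data used in the Examples section. The values NumVariablesToSample=5 (predictors sampled at random for each split) and NumLearningCycles=50 are illustrative choices, not recommendations.

```matlab
load ionosphere                         % X: predictor matrix, Y: class labels
rng(0,"twister")                        % For reproducibility
% Sample 5 predictors at random for each split (illustrative value)
t = templateTree(NumVariablesToSample=5,Reproducible=true);
% Method="Bag" selects bootstrap aggregation
Mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=50,Learners=t);
disp(Mdl.NumTrained)                    % Number of trees in the ensemble
```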

Properties


Ensemble Properties

`CombineWeights` — Method used to combine weak learner weights
Read-only: `'WeightedAverage'` | `'WeightedSum'`

This property is read-only.

Method used to combine weak learner weights, returned as either 'WeightedAverage' or 'WeightedSum'.

Data Types: char
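As a rough sketch of how the two values differ, let $s_t(x)$ denote the score vector that learner $t$ returns for observation $x$, let $\alpha_t$ be the corresponding entry of TrainedWeights, and let $T$ be NumTrained. These symbols are introduced here for illustration only, and the formulas describe a generic weighted sum and weighted average rather than the exact internal computation.

```latex
f_{\mathrm{WeightedSum}}(x) = \sum_{t=1}^{T} \alpha_t \, s_t(x),
\qquad
f_{\mathrm{WeightedAverage}}(x) = \frac{\sum_{t=1}^{T} \alpha_t \, s_t(x)}{\sum_{t=1}^{T} \alpha_t}
```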

`FitInfo` — Fit information
Read-only: numeric array

This property is read-only.

Fit information, returned as a numeric array. The FitInfoDescription property describes the content of this array.

Data Types: double

`FitInfoDescription` — Description of information in `FitInfo`
Read-only: character vector | cell array of character vectors

This property is read-only.

Description of the information in FitInfo, returned as a character vector or cell array of character vectors.

Data Types: char | cell

`FResample` — Fraction of training data resampled
Read-only: numeric scalar between `0` and `1`

This property is read-only.

Fraction of the training data resampled when the ensemble object is created, returned as a numeric scalar between 0 and 1. When creating the ensemble model object, fitcensemble resamples the training data randomly for every weak learner.

Data Types: double
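A minimal sketch of how FResample and Replace relate to training options, assuming the ionosphere sample data. The values FResample=0.5 and Replace="off" (sampling half the data without replacement for each learner) and NumLearningCycles=20 are illustrative.

```matlab
load ionosphere
rng(0,"twister")                      % For reproducibility
% Draw half of the training data for each learner, without replacement
Mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=20, ...
    FResample=0.5,Replace="off");
disp(Mdl.FResample)                   % Fraction resampled for each learner
disp(Mdl.Replace)                     % 0 (false): sampled without replacement
% Number of distinct observations each learner saw
nUsed = sum(Mdl.UseObsForLearner,1);
```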

`LearnerNames` — Names of weak learners in ensemble
Read-only: cell array of character vectors

This property is read-only.

Names of weak learners in the ensemble, returned as a cell array of character vectors. The name of each learner appears just once. For example, if you have an ensemble of 100 trees, LearnerNames is {'Tree'}.

Data Types: cell

`Method` — Method used to create ensemble
Read-only: character vector

This property is read-only.

Method used by fitcensemble to create the ensemble, returned as a character vector.

Data Types: char

`ModelParameters` — Parameters used in training ensemble
Read-only: `EnsembleParams` object

This property is read-only.

Parameters used in training the ensemble, returned as an EnsembleParams object. The properties of ModelParameters include the type of ensemble, either 'classification' or 'regression', the Method used to create the ensemble, and other parameters, depending on the ensemble.

`NumTrained` — Number of trained weak learners
Read-only: positive integer

This property is read-only.

Number of trained weak learners in the ensemble, returned as a positive integer.

Data Types: double

`ReasonForTermination` — Reason function stopped adding weak learners
Read-only: character vector

This property is read-only.

Reason the fitcensemble function stopped adding weak learners to the ensemble, returned as a character vector.

Data Types: char
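A sketch of how NumTrained changes when you continue training with the resume object function, assuming the ionosphere sample data. The learner counts (50, then 25 more) are illustrative.

```matlab
load ionosphere
rng(0,"twister")                            % For reproducibility
t = templateTree(Reproducible=true);
Mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=50,Learners=t);
disp(Mdl.ReasonForTermination)              % Why training stopped
% Train 25 more weak learners, continuing from the current ensemble
Mdl2 = resume(Mdl,25);
fprintf("%d -> %d learners\n",Mdl.NumTrained,Mdl2.NumTrained)
```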

`Replace` — Indication that ensemble was trained with replacement
Read-only: `true` | `false`

This property is read-only.

Indication that the ensemble was trained with replacement, returned as true or false.

Data Types: logical

`Trained` — Trained weak learners
Read-only: cell vector

This property is read-only.

Trained weak learners, returned as a cell vector. The entries of the cell vector contain the corresponding compact classification models.

Data Types: cell

`TrainedWeights` — Trained weak learner weights
Read-only: numeric vector

This property is read-only.

Trained weak learner weights, returned as a numeric vector. TrainedWeights has NumTrained elements, where NumTrained is the number of weak learners in the ensemble. The ensemble computes the predicted response by aggregating weighted predictions from its learners.

Data Types: double
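A short check of the relationship between TrainedWeights, NumTrained, and CombineWeights, assuming the ionosphere sample data and an illustrative NumLearningCycles=30.

```matlab
load ionosphere
rng(0,"twister")                                % For reproducibility
t = templateTree(Reproducible=true);
Mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=30,Learners=t);
w = Mdl.TrainedWeights;                         % One weight per trained learner
fprintf("%d weights for %d learners, combined by %s\n", ...
    numel(w),Mdl.NumTrained,Mdl.CombineWeights)
```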

`UseObsForLearner` — Indicator that observation was used to train learner
Read-only: logical matrix

This property is read-only.

Indicator that an observation was used to train a learner, returned as a logical matrix of size n-by-NumTrained, where n is the number of rows of training data and NumTrained is the number of trained weak learners. UseObsForLearner(i,j) is true if observation i was used for training learner j, and is false otherwise.

Data Types: logical
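A sketch of summarizing UseObsForLearner, assuming the ionosphere sample data and the default bootstrap settings (FResample of 1, sampling with replacement). For a bootstrap sample of size n, the expected fraction of distinct observations is 1-(1-1/n)^n, which is about 0.632 for large n; the code compares that value with the observed average.

```matlab
load ionosphere
rng(0,"twister")                        % For reproducibility
t = templateTree(Reproducible=true);
Mdl = fitcensemble(X,Y,Method="Bag",Learners=t);   % 100 learners by default
U = Mdl.UseObsForLearner;               % n-by-NumTrained logical matrix
inBagFrac = mean(U,1);                  % Fraction of distinct observations per learner
n = size(U,1);
expectedFrac = 1 - (1 - 1/n)^n;         % Expected in-bag fraction for a bootstrap sample
fprintf("mean in-bag fraction %.3f (expected %.3f)\n",mean(inBagFrac),expectedFrac)
% Number of learners for which each observation is out of bag
numOOB = sum(~U,2);
```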

Predictor Properties

`BinEdges` — Bin edges for numeric predictors
Read-only: cell array of p numeric vectors

This property is read-only.

Bin edges for numeric predictors, returned as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the NumBins name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the NumBins value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for the numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

Data Types: cell
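A minimal sketch of training a model whose BinEdges property is nonempty, assuming the ionosphere sample data. NumBins=50 and NumLearningCycles=30 are illustrative values. The model is named mdl to match the binning code above.

```matlab
load ionosphere
rng(0,"twister")                           % For reproducibility
t = templateTree(Reproducible=true);
% NumBins bins numeric predictors when the weak learners are trees
mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=30, ...
    Learners=t,NumBins=50);
edges = mdl.BinEdges;                      % One cell per predictor
numEdges = cellfun(@numel,edges);          % Number of edges for each predictor
```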

`CategoricalPredictors` — Categorical predictor indices
Read-only: vector of positive integers | `[]`

This property is read-only.

Categorical predictor indices, returned as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double
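A sketch showing how a categorical table variable appears in CategoricalPredictors. The table, its variable names (x1, c, y), and the rule that generates the labels are hypothetical, made up for illustration.

```matlab
rng(0,"twister")                                   % For reproducibility
n = 300;
x1 = randn(n,1);                                   % Numeric predictor
c = categorical(randi(3,n,1),1:3,{'low','mid','high'});  % Categorical predictor
y = (x1 + (c == "high")) > 0.5;                    % Logical class labels
Tbl = table(x1,c,y);
Mdl = fitcensemble(Tbl,"y",Method="Bag",NumLearningCycles=30);
disp(Mdl.CategoricalPredictors)                    % Index of c among the predictors
disp(Mdl.PredictorNames)
```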

`ExpandedPredictorNames` — Expanded predictor names
Read-only: cell array of character vectors

This property is read-only.

Expanded predictor names, returned as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

`PredictorNames` — Predictor names
Read-only: cell array of character vectors

This property is read-only.

Predictor names, returned as a cell array of character vectors. The order of the entries in PredictorNames is the same as in the training data.

Data Types: cell

`X` — Predictor values
Read-only: real matrix | table

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

Response Properties

`ClassNames` — List of elements in `Y` with duplicates removed
Read-only: categorical array | cell array of character vectors | character array | logical vector | numeric vector

This property is read-only.

List of the elements in Y with duplicates removed, returned as a categorical array, cell array of character vectors, character array, logical vector, or numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)

Data Types: double | logical | char | cell | categorical

`ResponseName` — Name of response variable
Read-only: character vector

This property is read-only.

Name of the response variable, returned as a character vector.

Data Types: char

`Y` — Class labels
Read-only: categorical array | cell array of character vectors | character array | logical vector | numeric vector

This property is read-only.

Class labels corresponding to the observations in X, returned as a categorical array, cell array of character vectors, character array, logical vector, or numeric vector. Each row of Y represents the classification of the corresponding row of X.

Other Data Properties

`HyperparameterOptimizationResults` — Description of cross-validation optimization of hyperparameters
Read-only: `BayesianOptimization` object | table

This property is read-only.

Description of the cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty if the OptimizeHyperparameters name-value argument is nonempty when you create the model. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer option in HyperparameterOptimizationOptions when you create the model.

"bayesopt" (default) — Object of class BayesianOptimization
"gridsearch" or "randomsearch" — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

`NumObservations` — Number of observations in training data
Read-only: positive integer

This property is read-only.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double

`RowsUsed` — Rows of original predictor data `X` used for fitting
Read-only: logical vector

This property is read-only.

Rows of the original predictor data X used for fitting, returned as an n-element logical vector, where n is the number of rows of X. If the software uses all rows of X to create the object, then RowsUsed is an empty array ([]).

Data Types: logical
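A sketch of how missing class labels affect NumObservations and RowsUsed, assuming the ionosphere sample data, where Y is a cell array of character vectors and an empty character vector counts as a missing label. Marking the first five labels as missing is purely illustrative.

```matlab
load ionosphere                      % Y is a cell array of character vectors
Y2 = Y;
Y2(1:5) = {''};                      % Empty labels are treated as missing
rng(0,"twister")                     % For reproducibility
Mdl = fitcensemble(X,Y2,Method="Bag",NumLearningCycles=20);
fprintf("%d of %d rows used\n",Mdl.NumObservations,size(X,1))
```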

`W` — Scaled weights in ensemble
Read-only: numeric vector

This property is read-only.

Scaled weights in the ensemble, returned as a numeric vector. W has length n, the number of rows in the training data. The sum of the elements of W is 1.

Data Types: double
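A short check of the W property, assuming the ionosphere sample data and an illustrative NumLearningCycles=20.

```matlab
load ionosphere
rng(0,"twister")                      % For reproducibility
Mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=20);
w = Mdl.W;                            % One weight per training observation
fprintf("numel(W) = %d, sum(W) = %.4f\n",numel(w),sum(w))
```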

Other Classification Properties

`Cost` — Misclassification costs
Read-only: square numeric matrix

This property is read-only.

Misclassification costs, returned as a square numeric matrix. Cost has K rows and columns, where K is the number of classes.

Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.

Data Types: double
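A sketch of specifying a cost matrix and reading it back, assuming the ionosphere sample data and release R2022a or later (see Version History). The cost values are hypothetical. Fixing ClassNames makes the row and column order of the matrix explicit.

```matlab
load ionosphere
rng(0,"twister")                     % For reproducibility
% Hypothetical costs: classifying a true 'b' observation as 'g' costs 5,
% and classifying a true 'g' observation as 'b' costs 1.
% Rows are true classes and columns are predicted classes, in ClassNames order.
C = [0 5; 1 0];
Mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=30, ...
    ClassNames={'b','g'},Cost=C);
disp(Mdl.Cost)                       % The user-specified cost matrix
cost = resubLoss(Mdl,LossFun="classifcost");   % Observed misclassification cost
```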

`Prior` — Prior probabilities for each class
Read-only: numeric vector

This property is read-only.

Prior probabilities for each class, returned as a K-element numeric vector, where K is the number of unique classes in the response. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

Data Types: double
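A sketch of setting equal prior probabilities at training time, assuming the ionosphere sample data and an illustrative NumLearningCycles=20.

```matlab
load ionosphere
rng(0,"twister")                     % For reproducibility
Mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=20,Prior="uniform");
disp(Mdl.ClassNames)                 % Order of the elements of Prior
disp(Mdl.Prior)                      % Equal probability for each class
```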

`ScoreTransform` — Function for transforming scores
function handle | name of a built-in transformation function | `"none"`

Function for transforming scores, specified as a function handle or the name of a built-in transformation function. "none" means no transformation; equivalently, "none" means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see ScoreTransform (for trees) or ScoreTransform (for ensembles).

Add or change a ScoreTransform function using dot notation:

Mdl.ScoreTransform = "function"
% or
Mdl.ScoreTransform = @function

Data Types: char | string | function_handle
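A sketch that changes ScoreTransform after training and compares scores before and after, assuming the ionosphere sample data and the built-in "logit" transformation, 1/(1+exp(-x)).

```matlab
load ionosphere
rng(0,"twister")                           % For reproducibility
Mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=20);
[~,rawScores] = predict(Mdl,X(1:5,:));     % ScoreTransform is "none"
Mdl.ScoreTransform = "logit";              % Built-in transformation
[~,logitScores] = predict(Mdl,X(1:5,:));
% Largest difference from applying 1/(1+exp(-x)) to the raw scores manually
maxDiff = max(abs(logitScores - 1./(1 + exp(-rawScores))),[],"all");
```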

Object Functions

`compact`	Reduce size of machine learning model
`compareHoldout`	Compare accuracies of two classification models using new data
`crossval`	Cross-validate machine learning model
`edge`	Classification edge for classification ensemble model
`gather`	Gather properties of Statistics and Machine Learning Toolbox object from GPU
`lime`	Local interpretable model-agnostic explanations (LIME)
`loss`	Classification loss for classification ensemble model
`margin`	Classification margins for classification ensemble model
`oobEdge`	Out-of-bag classification edge for bagged classification ensemble model
`oobLoss`	Out-of-bag classification loss for bagged classification ensemble model
`oobMargin`	Out-of-bag classification margins for bagged classification ensemble
`oobPermutedPredictorImportance`	Out-of-bag predictor importance estimates for random forest of classification trees by permutation
`oobPredict`	Predict out-of-bag labels and scores of bagged classification ensemble
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
`predict`	Predict labels using classification ensemble model
`predictorImportance`	Estimates of predictor importance for classification ensemble of decision trees
`resubEdge`	Resubstitution classification edge for classification ensemble model
`resubLoss`	Resubstitution classification loss for classification ensemble model
`resubMargin`	Resubstitution classification margins for classification ensemble model
`resubPredict`	Classify observations in classification ensemble by resubstitution
`resume`	Resume training of classification ensemble model
`shapley`	Shapley values
`testckfold`	Compare accuracies of two classification models by repeated cross-validation
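A sketch that uses predict and compact together, assuming the ionosphere sample data. The first five rows of X stand in for new data purely for illustration.

```matlab
load ionosphere
rng(0,"twister")                        % For reproducibility
t = templateTree(Reproducible=true);
Mdl = fitcensemble(X,Y,Method="Bag",Learners=t);
Xnew = X(1:5,:);                        % Stand-in for new data (illustrative)
[labels,scores] = predict(Mdl,Xnew);    % Aggregated ensemble predictions
CMdl = compact(Mdl);                    % Smaller model without the training data
labelsCompact = predict(CMdl,Xnew);     % Same predictions as the full model
```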

Examples


Train Bagged Ensemble of Classification Trees


Load the ionosphere data set.

load ionosphere

You can train a bagged ensemble of 100 classification trees using all measurements.

Mdl = fitcensemble(X,Y,Method="Bag")

fitcensemble uses a default template tree object templateTree() as a weak learner when Method is "Bag". In this example, for reproducibility, specify Reproducible=true when you create a tree template object, and then use the object as a weak learner.

rng(0,"twister") % For reproducibility
t = templateTree(Reproducible=true); % For reproducibility of random predictor selections
Mdl = fitcensemble(X,Y,Method="Bag",Learners=t)

Mdl = 
  ClassificationBaggedEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
               NumTrained: 100
                   Method: 'Bag'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: []
       FitInfoDescription: 'None'
                FResample: 1
                  Replace: 1
         UseObsForLearner: [351×100 logical]



Mdl is a ClassificationBaggedEnsemble model object.

Mdl.Trained is the property that stores a 100-by-1 cell vector of the trained classification trees (CompactClassificationTree model objects) that compose the ensemble.

Plot a graph of the first trained classification tree.

view(Mdl.Trained{1},Mode="graph")

Figure: Classification tree viewer showing the first trained tree in Mdl.Trained.

By default, fitcensemble grows deep decision trees for bagged ensembles.

Estimate the in-sample misclassification rate.

L = resubLoss(Mdl)

L = 0

L is 0, which indicates that Mdl is perfect at classifying the training data.
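Because resubstitution loss is computed on the same data used for training, it tends to be optimistic. For a bagged ensemble, the out-of-bag loss gives a less biased estimate, since each observation is scored only by learners that did not train on it. A minimal sketch, assuming the same settings as above:

```matlab
load ionosphere
rng(0,"twister")                 % For reproducibility
t = templateTree(Reproducible=true);
Mdl = fitcensemble(X,Y,Method="Bag",Learners=t);
resubErr = resubLoss(Mdl);       % Evaluated on the training data
oobErr = oobLoss(Mdl);           % Evaluated on out-of-bag observations only
fprintf("resubstitution: %.4f, out-of-bag: %.4f\n",resubErr,oobErr)
```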

Tips

For a bagged ensemble of classification trees Mdl, the Trained property of Mdl stores a cell vector of Mdl.NumTrained CompactClassificationTree model objects. For a textual or graphical display of tree t in the cell vector, enter

view(Mdl.Trained{t})

Alternative Functionality

Bootstrap Aggregation Methods

For classification or regression, you can choose between two approaches to bagging:

Classification: create a bagged ensemble using fitcensemble or TreeBagger.
Regression: create a bagged ensemble using fitrensemble or TreeBagger.

For help choosing between these approaches, see Ensemble Algorithms and Suggestions for Choosing an Appropriate Ensemble Algorithm.
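For comparison, a minimal sketch of the TreeBagger approach on the same ionosphere sample data. The number of trees (100) is chosen to match the fitcensemble example, and OOBPrediction="on" stores the information that oobError needs.

```matlab
load ionosphere
rng(0,"twister")                % For reproducibility
% Grow 100 bagged classification trees with TreeBagger
B = TreeBagger(100,X,Y,Method="classification",OOBPrediction="on");
err = oobError(B);              % Cumulative out-of-bag error as trees are added
fprintf("OOB error with all trees: %.4f\n",err(end))
```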

Extended Capabilities


C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The predict function supports code generation.
To integrate the prediction of an ensemble into Simulink®, you can use the ClassificationEnsemble Predict block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function.
When you train an ensemble by using fitcensemble, the following restrictions apply.
- The value of the ScoreTransform name-value argument cannot be an anonymous function.
- Code generation limitations for the weak learners used in the ensemble also apply to the ensemble. For decision tree weak learners, you cannot use surrogate splits; that is, the value of the Surrogate name-value argument must be "off".
For fixed-point code generation, the following additional restrictions apply.
- When you train an ensemble by using fitcensemble, you must train an ensemble using tree learners, and the ScoreTransform value cannot be "invlogit".
- Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model.
- Class labels with the categorical data type are not supported. Neither the class labels in the training data (Tbl or Y) nor the value of the ClassNames name-value argument can be an array with the categorical data type.

For more information, see Introduction to Code Generation.
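A minimal sketch of the first step of a code generation workflow, assuming the ionosphere sample data and the hypothetical file name BaggedEnsembleMdl. It saves the model with saveLearnerForCoder and reloads it with loadLearnerForCoder in MATLAB to check predictions; generating code additionally requires an entry-point function marked with %#codegen that calls loadLearnerForCoder and predict, which is not shown here.

```matlab
load ionosphere
rng(0,"twister")                                 % For reproducibility
Mdl = fitcensemble(X,Y,Method="Bag",NumLearningCycles=30);
% Save the model in a form that loadLearnerForCoder can read
saveLearnerForCoder(Mdl,"BaggedEnsembleMdl");    % Creates BaggedEnsembleMdl.mat
% Load the model back and predict, as an entry-point function would
CMdl = loadLearnerForCoder("BaggedEnsembleMdl");
labels = predict(CMdl,X(1:5,:));
```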

Version History

Introduced in R2011a


R2022a: `Cost` property stores the user-specified cost matrix

Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) and observation weights (W) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the loss, resubLoss, or oobLoss function.

Note that model training has not changed and, therefore, the decision boundaries between classes have not changed.

For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities and observation weights used for training in the Prior and W properties, respectively. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities and observation weights that do not reflect the cost penalties. For more details, see Misclassification Cost Matrix, Prior Probabilities, and Observation Weights.

Some object functions use the Cost, Prior, and W properties:

The loss, resubLoss, and oobLoss functions use the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost".
The loss and edge functions use the prior probabilities stored in the Prior property to normalize the observation weights of the input data.
The resubLoss, resubEdge, oobLoss, and oobEdge functions use the observation weights stored in the W property.

If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases.

If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
