Classification loss for multiclass error-correcting output codes (ECOC) model

`L = loss(Mdl,tbl,ResponseVarName)` returns the classification loss (`L`), a scalar representing how well the trained multiclass error-correcting output codes (ECOC) model `Mdl` classifies the predictor data in `tbl` compared to the true class labels in `tbl.ResponseVarName`. By default, `loss` uses the classification error to compute `L`.

`L = loss(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify a decoding scheme, classification loss function, and verbosity level.

Load Fisher's iris data set. Specify the predictor data `X`, the response data `Y`, and the order of the classes in `Y`.

```
load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y); % Class order
rng(1); % For reproducibility
```

Train an ECOC model using SVM binary classifiers. Specify a 15% holdout sample, standardize the predictors using an SVM template, and specify the class order.

```
t = templateSVM('Standardize',true);
PMdl = fitcecoc(X,Y,'Holdout',0.15,'Learners',t,'ClassNames',classOrder);
Mdl = PMdl.Trained{1}; % Extract trained, compact classifier
```

`PMdl` is a `ClassificationPartitionedECOC` model. It has the property `Trained`, a 1-by-1 cell array containing the `CompactClassificationECOC` model that the software trained using the training set.

Estimate the test-sample classification error, which is the default classification loss.

```
testInds = test(PMdl.Partition); % Extract the test indices
XTest = X(testInds,:);
YTest = Y(testInds,:);
L = loss(Mdl,XTest,YTest)
```

L = 0

The ECOC model correctly classifies all irises in the test sample.

Determine the quality of an ECOC model by using a custom loss function that considers the minimal binary loss for each observation.

Load Fisher's iris data set. Specify the predictor data `X`, the response data `Y`, and the order of the classes in `Y`.

```
load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y); % Class order
rng(1) % For reproducibility
```

Train an ECOC model using SVM binary classifiers. Specify a 15% holdout sample, standardize the predictors using an SVM template, and define the class order.

```
t = templateSVM('Standardize',true);
PMdl = fitcecoc(X,Y,'Holdout',0.15,'Learners',t,'ClassNames',classOrder);
Mdl = PMdl.Trained{1}; % Extract trained, compact classifier
```

`PMdl` is a `ClassificationPartitionedECOC` model. It has the property `Trained`, a 1-by-1 cell array containing the `CompactClassificationECOC` model that the software trained using the training set.

Create a function that takes the minimal loss for each observation, then averages the minimal losses for all observations. `S` corresponds to the `NegLoss` output of `predict`.

lossfun = @(~,S,~,~)mean(min(-S,[],2));

Compute the test-sample custom loss.

```
testInds = test(PMdl.Partition); % Extract the test indices
XTest = X(testInds,:);
YTest = Y(testInds,:);
loss(Mdl,XTest,YTest,'LossFun',lossfun)
```

ans = 0.0033

The average minimal binary loss for the test-sample observations is `0.0033`.

`Mdl` — Full or compact multiclass ECOC model

`ClassificationECOC` model object | `CompactClassificationECOC` model object

Full or compact multiclass ECOC model, specified as a `ClassificationECOC` or `CompactClassificationECOC` model object.

To create a full or compact ECOC model, see `ClassificationECOC` or `CompactClassificationECOC`.

`tbl` — Sample data

table

Sample data, specified as a table. Each row of `tbl` corresponds to one observation, and each column corresponds to one predictor variable. Optionally, `tbl` can contain additional columns for the response variable and observation weights. `tbl` must contain all the predictors used to train `Mdl`. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

If you train `Mdl` using sample data contained in a `table`, then the input data for `loss` must also be in a table.

If you set `'Standardize',true` for a template object specified in the `'Learners'` name-value pair argument of `fitcecoc` when training `Mdl`, then, for each binary learner `j`, the software standardizes the columns of the new predictor data using the corresponding means in `Mdl.BinaryLearners{j}.Mu` and standard deviations in `Mdl.BinaryLearners{j}.Sigma`.

**Data Types:** `table`

`ResponseVarName` — Response variable name

name of variable in `tbl`

Response variable name, specified as the name of a variable in `tbl`. If `tbl` contains the response variable used to train `Mdl`, then you do not need to specify `ResponseVarName`.

If you specify `ResponseVarName`, then you must do so as a character vector or string scalar. For example, if the response variable is stored as `tbl.y`, then specify `ResponseVarName` as `'y'`. Otherwise, the software treats all columns of `tbl`, including `tbl.y`, as predictors.

The response variable must be a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array.

**Data Types:** `char` | `string`

`X` — Predictor data

numeric matrix

Predictor data, specified as a numeric matrix.

Each row of `X` corresponds to one observation, and each column corresponds to one variable. The variables in the columns of `X` must be the same as the variables that trained the classifier `Mdl`.

The number of rows in `X` must equal the number of rows in `Y`.

If you set `'Standardize',true` for a template object specified in the `'Learners'` name-value pair argument of `fitcecoc` when training `Mdl`, then, for each binary learner `j`, the software standardizes the columns of the new predictor data using the corresponding means in `Mdl.BinaryLearners{j}.Mu` and standard deviations in `Mdl.BinaryLearners{j}.Sigma`.

**Data Types:** `double` | `single`

`Y` — Class labels

categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Class labels, specified as a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. `Y` must have the same data type as `Mdl.ClassNames`. (The software treats string arrays as cell arrays of character vectors.)

The number of rows in `Y` must equal the number of rows in `tbl` or `X`.

**Data Types:** `categorical` | `char` | `string` | `logical` | `single` | `double` | `cell`

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

**Example:** `loss(Mdl,X,Y,'BinaryLoss','hinge','LossFun',@lossfun)` specifies `'hinge'` as the binary learner loss function and the custom function handle `@lossfun` as the overall loss function.

`'BinaryLoss'` — Binary learner loss function

`'hamming'` | `'linear'` | `'logit'` | `'exponential'` | `'binodeviance'` | `'hinge'` | `'quadratic'` | function handle

Binary learner loss function, specified as the comma-separated pair consisting of `'BinaryLoss'` and a built-in loss function name or function handle.

- This table describes the built-in functions, where $$y_j$$ is a class label for a particular binary learner (in the set {–1,1,0}), $$s_j$$ is the score for observation *j*, and $$g(y_j,s_j)$$ is the binary loss formula.

  Value | Description | Score Domain | $$g(y_j,s_j)$$ |
  ---|---|---|---|
  `'binodeviance'` | Binomial deviance | (–∞,∞) | $$\log[1 + \exp(-2y_js_j)]/[2\log(2)]$$ |
  `'exponential'` | Exponential | (–∞,∞) | $$\exp(-y_js_j)/2$$ |
  `'hamming'` | Hamming | [0,1] or (–∞,∞) | $$[1 - \text{sign}(y_js_j)]/2$$ |
  `'hinge'` | Hinge | (–∞,∞) | $$\max(0,1 - y_js_j)/2$$ |
  `'linear'` | Linear | (–∞,∞) | $$(1 - y_js_j)/2$$ |
  `'logit'` | Logistic | (–∞,∞) | $$\log[1 + \exp(-y_js_j)]/[2\log(2)]$$ |
  `'quadratic'` | Quadratic | [0,1] | $$[1 - y_j(2s_j - 1)]^2/2$$ |

  The software normalizes binary losses so that the loss is 0.5 when $$y_j = 0$$. Also, the software calculates the mean binary loss for each class.

- For a custom binary loss function, for example `customFunction`, specify its function handle `'BinaryLoss',@customFunction`.

  `customFunction` has this form:

  `bLoss = customFunction(M,s)`

  where:

  - `M` is the *K*-by-*L* coding matrix stored in `Mdl.CodingMatrix`.
  - `s` is the 1-by-*L* row vector of classification scores.
  - `bLoss` is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
  - *K* is the number of classes.
  - *L* is the number of binary learners.

  For an example of passing a custom binary loss function, see Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function.
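As a rough illustration of the `customFunction(M,s)` form, the following sketch applies a linear binary loss to a hand-written one-versus-one coding matrix. The matrix `M`, the score vector `s`, and the name `customBL` are made up for illustration, and the sketch assumes the function returns one aggregated value per class (a *K*-by-1 vector). It uses only base MATLAB and does not call `loss`.

```matlab
% Hypothetical one-versus-one coding matrix for K = 3 classes and L = 3 learners.
% Rows are classes, columns are binary learners; 0 means the learner ignores the class.
M = [ 1  1  0
     -1  0  1
      0 -1 -1];

% Made-up scores of the three binary learners for a single observation.
s = [2 0.5 -1];

% Custom linear binary loss: (1 - m*s)/2, averaged over the learners for each class.
% An entry with m = 0 contributes 0.5, matching the normalization of the built-in losses.
customBL = @(M,s) mean(1 - M.*s, 2)/2;

bLoss = customBL(M,s);     % K-by-1 vector, one aggregated loss per class
[~,kHat] = min(bLoss);     % index of the class with the smallest aggregated loss
fprintf('Aggregated losses: %s, smallest for class %d\n', mat2str(bLoss',4), kHat);
```

A handle of this kind would then be supplied as `'BinaryLoss',customBL`.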

The default `BinaryLoss` value depends on the score ranges returned by the binary learners. This table describes some default `BinaryLoss` values based on the given assumptions.

Assumption | Default Value |
---|---|
All binary learners are SVMs or either linear or kernel classification models of SVM learners. | `'hinge'` |
All binary learners are ensembles trained by `AdaboostM1` or `GentleBoost`. | `'exponential'` |
All binary learners are ensembles trained by `LogitBoost`. | `'binodeviance'` |
All binary learners are linear or kernel classification models of logistic regression learners. Or, you specify to predict class posterior probabilities by setting `'FitPosterior',true` in `fitcecoc`. | `'quadratic'` |

To check the default value, use dot notation to display the `BinaryLoss` property of the trained model at the command line.

**Example:** `'BinaryLoss','binodeviance'`

**Data Types:** `char` | `string` | `function_handle`

`'Decoding'` — Decoding scheme

`'lossweighted'` (default) | `'lossbased'`

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of `'Decoding'` and `'lossweighted'` or `'lossbased'`. For more information, see Binary Loss.

**Example:** `'Decoding','lossbased'`

`'LossFun'` — Loss function

`'classiferror'` (default) | function handle

Loss function, specified as the comma-separated pair consisting of `'LossFun'` and `'classiferror'` or a function handle.

- Specify the built-in function `'classiferror'`. In this case, the loss function is the classification error, which is the proportion of misclassified observations.

- Or, specify your own function using function handle notation.

  Assume that `n = size(X,1)` is the sample size and `K` is the number of classes. Your function must have the signature `lossvalue = lossfun(C,S,W,Cost)`, where:

  - The output argument `lossvalue` is a scalar.
  - You specify the function name (`lossfun`).
  - `C` is an `n`-by-`K` logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in `Mdl.ClassNames`.

    Construct `C` by setting `C(p,q) = 1` if observation `p` is in class `q`, for each row. Set all other elements of row `p` to `0`.
  - `S` is an `n`-by-`K` numeric matrix of negated loss values for the classes. Each row corresponds to an observation. The column order corresponds to the class order in `Mdl.ClassNames`. The input `S` resembles the output argument `NegLoss` of `predict`.
  - `W` is an `n`-by-1 numeric vector of observation weights. If you pass `W`, the software normalizes its elements to sum to `1`.
  - `Cost` is a `K`-by-`K` numeric matrix of misclassification costs. For example, `Cost = ones(K) - eye(K)` specifies a cost of 0 for correct classification and 1 for misclassification.

  Specify your function using `'LossFun',@lossfun`.

**Data Types:** `char` | `string` | `function_handle`
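To make the `lossfun(C,S,W,Cost)` signature more concrete, here is a minimal sketch of a custom overall loss that computes a weighted misclassification rate. The inputs `C`, `S`, `W`, and `Cost` below are small made-up arrays standing in for what `loss` would supply, the name `weightedError` is hypothetical, and the sketch ignores `Cost` and counts ties in `S` as correct.

```matlab
% Made-up inputs for n = 4 observations and K = 3 classes.
C = logical([1 0 0; 0 1 0; 0 0 1; 0 1 0]);  % true class membership, one true per row
S = -[0.1 0.9 0.8                           % negated losses: larger means more likely
      0.7 0.2 0.6
      0.3 0.4 0.5
      0.9 0.8 0.1];
W = [0.25; 0.25; 0.25; 0.25];               % observation weights
Cost = ones(3) - eye(3);                    % passed in but unused by this sketch

% An observation counts as correct when its true class attains the row maximum of S.
% The loss is the weighted share of observations that are not correct.
weightedError = @(C,S,W,Cost) sum(W .* ~any(C & (S == max(S,[],2)), 2)) / sum(W);

lossvalue = weightedError(C,S,W,Cost);
fprintf('Weighted misclassification rate: %.2f\n', lossvalue);
```

With a trained model, a handle like this could be passed as `'LossFun',weightedError`, in the same way as `lossfun` in the custom-loss example earlier on this page.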

`'ObservationsIn'` — Predictor data observation dimension

`'rows'` (default) | `'columns'`

Predictor data observation dimension, specified as the comma-separated pair consisting of `'ObservationsIn'` and `'columns'` or `'rows'`. `Mdl.BinaryLearners` must contain `ClassificationLinear` models.

**Note**

If you orient your predictor matrix so that observations correspond to columns and specify `'ObservationsIn','columns'`, you can experience a significant reduction in execution time. You cannot specify `'ObservationsIn','columns'` for predictor data in a table.

`'Options'` — Estimation options

`[]` (default) | structure array returned by `statset`

Estimation options, specified as the comma-separated pair consisting of `'Options'` and a structure array returned by `statset`.

To invoke parallel computing:

- You need a Parallel Computing Toolbox™ license.
- Specify `'Options',statset('UseParallel',true)`.

`'Verbose'` — Verbosity level

`0` (default) | `1`

Verbosity level, specified as the comma-separated pair consisting of `'Verbose'` and `0` or `1`. `Verbose` controls the number of diagnostic messages that the software displays in the Command Window.

If `Verbose` is `0`, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

**Example:** `'Verbose',1`

**Data Types:** `single` | `double`

`'Weights'` — Observation weights

`ones(size(X,1),1)` (default) | numeric vector | name of variable in `tbl`

Observation weights, specified as the comma-separated pair consisting of `'Weights'` and a numeric vector or the name of a variable in `tbl`. If you supply weights, then `loss` computes the weighted loss.

If you specify `Weights` as a numeric vector, then the size of `Weights` must be equal to the number of rows in `X` or `tbl`.

If you specify `Weights` as the name of a variable in `tbl`, you must do so as a character vector or string scalar. For example, if the weights are stored as `tbl.w`, then specify `Weights` as `'w'`. Otherwise, the software treats all columns of `tbl`, including `tbl.w`, as predictors.

If you do not specify your own loss function (using `LossFun`), then the software normalizes `Weights` to sum up to the value of the prior probability in the respective class.

**Data Types:** `single` | `double` | `char` | `string`
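One way to picture the prior-based normalization of `Weights`: within each class, the weights are rescaled so that they add up to that class's prior probability. The sketch below does this by hand on made-up data. The variable names, the class indices, and the priors are all assumptions for illustration, and this is not the toolbox's internal code.

```matlab
% Made-up class indices (1..K), raw observation weights, and prior probabilities.
y     = [1; 1; 2; 2; 2; 3];
w     = [1; 3; 2; 2; 4; 5];
prior = [0.5; 0.3; 0.2];            % assumed priors, one per class, summing to 1

% Total raw weight per class, then rescale each weight by prior/classTotal.
classTotal = accumarray(y, w);      % K-by-1 sums of w within each class
wNorm = w .* prior(y) ./ classTotal(y);

% After rescaling, the weights in each class sum to that class's prior.
classSums = accumarray(y, wNorm);
disp(classSums');
```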

`L` — Classification loss

numeric scalar | numeric row vector

Classification loss, returned as a numeric scalar or row vector. `L` is a generalization or resubstitution quality measure. Its interpretation depends on the loss function and weighting scheme, but in general, better classifiers yield smaller classification loss values.

If `Mdl.BinaryLearners` contains `ClassificationLinear` models, then `L` is a 1-by-*ℓ* vector, where *ℓ* is the number of regularization strengths in the linear classification models (`numel(Mdl.BinaryLearners{1}.Lambda)`). The value `L(j)` is the loss for the model trained using regularization strength `Mdl.BinaryLearners{1}.Lambda(j)`.

Otherwise, `L` is a scalar value.

The *classification error* is a binary classification error measure that has the form

$$L=\frac{\sum_{j=1}^{n}w_{j}e_{j}}{\sum_{j=1}^{n}w_{j}},$$

where:

- $$w_j$$ is the weight for observation *j*. The software renormalizes the weights to sum to 1.
- $$e_j$$ = 1 if the predicted class of observation *j* differs from its true class, and 0 otherwise.

In other words, the classification error is the proportion of observations misclassified by the classifier.
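The formula can be checked by hand on a small example. The weights and class indices below are made up. The point is only to show how unequal weights move the result away from the plain fraction of misclassified observations.

```matlab
% Made-up weights and true/predicted class indices for n = 5 observations.
w         = [1; 2; 1; 5; 1];
trueClass = [1; 2; 3; 1; 2];
predClass = [1; 3; 3; 1; 1];

e = double(predClass ~= trueClass);   % e_j = 1 for a misclassified observation
L = sum(w .* e) / sum(w);             % weighted classification error
unweighted = mean(e);                 % plain fraction of misclassified observations
fprintf('Weighted error: %.2f, unweighted error: %.2f\n', L, unweighted);
```

Two of the five observations are misclassified, but they carry only 3 of the 10 units of weight, so the weighted error (0.3) is smaller than the unweighted one (0.4).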

A *binary loss* is a function
of the class and classification score that determines how well a binary
learner classifies an observation into the class.

Suppose the following:

- $$m_{kj}$$ is element (*k*,*j*) of the coding design matrix *M* (that is, the code corresponding to class *k* of binary learner *j*).
- $$s_j$$ is the score of binary learner *j* for an observation.
- *g* is the binary loss function.
- $$\widehat{k}$$ is the predicted class for the observation.

In *loss-based decoding* [Escalera et al.], the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

$$\widehat{k}=\underset{k}{\text{argmin}}\sum_{j=1}^{L}\left|m_{kj}\right|g(m_{kj},s_{j}).$$

In *loss-weighted decoding* [Escalera et al.], the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

$$\widehat{k}=\underset{k}{\text{argmin}}\frac{\sum_{j=1}^{L}\left|m_{kj}\right|g(m_{kj},s_{j})}{\sum_{j=1}^{L}\left|m_{kj}\right|}.$$

Allwein et al. suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.
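The two decoding rules can be traced on a toy case. The sketch below uses a hand-written one-versus-one coding matrix, made-up scores, and the hinge binary loss. None of these values come from a trained model, and the code is a plain transcription of the two formulas rather than the toolbox's implementation.

```matlab
% Hypothetical one-versus-one coding matrix (K = 3 classes, L = 3 learners)
% and made-up binary learner scores for one observation.
M = [ 1  1  0
     -1  0  1
      0 -1 -1];
s = [2 0.5 -1];

g = @(m,s) max(0, 1 - m.*s)/2;            % hinge binary loss g(m_kj, s_j)

weighted = abs(M) .* g(M,s);              % |m_kj| * g(m_kj, s_j); zero where m_kj = 0
lossBased    = sum(weighted, 2);                    % sum over learners
lossWeighted = sum(weighted, 2) ./ sum(abs(M), 2);  % average over the learners in use

[~,kBased]    = min(lossBased);           % predicted class, loss-based decoding
[~,kWeighted] = min(lossWeighted);        % predicted class, loss-weighted decoding
fprintf('Loss-based: class %d, loss-weighted: class %d\n', kBased, kWeighted);
```

Because every row of a one-versus-one matrix has the same number of nonzero entries, the two schemes differ here only by a constant factor and pick the same class. They can disagree for designs whose rows use different numbers of learners.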

This table summarizes the supported loss functions, where $$y_j$$ is a class label for a particular binary learner (in the set {–1,1,0}), $$s_j$$ is the score for observation *j*, and $$g(y_j,s_j)$$ is the binary loss formula.

Value | Description | Score Domain | $$g(y_j,s_j)$$ |
---|---|---|---|
`'binodeviance'` | Binomial deviance | (–∞,∞) | $$\log[1 + \exp(-2y_js_j)]/[2\log(2)]$$ |
`'exponential'` | Exponential | (–∞,∞) | $$\exp(-y_js_j)/2$$ |
`'hamming'` | Hamming | [0,1] or (–∞,∞) | $$[1 - \text{sign}(y_js_j)]/2$$ |
`'hinge'` | Hinge | (–∞,∞) | $$\max(0,1 - y_js_j)/2$$ |
`'linear'` | Linear | (–∞,∞) | $$(1 - y_js_j)/2$$ |
`'logit'` | Logistic | (–∞,∞) | $$\log[1 + \exp(-y_js_j)]/[2\log(2)]$$ |
`'quadratic'` | Quadratic | [0,1] | $$[1 - y_j(2s_j - 1)]^2/2$$ |

The software normalizes binary losses such that the loss is 0.5 when $$y_j = 0$$, and aggregates using the average of the binary learners [Allwein et al.].
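The normalization claim is easy to verify numerically. The sketch below writes each formula from the table as an anonymous function and evaluates it at $$y_j = 0$$. The struct name `g` and the score value are arbitrary choices made for this illustration.

```matlab
% Binary loss formulas from the table, as functions of the code y (in {-1,0,1})
% and the score s.
g.binodeviance = @(y,s) log(1 + exp(-2*y.*s)) / (2*log(2));
g.exponential  = @(y,s) exp(-y.*s) / 2;
g.hamming      = @(y,s) (1 - sign(y.*s)) / 2;
g.hinge        = @(y,s) max(0, 1 - y.*s) / 2;
g.linear       = @(y,s) (1 - y.*s) / 2;
g.logit        = @(y,s) log(1 + exp(-y.*s)) / (2*log(2));
g.quadratic    = @(y,s) (1 - y.*(2*s - 1)).^2 / 2;

s = 0.8;                              % made-up score, chosen inside [0,1]
names = fieldnames(g);
atZero = zeros(numel(names), 1);
for i = 1:numel(names)
    f = g.(names{i});
    atZero(i) = f(0, s);              % loss when the learner ignores the class
    fprintf('%-12s g(1,s) = %.4f   g(0,s) = %.4f\n', names{i}, f(1,s), atZero(i));
end
```

Every entry of `atZero` comes out as 0.5, which is the normalization stated above.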

Do not confuse the binary loss with the overall classification loss (specified by the `'LossFun'` name-value pair argument of the `loss` and `predict` object functions), which measures how well an ECOC classifier performs as a whole.

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” *Journal of Machine Learning Research*. Vol. 1, 2000, pp. 113–141.

[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” *IEEE Transactions on Pattern Analysis and Machine Intelligence*. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” *Pattern Recogn*. Vol. 30, Issue 3, 2009, pp. 285–297.

Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

`loss` does not support tall `table` data when `Mdl` contains kernel or linear binary learners.

For more information, see Tall Arrays.

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the `'UseParallel'` field of the options structure to `true` using `statset`, and specify the `'Options'` name-value pair argument in the call to this function.

For example: `'Options',statset('UseParallel',true)`

For more information, see the `'Options'` name-value pair argument.

For more general information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

`ClassificationECOC` | `CompactClassificationECOC` | `fitcecoc` | `predict` | `resubLoss`
