Resubstitution classification edge for naive Bayes classifier



e = resubEdge(Mdl) returns the resubstitution Classification Edge (e) for the naive Bayes classifier Mdl using the training data stored in Mdl.X and the corresponding class labels stored in Mdl.Y.

The classification edge is a scalar value that represents the weighted mean of the Classification Margins.



Estimate the resubstitution edge (the average in-sample classification margin) of a naive Bayes classifier.

Load the fisheriris data set. Create X as a numeric matrix that contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

load fisheriris
X = meas;
Y = species;
rng('default') % for reproducibility

Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. By default, fitcnb assumes that each predictor is normally distributed within each class and that the predictors are conditionally independent given the class.

Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'})
Mdl = 
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods

Mdl is a trained ClassificationNaiveBayes classifier.

Estimate the resubstitution edge.

e = resubEdge(Mdl)
e = 0.8944

The average of the training sample margins is approximately 0.89. This result indicates that the classifier labels the in-sample observations with high confidence.
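As a cross-check, the edge can be reproduced from the individual margins. The sketch below assumes the default settings used above (no observation weights, empirical priors), in which case the weighted mean reduces to a plain mean. It also assumes that calling edge on the stored training data matches resubEdge; the variable names m, eFromMargins, and eFromEdge are illustrative only.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,'ClassNames',{'setosa','versicolor','virginica'});

e = resubEdge(Mdl);               % resubstitution edge
m = resubMargin(Mdl);             % one margin per training observation

% With equal weights and empirical priors, the edge is the mean margin.
eFromMargins = mean(m);

% The same value is expected from edge applied to the stored training data.
eFromEdge = edge(Mdl,Mdl.X,Mdl.Y);
```

Inspecting m directly (for example, with a histogram) can also reveal whether a high edge hides a few observations with negative margins.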

The classifier edge measures the average of the classifier margins. One way to perform feature selection is to compare training sample edges from multiple models. Based solely on this criterion, the classifier with the highest edge is the best classifier.

Load the ionosphere data set. Remove the first two predictors for stability.

load ionosphere
X = X(:,3:end);

Define these two data sets:

  • fullX contains all predictors.

  • partX contains the 10 most important predictors, as ranked by fscmrmr.

fullX = X;
idx = fscmrmr(X,Y);
partX = X(:,idx(1:10));

Train a naive Bayes classifier for each predictor set.

FullMdl = fitcnb(fullX,Y);
PartMdl = fitcnb(partX,Y);

FullMdl and PartMdl are trained ClassificationNaiveBayes classifiers.

Estimate the training sample edge for each classifier.

fullEdge = resubEdge(FullMdl)
fullEdge = 0.6554
partEdge = resubEdge(PartMdl)
partEdge = 0.7796

The edge of the classifier trained on the 10 most important predictors is larger. This result suggests that the classifier trained using only those predictors has a better in-sample fit.

Input Arguments


Mdl is the full, trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.

More About


Classification Edge

The classification edge is the weighted mean of the classification margins.

If you supply weights, then the software normalizes them to sum to the prior probability of their respective class. The software uses the normalized weights to compute the weighted mean.
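The normalization described above can be sketched by hand. The example below uses hypothetical random weights w (not part of the original examples), rescales them within each class so that they sum to the class prior stored in Mdl.Prior, and then forms the weighted mean of the margins. It is an illustration of the stated definition under these assumptions, not the software's internal code.

```matlab
load fisheriris
rng('default')                                % for reproducibility
w = 1 + rand(size(meas,1),1);                 % hypothetical observation weights
Mdl = fitcnb(meas,species,'Weights',w);

m = resubMargin(Mdl);                         % margins of the training data

% Rescale the weights so that, within each class, they sum to the prior.
wn = zeros(size(w));
for k = 1:numel(Mdl.ClassNames)
    inClass = strcmp(Mdl.Y,Mdl.ClassNames{k});
    wn(inClass) = w(inClass)/sum(w(inClass))*Mdl.Prior(k);
end

eManual = sum(wn.*m)/sum(wn);                 % weighted mean of the margins
e = resubEdge(Mdl);                           % value reported by the software
```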

When choosing among multiple classifiers to perform a task such as feature selection, choose the classifier that yields the highest edge.

Classification Margins

The classification margin for each observation is the difference between the score for the true class and the maximal score for the false classes. Margins provide a classification confidence measure; among multiple classifiers, those that yield larger margins (on the same scale) are better.
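The definition above can be traced step by step. The following sketch takes the posterior probabilities returned by resubPredict as the scores, picks out the score of the true class for each observation, and subtracts the largest score among the remaining classes. The helper variables (trueIdx, lin, other, mManual) are illustrative names.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,'ClassNames',{'setosa','versicolor','virginica'});

[~,post] = resubPredict(Mdl);                 % n-by-K matrix of posterior scores

% Column index of the true class for each observation.
[~,trueIdx] = ismember(Mdl.Y,Mdl.ClassNames);
n = numel(trueIdx);
lin = sub2ind(size(post),(1:n)',trueIdx);

trueScore = post(lin);                        % score for the true class
other = post;
other(lin) = -Inf;                            % mask the true class
mManual = trueScore - max(other,[],2);        % true score minus best false score

m = resubMargin(Mdl);                         % margins reported by the software
```

A negative entry in mManual marks an observation that the classifier misclassifies, because some false class then has a higher score than the true class.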

Posterior Probability

The posterior probability is the probability that an observation belongs in a particular class, given the data.

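For naive Bayes, the posterior probability that the class is k for a given observation (x1,...,xP) is

\[
\hat{P}(Y = k \mid x_1,\ldots,x_P) = \frac{P(X_1,\ldots,X_P \mid y = k)\,\pi(Y = k)}{P(X_1,\ldots,X_P)},
\]

where:
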
  • P(X1,...,XP|y=k) is the conditional joint density of the predictors given they are in class k. Mdl.DistributionNames stores the distribution names of the predictors.

  • π(Y = k) is the class prior probability distribution. Mdl.Prior stores the prior distribution.

  • P(X1,...,XP) is the joint density of the predictors. The classes are discrete, so $P(X_1,\ldots,X_P) = \sum_{k=1}^{K} P(X_1,\ldots,X_P \mid y = k)\,\pi(Y = k)$.
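For normal predictor distributions, the posterior can be evaluated by hand from the fitted parameters. The sketch below assumes that each entry of Mdl.DistributionParameters holds the class-conditional mean followed by the standard deviation, and that the naive Bayes independence assumption lets the joint conditional density be written as a product of univariate normal densities. The names x, joint, and postManual are illustrative.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);                   % normal distributions by default

x = meas(1,:);                                % one observation to score
K = numel(Mdl.ClassNames);
joint = zeros(1,K);

for k = 1:K
    lik = 1;
    for j = 1:numel(x)
        p = Mdl.DistributionParameters{k,j};  % assumed layout: [mean; std]
        lik = lik*normpdf(x(j),p(1),p(2));    % product over predictors
    end
    joint(k) = lik*Mdl.Prior(k);              % numerator of the posterior
end

postManual = joint/sum(joint);                % divide by the joint density
[~,post] = predict(Mdl,x);                    % posterior from the software
```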

Prior Probability

The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.
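When no prior is specified, fitcnb is understood to use the empirical class frequencies of the training labels. The short sketch below checks that assumption on the fisheriris data by comparing Mdl.Prior with the observed class proportions; freq is an illustrative name.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);                   % default prior (assumed empirical)

% Relative frequency of each class in the training labels.
freq = zeros(1,numel(Mdl.ClassNames));
for k = 1:numel(Mdl.ClassNames)
    freq(k) = mean(strcmp(species,Mdl.ClassNames{k}));
end

prior = Mdl.Prior(:)';                        % stored prior as a row vector
```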

Classification Score

The naive Bayes score is the class posterior probability given the observation.

Introduced in R2014b