Classification Score of "fitcensemble" with Decision Trees: Ambiguous MATLAB Documentation

Hey,
I am trying to figure out how the classification score is calculated when using decision trees with the fitcensemble function. In my opinion, the documentation at the following link is ambiguous:
First, it states that the score is defined as follows:
"A matrix with one row per observation and one column per class. For each observation and each class, the score generated by each tree is the probability of this observation originating from this class computed as the fraction of observations of this class in a tree leaf. predict averages these scores over all trees in the ensemble"
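To make the quoted definition concrete, here is a small language-neutral sketch in Python (not MATLAB's implementation). It assumes each tree is reduced to the class counts of the leaf that one observation lands in; the counts are made up for illustration, and the helper names `leaf_score` and `ensemble_score` are hypothetical. Averaging such per-tree fractions can only produce values in [0,1] that sum to 1 across classes.

```python
# Sketch of the quoted score definition for an averaged (bagged) tree ensemble.
# Assumption: each tree is represented only by the class counts in the leaf
# that a given observation falls into; real trees would be grown from data.

CLASSES = ["setosa", "versicolor", "virginica"]


def leaf_score(leaf_counts):
    """Per-tree score: fraction of training observations of each class in the leaf."""
    total = sum(leaf_counts[c] for c in CLASSES)
    return [leaf_counts[c] / total for c in CLASSES]


def ensemble_score(leaves):
    """Average the per-tree scores over all trees in the ensemble."""
    per_tree = [leaf_score(counts) for counts in leaves]
    n_trees = len(per_tree)
    return [sum(s[k] for s in per_tree) / n_trees for k in range(len(CLASSES))]


# Hypothetical leaves reached by one observation in a three-tree ensemble.
leaves = [
    {"setosa": 0, "versicolor": 8, "virginica": 2},
    {"setosa": 0, "versicolor": 5, "virginica": 5},
    {"setosa": 0, "versicolor": 10, "virginica": 0},
]

score = ensemble_score(leaves)
print(score)       # each entry lies in [0, 1]
print(sum(score))  # the entries sum to 1, up to floating-point rounding
```

Because every per-tree score is a vector of fractions, any plain average of them stays a probability vector; an unbounded score therefore has to come from a different aggregation rule.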
However, as I understand it, this definition would yield scores in [0,1], which is not the case when applying fitcensemble. Instead, a ScoreTransform is required, as explained in https://de.mathworks.com/matlabcentral/answers/395526-how-do-i-obtain-scores-as-probabilistic-estimates-using-the-predict-function-on-a-fitcensemble-model.
Furthermore, the first link also provides the following definition: "Different ensemble algorithms have different definitions for their scores. Furthermore, the range of scores depends on ensemble type."
So could anyone explain how the score is actually defined when using fitcensemble with decision trees? Does it depend on whether boosting or bagging is used?
Thanks for your help!

Answers (1)

Aditya Patil on 20 Aug 2020
The statement about the score in the Output Arguments section of the compact classification ensemble documentation refers to individual trees. Trees do indeed return probabilities as scores:
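% Load Fisher's iris data and fit a single classification tree
load fisheriris.mat
mdl = fitctree(meas, species);
% For a single tree, each score is the posterior probability of the class at the leaf
[~, score] = predict(mdl, meas);
% Every row sums to 1, so these scores behave like probabilities
sum(score, 2)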
For an ensemble, however, the score depends on how the ensemble method computes it; this is explained in the documentation on ensemble algorithms. The answer you linked explains how to convert these scores to probabilities, although such a conversion may not be obvious in every case.
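As a rough illustration of why boosted scores need not be probabilities, the sketch below uses the textbook binary AdaBoost form, f(x) = sum_t alpha_t * h_t(x) with alpha_t = 1/2 * log((1 - eps_t) / eps_t), written in Python. It is an assumption that this matches what fitcensemble computes for AdaBoostM1; the ensemble algorithms documentation is the authority on the exact definition. The `doublelogit` helper mirrors the 1/(1 + e^(-2x)) mapping that MATLAB's 'doublelogit' ScoreTransform is documented to apply. All learner outputs and error rates are hypothetical.

```python
import math


def adaboost_score(weak_outputs, errors):
    """Weighted vote f(x) = sum_t alpha_t * h_t(x) of binary weak learners.

    weak_outputs: h_t(x) in {-1, +1} for each weak learner (hypothetical values).
    errors: weighted training error eps_t of each learner, with 0 < eps_t < 0.5.
    """
    score = 0.0
    for h, eps in zip(weak_outputs, errors):
        alpha = 0.5 * math.log((1 - eps) / eps)  # weight of this weak learner
        score += alpha * h
    return score


def doublelogit(x):
    """Map an unbounded score to (0, 1) via 1 / (1 + exp(-2x))."""
    return 1.0 / (1.0 + math.exp(-2.0 * x))


# Hypothetical three-learner ensemble evaluated on one observation.
f = adaboost_score(weak_outputs=[+1, +1, -1], errors=[0.1, 0.2, 0.3])
print(f)               # a weighted sum, not restricted to [0, 1]
print(doublelogit(f))  # probability-like value in (0, 1)
```

Here the raw score is larger than 1, so it cannot be read as a probability directly; the double-logit mapping is one way to squash it into (0, 1), which is the kind of transform the linked answer applies through ScoreTransform.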
Dario Walter on 20 Aug 2020
Dear Aditya,
Thanks for your reply. The Output Arguments section you mentioned is not only about individual trees: its last sentence is "... predict averages these scores over all trees in the ensemble". This should be corrected, since, as you mentioned, the score depends on the ensemble technique.


Release: R2020a
