the relation between score and probability for ensemble classification

Question

1 vote

I am dealing with a binary classification problem and I want to get the probability of prediction using RUSBoost algorithm. http://www.mathworks.com/help/stats/compactclassificationensemble.predict.html according to this doc, it says the score generated by each tree is the probability of this observation originating from this class computed as the fraction of observations of this class in a tree leaf. predict averages these scores over all trees in the ensemble. However, I cannot see that from the score I get. For the 2 classes, the sum of scores is not 1, and sometimes score is larger than 1. It seems that the classification is decided by the larger score for the 2 classes.

So what is the right way to understand the score? How can I infer the probability from the score? Is there any way to do it? Thanks!

1 Comment
Show -1 older comments Hide -1 older comments

Louis on 6 Nov 2023

I have the same question.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Shubham on 10 Nov 2023

0 votes

Hi Hui,

The RUSBoost algorithm is an implementation of the AdaBoost algorithm with random undersampling. In terms of interpreting the scores generated by each tree in the ensemble, it's important to note that these scores do not directly represent probabilities. Instead, they reflect the strength or confidence of the prediction for each class.

The scores generated by each tree in the ensemble are combined and averaged to obtain the final prediction. In the case of binary classification, the class with the higher score is typically assigned as the predicted class.

To infer probabilities from the scores, you can use a technique called Platt scaling or sigmoid calibration. Platt scaling involves fitting a logistic regression model to the scores generated by the ensemble, using the true class labels as the response variable. This calibrated model can then be used to estimate the probabilities for each class.

Here's a step-by-step approach to perform Platt scaling:

Collect the scores generated by the RUSBoost ensemble for your dataset.
Split your dataset into a training set and a validation set.
Fit a logistic regression model using the scores as the predictor variable and the true class labels as the response variable, using the training set.
Predict the probabilities using the fitted logistic regression model for the validation set.
Evaluate the performance of the calibrated probabilities using appropriate metrics (e.g., log loss, Brier score) and adjust the model if necessary.

By applying Platt scaling, you can obtain calibrated probabilities that reflect the likelihood of an observation belonging to each class.

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

the relation between score and probability for ensemble classification

1 Comment
Show -1 older comments Hide -1 older comments

Answers (1)

0 Comments
Show -2 older comments Hide -2 older comments

Categories

Tags

Community Treasure Hunt

the relation between score and probability for ensemble classification

1 Comment Show -1 older comments Hide -1 older comments

Answers (1)

0 Comments Show -2 older comments Hide -2 older comments

Categories

Tags

See Also

Community Treasure Hunt

1 Comment
Show -1 older comments Hide -1 older comments

0 Comments
Show -2 older comments Hide -2 older comments