MATLAB Answers

Need Predictor Importance in Random Forest Expressed as a Percentage

4 views (last 30 days)
CMatlabWold on 25 Mar 2021
Commented: CMatlabWold on 9 Apr 2021
Hi. I'm running a code to see the importance of demographics (Predictors) on my response (Complaints). I need to express the importance as percentage, as a scale of 0 to 1 (or 0% to 100%). This is the figure I am getting is attached as "RF Importance Chart". My predictors data is attached as "PredictorsOnly.xlsx" and my response data is attached as "TotalComplaintsRF.xlsx"
X = readtable('PredictorsOnly.xlsx','PreserveVariableNames',true)
Y = readtable('TotalComplaintsRF.xlsx','PreserveVariableNames',true)
t = templateTree('NumVariablesToSample','all',...
rng(1); % For reproducibility
Mdl = fitrensemble(X,Y,'Method','Bag','NumLearningCycles',200, ...
yHat = oobPredict(Mdl);
R2 = corr(Mdl.Y,yHat)^2
impOOB = oobPermutedPredictorImportance(Mdl);
title('Unbiased Predictor Importance Estimates')
xlabel('Predictor variable')
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';

Accepted Answer

Pratyush Roy
Pratyush Roy on 9 Apr 2021
oobPermutedPredictorImportance normalizes the predictor importance by the standard error (this is common practice in the field), therefore values are not strictly scaled between 0 and 1. However one can rescale predictor importance, for example:
imp(imp<0) = 0;
imp = imp./sum(imp);
Hope this helps!

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!