I’m using fitrtree for a regression tree and want to know the exact tie-breaking rule for choosing a split when multiple candidates have the same gain
Hello,
I’m building a regression decision tree using fitrtree, and I’m a bit confused about the training behavior.
My understanding is that, at each node, the algorithm selects the split that maximizes the gain (variance reduction). However, when multiple candidate splits yield the same gain, how does fitrtree decide which split to use?
With the dataset below (six predictors and one response), fitrtree selected X1 = 1.04118541 as the split point. I’d like to know how this threshold was determined in this tie situation.
Thank you for your help.
___________________________________________________________________________________________
clc,clear
% MATLAB R2024b, Statistics and Machine Learning Toolbox
X = [1.009251097 1.80536032 1.165564448 1.25110136 1.467981461 1.006953748;
1.241451804 1.997602319 1.171344538 1.005294421 1.095276725 1.037755529;
1.310319355 1.122118985 1.154557899 1.874069515 1.666341109 1.044814097;
1.073119724 1.616009713 1.162926814 1.860931615 1.256857217 1.077478564;
1.234488925 1.953343979 1.160838383 1.605687801 1.315321605 1.110525849;
1.163513021 1.820879307 1.142907429 1.918040828 1.061789429 1.11876148;
];
Y = [1.02302277;
1.008476382;
1.011549694;
1.003012986;
1.001301743;
1.012252274;
];
tree = fitrtree(X, Y, ...
'MinLeafSize',1, 'MinParentSize',2, ...
'PredictorNames', {'X1','X2','X3','X4','X5','X6'});
tree.CutPredictorIndex(1)
tree.CutPoint(1)
Answers (1)
shantanu
on 14 Aug 2025 at 9:11
Hi!
The fitrtree documentation describes the algorithm that “fitrtree” implements, including its split criteria.
For your case, see point 3 of the node-splitting rules in that documentation:
“fitrtree sorts xi in ascending order. Each element of the sorted predictor is a splitting candidate or cut point.”
Under the CART algorithm, the best splitting candidate (the cut point) is the one that gives the largest reduction in variance (MSE).
The algorithm goes through the candidates in ascending order of the sorted “xi” values, as described in point 3, and replaces the current best candidate only when a later candidate gives a strictly larger gain.
So when two candidates tie, the one that comes first in that order keeps its place as the cut point. That is why X1 is preferred over X6 here.
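To make the "first candidate wins" idea concrete, here is a minimal sketch of an exhaustive split search on the data from the question. It is not the fitrtree source code. It assumes that predictors are scanned in column order, that each cut point is the midpoint between two adjacent sorted values, and that the best candidate is replaced only on a strictly larger gain. The variable names and the small tolerance `tol` (used so that floating-point rounding is not mistaken for an improvement) are my own choices.

```matlab
% Illustrative sketch only: this is NOT how fitrtree is implemented internally.
X = [1.009251097 1.80536032  1.165564448 1.25110136  1.467981461 1.006953748;
     1.241451804 1.997602319 1.171344538 1.005294421 1.095276725 1.037755529;
     1.310319355 1.122118985 1.154557899 1.874069515 1.666341109 1.044814097;
     1.073119724 1.616009713 1.162926814 1.860931615 1.256857217 1.077478564;
     1.234488925 1.953343979 1.160838383 1.605687801 1.315321605 1.110525849;
     1.163513021 1.820879307 1.142907429 1.918040828 1.061789429 1.11876148];
Y = [1.02302277; 1.008476382; 1.011549694; 1.003012986; 1.001301743; 1.012252274];

n = numel(Y);
parentSSE = sum((Y - mean(Y)).^2);     % sum of squared errors at the root node
tol = 1e-12 * parentSSE;               % assumed tolerance for rounding noise

bestGain = -Inf;
bestPredictor = NaN;
bestCut = NaN;

for j = 1:size(X, 2)                   % assumption: predictors in column order
    [xs, order] = sort(X(:, j));       % sort predictor j in ascending order
    ys = Y(order);                     % responses in the same order
    for k = 1:n-1
        if xs(k) == xs(k+1)
            continue                   % no cut between identical values
        end
        left  = ys(1:k);
        right = ys(k+1:end);
        childSSE = sum((left - mean(left)).^2) + sum((right - mean(right)).^2);
        gain = parentSSE - childSSE;   % reduction in squared error
        if gain > bestGain + tol       % strict ">": a tied candidate never replaces the first
            bestGain = gain;
            bestPredictor = j;
            bestCut = (xs(k) + xs(k+1)) / 2;   % assumption: midpoint cut
        end
    end
end

fprintf('predictor X%d, cut point %.8f\n', bestPredictor, bestCut);
```

On this data the sketch selects X1 with cut point 1.04118541, which is the midpoint of the two smallest X1 values (1.009251097 and 1.073119724) and matches the value reported in the question. The first observation has the smallest value in both X1 and X6, so splitting on either predictor separates that observation from the other five and gives the same gain. Under the assumptions above, X1 wins only because it is evaluated first.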
Hope this helps!