Finding optimal regression tree using hyperparameter optimization

I am calculating propensity scores using fitrensemble. I am interested in finding the ensemble with the lowest test RMSE (I will use the resulting model to predict outcomes in a very large second dataset). I am currently using hyperparameter optimization to find the optimal model with the code below:
% Optimize for model, including the ensemble method
rng default
propensity_final = fitrensemble(X,Y,...
    'Learner',templateTree('Surrogate','on'),...
    'Weights',W,...
    'OptimizeHyperparameters',{'Method','NumLearningCycles','MaxNumSplits','LearnRate'},...
    'HyperparameterOptimizationOptions',struct('Repartition',true,...
        'AcquisitionFunctionName','expected-improvement-plus'));
loss_final = kfoldLoss(crossval(propensity_final,'KFold',10));
However, I find that when I do not optimize over the ensemble method, i.e. when I run one of the following instead, the cross-validation error is lower.
% Bagged
propensity1_bag = fitrensemble(X,Y,...
    'Method','Bag',...
    'Learner',templateTree('Surrogate','on'),...
    'Weights',W,...
    'OptimizeHyperparameters',{'NumLearningCycles','MaxNumSplits'},...
    'HyperparameterOptimizationOptions',struct('Repartition',true,...
        'AcquisitionFunctionName','expected-improvement-plus'));
loss1_bag = kfoldLoss(crossval(propensity1_bag,'KFold',10));
% LSBoost
propensity1_boost = fitrensemble(X,Y,...
    'Method','LSBoost',...
    'Learner',templateTree('Surrogate','on'),...
    'Weights',W,...
    'OptimizeHyperparameters',{'NumLearningCycles','MaxNumSplits','LearnRate'},...
    'HyperparameterOptimizationOptions',struct('Repartition',true,...
        'AcquisitionFunctionName','expected-improvement-plus'));
loss1_boost = kfoldLoss(crossval(propensity1_boost,'KFold',10));
What is the objective ('Best so far' and 'Estimated' in the iteration display) that the function tries to minimize? Why are loss1_boost and loss1_bag lower than loss_final? And how do I know which model to use?
Thank you!

Accepted Answer

Don Mathis on 24 May 2017
Edited: Don Mathis on 24 May 2017
My guess is that your first run was worse because it was not run for enough iterations. The default MaxObjectiveEvaluations is 30 evaluations, but since your first optimization searches a larger space (including the categorical variable Method), you should probably multiply that several times over. You're also using 'Repartition',true, which calls for more iterations. Try running it for at least 100 iterations; the more the better, as time permits. You can pass MaxObjectiveEvaluations inside HyperparameterOptimizationOptions.
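For example, a sketch based on your original call (X, Y, and W are your own data; the only change is the added MaxObjectiveEvaluations field):
% Same search as before, but with a larger optimization budget
rng default
propensity_final = fitrensemble(X,Y,...
    'Learner',templateTree('Surrogate','on'),...
    'Weights',W,...
    'OptimizeHyperparameters',{'Method','NumLearningCycles','MaxNumSplits','LearnRate'},...
    'HyperparameterOptimizationOptions',struct(...
        'Repartition',true,...
        'AcquisitionFunctionName','expected-improvement-plus',...
        'MaxObjectiveEvaluations',100));   % default is 30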
The objective being minimized for regression is log(1 + MSE), computed on the validation set; by default that is 5-fold cross-validation. This is mentioned near the bottom of the OptimizeHyperparameters section of this doc page: http://www.mathworks.com/help/stats/fitrensemble.html#input_argument_d0e360201 Your final calls to kfoldLoss return plain MSE, which will differ from the objective function values.
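If you want to compare your kfoldLoss values against the objective values printed during optimization, you can apply the same transform yourself; a minimal sketch using your first model:
% kfoldLoss returns MSE for regression; the optimizer reports log(1 + MSE)
mse = kfoldLoss(crossval(propensity_final,'KFold',10));
objScale = log1p(mse);   % log(1 + MSE), same scale as the optimization display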
In any case, you should use the model that has the lowest cross-validated MSE no matter how you found it.
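Concretely, using the variables from your code:
% Keep whichever model cross-validates best
losses = [loss_final, loss1_bag, loss1_boost];
models = {propensity_final, propensity1_bag, propensity1_boost};
[bestLoss, idx] = min(losses);
bestModel = models{idx};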
  2 Comments
Don Mathis on 25 May 2017
That's the minimum of the Gaussian Process model of the objective function that bayesopt fits under the hood. Noise is estimated and taken into account, so the minimum of the model is usually higher than the best observed value. It's a better estimate of the true minimum than the observed minimum is.
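You can read both quantities off the BayesianOptimization object stored in the fitted model (assuming the model came from a call with 'OptimizeHyperparameters'):
% Observed vs. model-estimated minimum of the objective
results = propensity_final.HyperparameterOptimizationResults;
results.MinObjective            % best observed value, log(1 + MSE)
results.MinEstimatedObjective   % minimum of the fitted Gaussian Process model
bestPoint(results)              % recommended hyperparameter settings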


