fitrgp: hyperparameter optimization method, maximum likelihood & cross-validation

Hi,
I am wondering how fitrgp optimizes hyperparameters.
According to the documentation at this link: https://kr.mathworks.com/help/stats/fitrgp.html
gprMdl2 = fitrgp(x,y,'KernelFunction','squaredexponential',...
'KernelParameters',kparams0,'Sigma',sigma0);
The documentation says about this code: 'The marginal log likelihood that fitrgp maximizes to estimate GPR parameters has multiple local solutions.'
That means fitrgp uses maximum likelihood estimation (MLE) to optimize the hyperparameters.
But for this code,
gprMdl2 = fitrgp(x,y,'KernelFunction','squaredexponential',...
'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',...
struct('AcquisitionFunctionName','expected-improvement-plus'));
The documentation says about this code: 'Find hyperparameters that minimize five-fold cross-validation loss by using automatic hyperparameter optimization.'
That means fitrgp uses cross-validation (CV) to optimize the hyperparameters. Is this right?
So when training GPR models, there are both MLE and CV methods for optimizing hyperparameters. If I just call fitrgp as in the first example, it uses MLE to optimize the hyperparameters, and if I pass the extra options shown in the second example, it optimizes the hyperparameters using CV instead of MLE. Did I get that right?

Accepted Answer

Don Mathis on 10 Jan 2019
The hyperparameters and the objective function are different in the 2 cases.
  • When you don't pass 'OptimizeHyperparameters', fitrgp estimates the kernel parameters ('KernelParameters') and 'Sigma', starting from any initial values you supply, by maximizing the marginal log likelihood of the data.
  • When you do pass 'OptimizeHyperparameters', it will optimize the parameters you specify, which is some subset of {'BasisFunction','KernelFunction','KernelScale','Sigma','Standardize'}, using Bayesian Optimization, minimizing the out-of-sample MSE as measured using cross-validation.
There is some overlap between the 2 sets of parameters. Sigma is the same, and KernelScale corresponds to the length scales inside the kernel functions.
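For concreteness, here is a minimal sketch (not part of the original answer) contrasting the two calls on synthetic data. The data, the kparams0/sigma0 initial values, and the 3-fold setting are purely illustrative; the 'KFold' field of 'HyperparameterOptimizationOptions' is the documented way to control the number of cross-validation folds.
% Synthetic 1-D data, for illustration only
rng(0);
x = linspace(0,10,50)';
y = sin(x) + 0.2*randn(size(x));
% Case 1: no 'OptimizeHyperparameters'. fitrgp starts from the supplied
% initial values and fits KernelParameters and Sigma by maximizing the
% marginal log likelihood (MLE).
kparams0 = [3.5, 6.2];   % initial [length scale, signal std], illustrative values
sigma0 = 0.2;            % initial noise std, illustrative value
gprMLE = fitrgp(x,y,'KernelFunction','squaredexponential',...
    'KernelParameters',kparams0,'Sigma',sigma0);
% Case 2: with 'OptimizeHyperparameters'. Bayesian optimization searches the
% named hyperparameters and minimizes the cross-validated loss; KFold sets
% the number of folds (3 here, purely as an example).
gprCV = fitrgp(x,y,'KernelFunction','squaredexponential',...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName','expected-improvement-plus','KFold',3));
After fitting, the estimated kernel parameters and noise standard deviation can be inspected through the model properties, e.g. gprMLE.KernelInformation.KernelParameters and gprMLE.Sigma.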
  1 Comment
Mahmoud on 3 Apr 2024 at 17:17
Thanks for the answer. Is there a way to specify how many cross-validation folds will be used in the optimization process? I have a small data set (around 14 points), and holding out points for cross-validation removes a lot of the useful information. Any idea what the best way is to deal with such a small dataset? Also, what is the objective function being optimized in this case?
Thanks


