Does k-fold cross validation in the Classification Learner app stratify the data by default?

I would like to know whether the Classification Learner app stratifies the data by default when using k-fold cross-validation.
If one selects the "Generate Function" option in the app, the resulting script uses the following function for cross-validation:
%Perform cross-validation
partitionedModel = crossval(trainedClassifier.ClassificationEnsemble, 'KFold', 5);
According to the MathWorks documentation for the "crossval" function, an alternative to the 'KFold' argument is 'Stratify', which would explicitly stratify the dataset. Does anybody know how this works internally in the Classification Learner app?
Thanks

Answers (1)

Sameer on 30 Jun 2025
Yes, the Classification Learner app in MATLAB uses stratified k-fold cross-validation by default when performing classification tasks.
Here’s how it works:
  • When you select cross-validation in the app, it internally uses the "cvpartition" function with the response variable (class labels) as the grouping variable.
  • When "cvpartition" is given a grouping variable, it stratifies automatically, so each fold has roughly the same class proportions as the original dataset.
  • Even in the generated code (e.g., crossval(trainedClassifier.ClassificationEnsemble, 'KFold', 5)), the crossval function uses the model’s response variable behind the scenes, which results in stratified partitioning.
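One way to check this yourself is to inspect the partition stored in the cross-validated model. The sketch below is not the app's internal code. It assumes the Statistics and Machine Learning Toolbox, and it uses the fisheriris sample data and a fitcensemble model as stand-ins for your exported classifier. The last two lines show how to pass an explicitly stratified partition if you would rather not rely on the default.

```matlab
% Stand-in data and model (assumption: not what the app exported for you)
load fisheriris                      % meas (150x4), species (150x1 cell)
rng(0);                              % reproducible folds
mdl = fitcensemble(meas, species);   % full (non-compact) ensemble

% Same call as in the generated code
partitionedModel = crossval(mdl, 'KFold', 5);

% Count the classes in each test fold of the partition that crossval built
c = partitionedModel.Partition;
labels = categorical(species);
for i = 1:c.NumTestSets
    disp(countcats(labels(test(c, i)))');
end

% Optional: build the stratified partition yourself and hand it to crossval
cStrat = cvpartition(species, 'KFold', 5, 'Stratify', true);
partitionedStrat = crossval(mdl, 'CVPartition', cStrat);
```

If the per-fold class counts come out nearly equal across folds, the partition was stratified. With your own model, replace mdl with trainedClassifier.ClassificationEnsemble and species with your response variable.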
This behavior is specific to classification problems. For regression tasks, stratification is not applied because there are no discrete class labels to stratify by.
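To see what stratification changes, you can compare a partition built from the class labels with one built from the number of observations only, which is the non-stratified form. This is a minimal sketch, again using fisheriris as assumed sample data. The subset in keep is only there to make the classes unbalanced so that the difference is visible.

```matlab
load fisheriris
rng(0);                                   % reproducible folds
keep = [1:50, 51:80, 101:110];            % 50 / 30 / 10 observations per class
labels = categorical(species(keep));

cStrat = cvpartition(labels, 'KFold', 5);          % grouping variable: stratified
cPlain = cvpartition(numel(labels), 'KFold', 5);   % observation count only: not stratified

% Print the class counts of each test fold for both partitions
for i = 1:5
    fprintf('fold %d  stratified: %s   plain: %s\n', i, ...
        mat2str(countcats(labels(test(cStrat, i)))'), ...
        mat2str(countcats(labels(test(cPlain, i)))'));
end
```

The stratified folds should keep the class proportions close to those of the full set, while the plain folds can vary from fold to fold.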
For more details, please refer to the MathWorks documentation for "cvpartition" and "crossval".
Hope this helps!

Release

R2025a
