With cvpartition, how to stratify the partitions with respect to more than one variable (with respect to class label and some other label)

I am trying to use the convenient cvpartition object so that fitclinear performs cross-validation internally (more precisely, for hyperparameter optimization). My data is grouped, with an equal number of samples from each of the two classes in each group. I need the K-fold partitioning of these data to be stratified with respect to both class label and group label, such that in each fold: 1) class labels are balanced, and 2) no group has samples in both the training and test subsamples.
More visually, below is an example of one possible partition, with samples as rows and with columns cLabel (the class labels), gLabel (the group labels) and kLabel (the index of the fold in which the sample is assigned to the test subsample):
>> cLabel = [1 1 1 1 1 1 2 2 2 2 2 2];
gLabel = [1 1 2 2 3 3 1 1 2 2 3 3 ];
kLabel = [3 3 2 2 1 1 2 2 3 3 1 1];
[cLabel' gLabel' kLabel']
ans =
1 1 1
1 1 1
1 2 2
1 2 2
1 3 3
1 3 3
2 1 2
2 1 2
2 2 3
2 2 3
2 3 1
2 3 1
I would be happy to specify the values in a cvpartition object manually and then pass it to fitclinear. I tried a hack from another post (https://www.mathworks.com/matlabcentral/answers/203155-how-to-manually-construct-or-modify-a-cross-validation-object-in-matlab) to do so, but I still was not able to change the cvpartition object manually. :-(
Any idea please?

Answers (1)

Cris LaPierre on 2 Nov 2021
Edited: Cris LaPierre on 2 Nov 2021
The documentation seems to indicate that grouping can only be done on a single variable. A workaround, then, might be to use findgroups to create a new grouping variable based on the values in several variables.
cLabel = [1 1 1 1 1 1 2 2 2 2 2 2];
gLabel = [1 1 2 2 3 3 1 1 2 2 3 3 ];
% Group by cLabel and gLabel
G = findgroups(cLabel, gLabel)
G = 1×12
1 1 2 2 3 3 4 4 5 5 6 6
% Create partition based on grouping variable G
c = cvpartition(G,'Kfold',2,'stratify',true)
c =
K-fold cross validation partition
   NumObservations: 12
       NumTestSets: 2
         TrainSize: 6 6
          TestSize: 6 6
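Since the question is about handing the partition to fitclinear, here is a minimal sketch of how that might look. The predictor matrix X is synthetic placeholder data (an assumption, not from the question), and the option values such as MaxObjectiveEvaluations are arbitrary choices to keep the run short.

```matlab
rng(1);                                    % reproducible placeholder data and partition
cLabel = [1 1 1 1 1 1 2 2 2 2 2 2]';
gLabel = [1 1 2 2 3 3 1 1 2 2 3 3]';
X = randn(12, 4) + cLabel;                 % hypothetical predictors, one row per sample

% Combined grouping variable and stratified partition, as above
G = findgroups(cLabel, gLabel);
c = cvpartition(G, 'KFold', 2, 'Stratify', true);

% Plain cross-validation using the partition
cvMdl = fitclinear(X, cLabel, 'CVPartition', c);
err = kfoldLoss(cvMdl);                    % misclassification rate over the folds

% Hyperparameter optimization evaluated on the same partition
opts = struct('CVPartition', c, 'ShowPlots', false, ...
              'Verbose', 0, 'MaxObjectiveEvaluations', 5);
mdl = fitclinear(X, cLabel, 'OptimizeHyperparameters', 'auto', ...
                 'HyperparameterOptimizationOptions', opts);
```

Passing the same cvpartition object in both calls means every candidate model is scored on identical folds.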
% Inspect assignment for first fold
training(c,1)
ans = 12×1 logical array
1 0 0 1 1 0 0 1 1 0
test(c,1)
ans = 12×1 logical array
0 1 1 0 0 1 1 0 0 1
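Note that stratifying on the combined label spreads each class/group combination across the folds, so in the output above samples with the same G value land on both the training and the test side. If the requirement is instead that a whole group stays on one side of every split, one possible approach is to partition the groups rather than the samples and then expand the result to a per-sample fold index. This is only a sketch: the fold count K = 3 is an assumption, and class balance follows here only because each group contains the same number of samples from each class. It may also be worth checking the cvpartition documentation for your release for built-in grouping or custom-partition options.

```matlab
cLabel = [1 1 1 1 1 1 2 2 2 2 2 2]';
gLabel = [1 1 2 2 3 3 1 1 2 2 3 3]';

[groupIds, ~, groupIdx] = unique(gLabel);  % groupIdx maps each sample to its group
numGroups = numel(groupIds);
K = 3;                                     % assumed fold count, must be <= numGroups

rng(0);                                    % reproducible assignment
cGroups = cvpartition(numGroups, 'KFold', K);  % partition the groups, not the samples

groupFold = zeros(numGroups, 1);
for k = 1:K
    groupFold(test(cGroups, k)) = k;       % fold in which each group is held out
end
kLabel = groupFold(groupIdx);              % per-sample test-fold index

for k = 1:K
    testMask  = (kLabel == k);
    trainMask = ~testMask;
    % Requirement 2: no group appears on both sides of the split
    assert(isempty(intersect(gLabel(trainMask), gLabel(testMask))));
    % Requirement 1: class counts in the test fold are equal
    counts = accumarray(cLabel(testMask), 1);
    assert(all(counts == counts(1)));
end
```

The logical masks kLabel == k and kLabel ~= k can then drive a manual cross-validation loop, training on trainMask and evaluating on testMask in each fold.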
