Cannot test model with cross validation using crossval and kfoldLoss

I am very new to machine learning, but by following my course materials I have been able to fit a random forest to my data and get an error rate that makes sense (it beats a dumb prediction and improves with better-chosen features).
My predictor matrix (zscored, this is a subset) is:
-0.0767889379600161 1.43666113298993 4.83220576535887 4.59650550158967
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.187208672625236 -0.00955946380486005
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
7.39424877391969 1.12643024681666 -0.145180082833503 -0.187718580390875
-0.0767889379600161 2.05712290533646 -0.211225009649084 -0.187718580390875
-0.0767889379600161 0.195737588296863 1.35584098115696 0.229434473078818
And my response is:
'Highly Active'
'Inactive'
'Inactive'
'Inactive'
'Inactive'
'Highly Active'
'Highly Active'
'Highly Active'
'Inactive'
'Highly Active'
'Inactive'
'Highly Active'
My previous method was:
rng default
c = cvpartition(catresponse, 'HoldOut', 0.3);
% Extract the indices of the training and test sets.
trainIdx = training(c);
testIdx = test(c);
% Create the training and test data sets.
XTrain = predictormatrix(trainIdx, :);
XTest = predictormatrix(testIdx, :);
yTrain = catresponse(trainIdx);
yTest = catresponse(testIdx);
% Create an ensemble of 100 trees.
forestModel = fitensemble(XTrain, yTrain, 'Bag', 100, ...
    'Tree', 'Type', 'Classification');
% Predict and evaluate the ensemble model.
forestPred = predict(forestModel, XTest);
% errs = forestPred ~= yTest;
% testErrRateForest = 100*sum(errs)/numel(errs);
% display(testErrRateForest)
% Perform 10-fold cross validation.
cvModel = crossval(forestModel); % 10-fold is default
cvErrorForest = 100*kfoldLoss(cvModel);
display(cvErrorForest)
% Confusion matrix.
C = confusionmat(yTest, forestPred);
figure(figOpts{:})   % figOpts is a cell array of figure options defined earlier in my script
imagesc(C)
colorbar
colormap('cool')
[Xgrid, Ygrid] = meshgrid(1:size(C, 1));
Ctext = num2str(C(:));
text(Xgrid(:), Ygrid(:), Ctext)
labels = categories(catresponse);
set(gca, 'XTick', 1:size(C, 1), 'XTickLabel', labels, ...
    'YTick', 1:size(C, 1), 'YTickLabel', labels, ...
    'XTickLabelRotation', 30, ...
    'TickLabelInterpreter', 'none')
xlabel('Predicted Class')
ylabel('Known Class')
title('Forest Confusion Matrix')
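For what it is worth, my reading of the crossval documentation is that cvModel above ignores XTest completely: it re-partitions XTrain/yTrain (the training side of the holdout split) into 10 folds and refits the ensemble on each, so cvErrorForest would be a cross-validated error on the training portion only, while the confusion matrix above still comes from the single holdout split. If that is right, I think a confusion matrix that actually reflects the cross validation would come from kfoldPredict, something like:
% Cross-validated predictions: each training observation is predicted by
% the fold model that did not see it during training.
cvPred = kfoldPredict(cvModel);
Ccv = confusionmat(yTrain, cvPred);   % confusion matrix from the CV folds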
Questions:
  • Am I doing my cross validation the right way? My kfoldLoss call is applied to a model built on the training side of the 30% holdout split, not to something like cvpartition with 'KFold', so I am concerned about what kfoldLoss is actually calculating here.
  • With the code above, is my confusion matrix based on the cross validation, or only on the simpler holdout split?
  • How can I alter my code so that the whole model is "cross validated"? (My attempt is sketched below.)
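For the third question, my best guess (based on the fitensemble documentation, please correct me if this is wrong) is to skip the holdout split entirely and ask fitensemble to cross-validate on all of the data directly using the 'KFold' option:
rng default
% Train and 10-fold cross-validate the bagged ensemble on ALL of the data,
% so every observation is predicted by a fold model that never saw it.
cvForest = fitensemble(predictormatrix, catresponse, 'Bag', 100, ...
    'Tree', 'Type', 'Classification', 'KFold', 10);
cvErrorForest = 100*kfoldLoss(cvForest);   % 10-fold cross-validated error (%)
display(cvErrorForest)
% Confusion matrix based on the cross-validated predictions rather than
% on a single 30% holdout set.
cvPred = kfoldPredict(cvForest);
Ccv = confusionmat(catresponse, cvPred);
Is that the right way to make the whole model "cross validated", or do I still need an explicit cvpartition with 'KFold'?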
