Cannot test model with cross validation using crossval and kfoldLoss
I am very new to machine learning, but by following my course materials I have been able to fit a random forest to my data and get an error rate that makes sense (it beats a naive prediction and improves with better-chosen features).
My predictor matrix (z-scored; this is a subset) is:
-0.0767889379600161 1.43666113298993 4.83220576535887 4.59650550158967
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.187208672625236 -0.00955946380486005
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
7.39424877391969 1.12643024681666 -0.145180082833503 -0.187718580390875
-0.0767889379600161 2.05712290533646 -0.211225009649084 -0.187718580390875
-0.0767889379600161 0.195737588296863 1.35584098115696 0.229434473078818
And my response is:
'Highly Active'
'Inactive'
'Inactive'
'Inactive'
'Inactive'
'Highly Active'
'Highly Active'
'Highly Active'
'Inactive'
'Highly Active'
'Inactive'
'Highly Active'
My previous method was:
rng default
c = cvpartition(catresponse, 'HoldOut', 0.3);
% Extract the indices of the training and test sets.
trainIdx = training(c);
testIdx = test(c);
% Create the training and test data sets.
XTrain = predictormatrix(trainIdx, :);
XTest = predictormatrix(testIdx, :);
yTrain = catresponse(trainIdx);
yTest = catresponse(testIdx);
% Create an ensemble of 100 trees.
forestModel = fitensemble(XTrain, yTrain, 'Bag', 100,...
'Tree', 'Type', 'Classification');
% Predict and evaluate the ensemble model.
forestPred = predict(forestModel, XTest);
% errs = forestPred ~= yTest;
% testErrRateForest = 100*sum(errs)/numel(errs);
% display(testErrRateForest)
% Perform 10-fold cross validation.
cvModel = crossval(forestModel); % 10-fold is default
cvErrorForest = 100*kfoldLoss(cvModel);
display(cvErrorForest)
% Confusion matrix.
C = confusionmat(yTest, forestPred);
figure(figOpts{:})   % figOpts is a cell array of figure options defined earlier in my script
imagesc(C)
colorbar
colormap('cool')
% Overlay the counts on each cell of the confusion matrix image.
[Xgrid, Ygrid] = meshgrid(1:size(C, 1));
Ctext = num2str(C(:));
text(Xgrid(:), Ygrid(:), Ctext)
labels = categories(catresponse);
set(gca, 'XTick', 1:size(C, 1), 'XTickLabel', labels, ...
'YTick', 1:size(C, 1), 'YTickLabel', labels, ...
'XTickLabelRotation', 30, ...
'TickLabelInterpreter', 'none')
xlabel('Predicted Class')
ylabel('Known Class')
title('Forest Confusion Matrix')
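For context, my understanding is that kfoldPredict should return, for each observation in the training split, the prediction made by the fold model that did not see it. So something like the following (untested guess) might give a confusion matrix tied to the cross validation rather than the holdout test set -- but this is part of what I am unsure about:
% Out-of-fold predictions for the 70% training split (my guess).
cvPredTrain = kfoldPredict(cvModel);
Ctrain = confusionmat(yTrain, cvPredTrain);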
Questions:
- Am I doing my cross validation the right way? My crossval/kfoldLoss code is applied to the model built from the 70/30 holdout split, not to something like a 'KFold' cvpartition, so I am concerned about what kfoldLoss is actually calculating here.
- Is my confusion matrix based on the cross validation, or on the simpler holdout predictions from the code above?
- How can I alter my code so that the whole model is "cross validated"? (My best guess is sketched below, but I am not sure it is right.)
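My best guess at "cross validating the whole model" is to skip the holdout split entirely and let fitensemble do the 10-fold partitioning on all of the data (the 'KFold' option, if I am reading the documentation correctly), then use kfoldLoss and kfoldPredict on the result. I am not sure this is right:
rng default
% Bagged ensemble cross-validated over the full data set (10 folds).
cvForest = fitensemble(predictormatrix, catresponse, 'Bag', 100, ...
    'Tree', 'Type', 'Classification', 'KFold', 10);
% Average misclassification rate over the 10 folds.
cvErrorForest = 100*kfoldLoss(cvForest);
display(cvErrorForest)
% Out-of-fold prediction for every observation, for a confusion matrix
% based on the cross validation over the whole data set.
cvPredAll = kfoldPredict(cvForest);
Call = confusionmat(catresponse, cvPredAll);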