Why is the accuracy of a classifier trained with the function generated by Classification Learner lower than that of the model exported directly from the Classification Learner app?

load("savedPumpData.mat");
disp(pumpData);
Data = removevars(pumpData,"flow");
save("Data.mat","Data");
disp(Data)
trainRatio = 0.7;
% Create a random partition of the data into training and test sets
c = cvpartition(size(Data, 1), 'HoldOut', 1 - trainRatio);
% Create the training and test sets
trainingData = Data(c.training, :);
testData = Data(c.test, :);
[featureTableTrain,outputTable0] = Features(trainingData);
disp(featureTableTrain)
[trainedClassifier, validationAccuracy] = BagTrees(featureTableTrain);
[featureTableTest,outputTable] = Features(testData);
disp(featureTableTest)
% Predict on the test features with the model exported from Classification Learner
[yfit,scores] = BaggedTrees.predictFcn(featureTableTest);
disp(yfit);
accuracy = sum(yfit==testData.faultCode)/numel(testData.faultCode)*100;
fprintf('Accuracy: %.2f%%\n', accuracy);
figure;
confusionchart(testData.faultCode, yfit);
title('Confusion Matrix RF');
% Predict on the same test features with the model returned by the generated training function
[yfit1,scores1] = trainedClassifier.predictFcn(featureTableTest);
disp(yfit1);
accuracy = sum(yfit1==testData.faultCode)/numel(testData.faultCode)*100;
fprintf('Accuracy: %.2f%%\n', accuracy);
figure;
confusionchart(testData.faultCode, yfit1);
title('Confusion Matrix');
% Features is the function generated by Diagnostic Feature Designer
% BaggedTrees is the model exported to the workspace from Classification Learner (90% test accuracy)
% BagTrees is the training function generated from the same model (70% test accuracy)
Vinay Maruvada on 19 Oct 2023
I have a dataset of 240 rows in total, which I split as shown in the code above.
I imported featureTableTrain into Classification Learner for training and featureTableTest as the test data.


Accepted Answer

Drew on 18 Oct 2023
Based on what you sent, the short answer appears to be that the model exported from Classification Learner was trained on all of the data (100%), while the model produced by the generated training function was trained on only 70% of the data.
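To see why that difference matters, here is a minimal sketch, assuming the 100% includes the rows later used as the test set. It uses Fisher's iris data (`load fisheriris`) as a stand-in for the pump feature table and `fitcensemble` with `'Method','Bag'` as a stand-in for the Bagged Trees preset; neither is from the original post. One model is trained on every row, the other only on the 70% training partition, and both are scored on the same test rows. A model that has already seen the test rows will usually score higher on them, so its test accuracy is optimistic.

```matlab
% Sketch: effect of training on rows that are later used for testing.
% Assumption: fisheriris stands in for the pump features, and
% fitcensemble('Method','Bag') stands in for the Bagged Trees preset.
load fisheriris                              % meas (150x4), species (150x1 cellstr)
rng(0)                                       % reproducible partition and bagging
c = cvpartition(species, 'HoldOut', 0.3);    % stratified 70/30 split

Xtrain = meas(training(c), :);  Ytrain = species(training(c));
Xtest  = meas(test(c), :);      Ytest  = species(test(c));

% Model A: trained on every row, including the test rows
modelAll = fitcensemble(meas, species, 'Method', 'Bag');

% Model B: trained only on the 70% training partition
modelTrain = fitcensemble(Xtrain, Ytrain, 'Method', 'Bag');

% Score both models on the same held-out rows
accAll   = 100 * mean(strcmp(predict(modelAll,   Xtest), Ytest));
accTrain = 100 * mean(strcmp(predict(modelTrain, Xtest), Ytest));

fprintf('Trained on all rows:      %d observations, test accuracy %.2f%%\n', ...
    modelAll.NumObservations, accAll);
fprintf('Trained on training rows: %d observations, test accuracy %.2f%%\n', ...
    modelTrain.NumObservations, accTrain);
```

Only Model B gives an honest estimate of performance on unseen data.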
The final model that Classification Learner exports is always trained on the full data set, excluding any data reserved for testing (see https://www.mathworks.com/help/stats/export-classification-model-for-use-with-new-data.html). If you don't want Classification Learner to use the holdout validation data when training the final model for export, do the following:
  • Start the Classification Learner session by loading only the training data (70%). Choose whichever validation scheme you would like to use within this 70% of data.
  • After the session is started, load the remaining 30% of the data as the test set.
  • Then, when the final model is exported, it will be trained on only 70% of the data.
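The same workflow can be sketched in code. This is a minimal illustration, not the app itself, again assuming `fisheriris` and `fitcensemble` with `'Method','Bag'` as stand-ins, and assuming 5-fold cross-validation as the validation scheme chosen within the 70%. Validation uses only the training partition, the final model is trained on the whole 70%, and the 30% test set is used exactly once.

```matlab
% Sketch: validate inside the 70%, train the final model on the 70%,
% and evaluate once on the 30% test set.
% Assumption: fisheriris and fitcensemble('Method','Bag') are stand-ins.
load fisheriris
rng(1)
outer = cvpartition(species, 'HoldOut', 0.3);     % 30% reserved as the test set
Xtrain = meas(training(outer), :);  Ytrain = species(training(outer));
Xtest  = meas(test(outer), :);      Ytest  = species(test(outer));

% Validation scheme applied within the 70% only (here: 5-fold cross-validation)
cvModel = fitcensemble(Xtrain, Ytrain, 'Method', 'Bag', 'KFold', 5);
validationAccuracy = 100 * (1 - kfoldLoss(cvModel));

% Final model trained on the full 70% training partition
finalModel = fitcensemble(Xtrain, Ytrain, 'Method', 'Bag');

% The test set is used once, for the final estimate
testAccuracy = 100 * mean(strcmp(predict(finalModel, Xtest), Ytest));
fprintf('Validation accuracy: %.2f%%\nTest accuracy: %.2f%%\n', ...
    validationAccuracy, testAccuracy);
```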
When exporting the model, if you select the "Include training data in the exported model" check box, you can check the size of the training data by examining the properties of the exported model. For example, if the exported trainedModel is an ensemble of trees, look at:
size(trainedModel.ClassificationEnsemble.X)
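A self-contained sketch of that check follows. The `trainedModel` struct here only imitates the exported structure (an assumption for illustration); its `ClassificationEnsemble` field holds a full ensemble trained on a 70% partition of `fisheriris`. The `NumObservations` property should report the same row count as `size(...X, 1)`.

```matlab
% Sketch: how many rows was an exported-style model trained on?
% Assumption: this struct only imitates the exported trainedModel, and its
% ClassificationEnsemble field holds a full (non-compact) ensemble.
load fisheriris
rng(2)
c = cvpartition(species, 'HoldOut', 0.3);
trainedModel.ClassificationEnsemble = fitcensemble(meas(training(c), :), ...
    species(training(c)), 'Method', 'Bag');

sz = size(trainedModel.ClassificationEnsemble.X);     % [rows, predictors]
nObs = trainedModel.ClassificationEnsemble.NumObservations;
fprintf('Training data: %d rows x %d predictors (NumObservations = %d)\n', ...
    sz(1), sz(2), nObs);
fprintf('Fraction of all rows used for training: %.0f%%\n', ...
    100 * nObs / numel(species));
```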


Release: R2023a
