Predictors in classification learner app
Hi all, I wonder about the predictor box (red circle below) in the Classification Learner app. Does it affect the classification results, or is it just for illustration? Sometimes when I change the selections in the predictor box, the classification results change a little. Please help me understand that, thank you :3
Accepted Answer
Askic V
on 24 May 2023
Edited: Askic V
on 24 May 2023
This is explained in the MATLAB documentation.
Before you train the model, only the Data option is available in the plot. After you train the model, you can switch between Data and Model predictions. If you switch to Model predictions, you can see which observations the model predicted correctly and which it did not. Incorrect predictions are marked with an "x".
By choosing the predictor variables for the X and Y axes, you can investigate which features separate the classes well and which don't.
For example, features with low predictive power will show significant overlap between the class labels, or no clear separation at all. Once you spot such features, you can choose to omit them from the predictor variables.
So in summary, this is just a visualisation tool that helps you gain a better understanding of the data and of the predictive power of the features.
I'll try to explain this as best I can with an example that is already available in MATLAB.
Execute this code:
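% Load the sample car data set that ships with MATLAB
load carbig
% Convert the country of origin to a categorical variable
Origin = categorical(cellstr(Origin));
% Merge all non-US countries into a single class, "NotUSA"
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
% Collect the six predictors and the response in one table
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
% Drop rows that contain missing values
cars = rmmissing(cars);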
Then start the Classification Learner app. Train the model using the default settings, as shown in the figure:
The goal is to train a model that predicts the origin of a car from 6 predictor variables (Acceleration, Displacement, Horsepower, Model_Year, MPG, Weight). As you know, some of these variables don't really mean much when it comes to guessing the country of origin, but let's confirm that.
If you choose variables such as Acceleration and Model_Year for the scatter plot, you'll see significant overlap between the classes, which indicates that these variables are not suitable as predictors, i.e. they have very low predictive power (of course, the model year cannot determine the origin in any way).
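The same visual check can be reproduced outside the app. The following is only a rough sketch, assuming the Statistics and Machine Learning Toolbox is installed; the choice of Displacement and Weight for the second panel is mine, for comparison, and is not something the app selects for you.

```matlab
% Rebuild the example table so that this snippet runs on its own
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
cars = rmmissing(cars);

figure

% Pair described in the answer as heavily overlapping
subplot(1,2,1)
h1 = gscatter(cars.Acceleration,cars.Model_Year,cars.Origin);
xlabel("Acceleration")
ylabel("Model_Year","Interpreter","none")

% Another pair, chosen here only for comparison (an assumption)
subplot(1,2,2)
h2 = gscatter(cars.Displacement,cars.Weight,cars.Origin);
xlabel("Displacement")
ylabel("Weight")
```

Each call to gscatter colours the points by class, so the amount of overlap between the two colours in each panel gives the same impression as the scatter plot in the app.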
The model (Fine Tree) has an accuracy of about 90.1% with all 6 features. Based on the scatter plot, it seems that we can omit predictors 1 and 4 (Acceleration and Model_Year).
But let's confirm that. If you add another Fine Tree model, use the "Feature Selection" option to remove features 1 and 4, and then train the model with 4/6 features, you'll improve its accuracy a bit.
So that's it.
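The comparison between 6/6 and 4/6 features can also be sketched at the command line. This is a rough approximation and not what the app runs internally: I am assuming that a tree with "MaxNumSplits" set to 100 and 5-fold cross-validation is close enough to the app's Fine Tree preset, and the accuracies you get will probably not match the app's numbers exactly.

```matlab
% Rebuild the example table so that this snippet runs on its own
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
cars = rmmissing(cars);

% Use the same folds for both models so that the comparison is fair
rng(1)                                     % fixed seed, arbitrary choice
cvp = cvpartition(cars.Origin,"KFold",5);  % 5 folds is an assumption

% Model with all 6 predictors
cvAll = fitctree(cars,"Origin", ...
    "MaxNumSplits",100,"CVPartition",cvp);

% Model without Acceleration and Model_Year (predictors 1 and 4)
keep = ["Displacement","Horsepower","MPG","Weight","Origin"];
cvSub = fitctree(cars(:,keep),"Origin", ...
    "MaxNumSplits",100,"CVPartition",cvp);

% kfoldLoss returns the misclassification rate, so accuracy is 1 - loss
accAll = 1 - kfoldLoss(cvAll);
accSub = 1 - kfoldLoss(cvSub);
fprintf("6/6 features: %.1f%%\n4/6 features: %.1f%%\n", ...
    100*accAll,100*accSub)
```

The difference between the two numbers is usually small and depends on the random partition, so it is worth trying a few seeds before concluding that dropping a feature helps.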
If you have a lot of features (a wide data set), examining the data and predictors visually becomes tedious. In that case, there is a built-in function you can use.
Export your original model (6/6 features) to the workspace and execute the following code:
impValues = predictorImportance(trainedModel.ClassificationTree);
pareto(impValues)
The result will be as shown:
You can see that feature no. 2 (Displacement) accounts for about 70% of the predictive power. Features 2, 3 and 6 (Displacement, Horsepower and Weight) together carry about 97% of it.
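If you prefer numbers to reading them off the chart, the importance values can be normalised and listed by name. This is a sketch under one assumption: it trains a tree directly with fitctree so that it runs on its own. With a model exported from the app, you would use trainedModel.ClassificationTree in place of the variable tree, and the percentages may then differ from the ones computed here.

```matlab
% Rebuild the example table so that this snippet runs on its own
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
cars = rmmissing(cars);

% Stand-in for the exported model (assumption, see the note above)
tree = fitctree(cars,"Origin","MaxNumSplits",100);

% Raw importance estimates, one value per predictor
imp = predictorImportance(tree);

% Normalise so that the shares add up to 1, then sort in descending order
share = imp / sum(imp);
[sortedShare,idx] = sort(share,'descend');

% List the predictors with their share and the running total
ranking = table(string(tree.PredictorNames(idx))', ...
    sortedShare',cumsum(sortedShare)', ...
    'VariableNames',{'Predictor','Share','CumulativeShare'});
disp(ranking)
```

The CumulativeShare column shows directly how many of the top predictors are needed to reach a given fraction of the total importance.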
I hope this answers your question.