Hi,
I understand that your dataset is imbalanced and that the preprocessing you performed makes the imbalance worse. There are several ways to deal with class imbalance; some of them are listed below:
1. Undersampling:
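You can randomly discard some samples of the majority classes (i.e., classes 1 and 2) using the “datasample” function, which draws a random subset of the rows you pass to it. Apply it to the samples of one majority class at a time: sampling the whole dataset at random keeps the class proportions roughly unchanged. For example: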
dataSampled = datasample(data,k,'Replace',false);
Here is what the arguments mean:
- “data” holds the samples of one majority class, one sample per row.
- “k” is the number of samples you want to keep from that class.
- Setting the “Replace” input argument to “false” ensures that the sampling is done without replacement.
Refer to the following documentation for details about data sampling:
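A minimal sketch of per-class random undersampling in base MATLAB follows. The variable names X, y and nTarget, the random example data, and the choice of the class 3 count as the target size are assumptions for illustration only. The randperm line plays the role of datasample(X(idx,:),nKeep,'Replace',false) but does not need the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical example data: X holds one sample per row, y holds the labels.
rng(0);                                   % fixed seed so the sketch is reproducible
X = rand(100, 4);
y = [ones(60,1); 2*ones(30,1); 3*ones(10,1)];

% Assumed target: keep at most as many samples per class as class 3 has.
nTarget = sum(y == 3);
keep = false(size(y));
for c = unique(y)'
    idx = find(y == c);                          % rows belonging to class c
    nKeep = min(nTarget, numel(idx));
    pick = idx(randperm(numel(idx), nKeep));     % random subset, no replacement
    keep(pick) = true;
end

XBalanced = X(keep, :);                   % undersampled features
yBalanced = y(keep);                      % matching labels
```

With these example counts, every class ends up with 10 samples. Discarding majority samples throws information away, so this works best when the majority classes are large.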
2. Oversampling:
You can use an oversampling technique such as the Synthetic Minority Over-sampling Technique (SMOTE) to create synthetic samples of class 3 and reduce the imbalance. For details on how to use SMOTE in MATLAB, please refer to the following File Exchange submission:
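To show the idea behind SMOTE, here is a minimal sketch in base MATLAB; it is not the File Exchange implementation. Each synthetic sample is placed at a random point on the line segment between a class 3 sample and one of its k nearest class 3 neighbours. Xmin, nSynthetic and kNN are hypothetical names, and the random matrix only stands in for your class 3 features. Because SMOTE relies on Euclidean distances, it assumes numeric features on comparable scales.

```matlab
% Hypothetical class 3 feature matrix: one sample per row, features in [0, 1].
rng(1);                                % fixed seed so the sketch is reproducible
Xmin = rand(10, 4);
nSynthetic = 20;                       % how many new class 3 samples to create
kNN = 5;                               % neighbours per sample (must be < size(Xmin,1))

nMin = size(Xmin, 1);
% Pairwise squared Euclidean distances between all class 3 samples.
sq = sum(Xmin.^2, 2);
D = sq + sq' - 2 * (Xmin * Xmin');
D(1:nMin+1:end) = Inf;                 % a sample is not its own neighbour
[~, order] = sort(D, 2);               % each row: neighbours from nearest to farthest
neighbours = order(:, 1:kNN);

Xsyn = zeros(nSynthetic, size(Xmin, 2));
for s = 1:nSynthetic
    iBase = randi(nMin);                       % random class 3 sample
    iNbr = neighbours(iBase, randi(kNN));      % one of its k nearest neighbours
    gap = rand();                              % random position on the segment
    Xsyn(s, :) = Xmin(iBase, :) + gap * (Xmin(iNbr, :) - Xmin(iBase, :));
end
ySyn = 3 * ones(nSynthetic, 1);        % every synthetic sample is labelled class 3
```

The synthetic rows would then be appended to the training set only (for example X = [X; Xsyn] and y = [y; ySyn]), never to the test set, so that evaluation still uses real samples.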
3. Class weighting:
You can assign a different weight to each class so that misclassifying class 3 costs more during training. You can do this with the “ClassWeights” option of your classification layer:
classificationLayer(Classes=classes,ClassWeights=classWeights)
Here is what the parameters mean:
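- “classes” contains all class names, e.g. categorical([1 2 3]). It must be a categorical vector, string array, or cell array of character vectors, so a plain numeric vector such as [1,2,3] is not accepted.
- “classWeights” is a vector of positive weights, one per class, in the same order as “classes”. Give class 3 a larger weight here.
- A sample “classWeights” vector would be [1,2,4].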
Refer to the following documentation for more details on class weighting:
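A common starting point, assumed here rather than required by “classificationLayer”, is to weight each class by the inverse of its frequency. A minimal sketch with hypothetical numeric labels y (classes 1 and 2 dominate):

```matlab
% Hypothetical numeric training labels 1, 2 and 3.
y = [ones(60,1); 2*ones(30,1); 3*ones(10,1)];

counts = accumarray(y, 1)';            % samples per class, as a row: [60 30 10]
nClasses = numel(counts);
% Inverse-frequency weights: rarer classes get proportionally larger weights,
% scaled so that a perfectly balanced dataset would give a weight of 1 per class.
classWeights = numel(y) ./ (nClasses * counts);
```

With these example counts, class 3 gets six times the weight of class 1. If your labels are categorical, countcats(YTrain) gives the same per-class counts, and the resulting row vector can be passed as ClassWeights together with the matching Classes (for example categorical(1:3)).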
4. Evaluation metrics:
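Evaluate the model with metrics such as per-class precision, recall, F1 score and AUC-ROC instead of overall accuracy alone. These metrics do not change the model, but they show how well it handles the minority class, which overall accuracy can hide when most samples belong to classes 1 and 2.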
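As an illustration, the sketch below computes a confusion matrix and per-class precision, recall and F1 score from hypothetical true and predicted labels using only base MATLAB. In this made-up example the overall accuracy is 0.6, while the recall of class 3 is only 1/3, which the single accuracy number does not reveal.

```matlab
% Hypothetical true and predicted labels for classes 1, 2 and 3.
yTrue = [1 1 1 1 2 2 2 3 3 3]';
yPred = [1 1 1 2 2 2 1 3 1 2]';
nClasses = 3;

% Confusion matrix: rows = true class, columns = predicted class.
C = accumarray([yTrue yPred], 1, [nClasses nClasses]);

tp = diag(C)';                         % correctly classified samples per class
precision = tp ./ sum(C, 1);           % column sums = predicted counts (NaN if never predicted)
recall    = tp ./ sum(C, 2)';          % row sums = true counts
f1 = 2 * precision .* recall ./ (precision + recall);
macroF1 = mean(f1);                    % unweighted average over classes
accuracy = sum(tp) / numel(yTrue);
```

With the Statistics and Machine Learning Toolbox, confusionmat(yTrue,yPred) builds the same matrix, and perfcurve can compute the AUC for one class at a time from the predicted scores.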
Hope this helps.