How to decide the value of 'ndim' when using [residuals, reconstructed] = pcares(X, ndim) for feature-size reduction using PCA?

I am working on feature classification using KNN and SVM. The data set contains 2000 images, and the features are large histograms (of various bin sizes) computed for each image. I read that the above function can be used, but my query is how to decide the value of ndim so that the best features are retained.

Answers (1)

Ayush Aniket on 20 Jan 2025 at 11:20
In MATLAB, you can experiment with different values of ndim (the number of retained components) to see which works best for your classification task. To choose an appropriate value of ndim, aim to capture enough of the variance explained by PCA while avoiding overfitting.
To determine the value, apply PCA to the feature set and look at the cumulative variance explained by the principal components by plotting it:
% Assuming 'features' is a matrix of size [num_samples, num_features]
% where each row corresponds to an image, and each column is a feature
% Perform PCA on the feature set
[coeff, score, latent, ~, explained] = pca(features);
% Plot the cumulative explained variance
cumulative_variance = cumsum(explained); % Cumulative sum of explained variance
figure;
plot(1:length(cumulative_variance), cumulative_variance, 'b-', 'LineWidth', 2);
xlabel('Number of Principal Components');
ylabel('Cumulative Explained Variance (%)');
title('Explained Variance vs. Number of Principal Components');
grid on;
% Decide on the number of components to keep, based on a threshold of variance
threshold = 95; % Retain 95% of the variance
ndim = find(cumulative_variance >= threshold, 1);
A typical choice for ndim is to retain enough components to explain 95% or 99% of the variance. The exact choice depends on your classification performance (e.g., with KNN or SVM) and on the tradeoff between reducing dimensionality and retaining useful information.
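For completeness, once ndim is chosen, the reduced feature set for classification is simply the first ndim columns of the score output of pca, while the pcares call from the question returns the residuals and the reconstruction of the data in the original feature space. A minimal sketch continuing from the variables above (the name reduced_features is only illustrative):
% Reduced features: projection of each image onto the first ndim principal components
reduced_features = score(:, 1:ndim);
% pcares returns the residuals and the reconstruction of the data from ndim components,
% expressed in the original feature space (useful for checking reconstruction quality)
[residuals, reconstructed] = pcares(features, ndim);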
After determining ndim, evaluate the performance of your classifier (KNN or SVM) using cross-validation. This helps ensure that the reduced feature set does not hurt your model's performance. Refer to the MATLAB documentation on cross-validation for the detailed steps.
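As a rough sketch of that evaluation (assuming a label vector, here called labels, with one class label per image; the number of neighbors and folds are placeholders):
% Assuming 'labels' holds one class label per image (hypothetical variable name)
knn_model = fitcknn(reduced_features, labels, 'NumNeighbors', 5);
cv_knn = crossval(knn_model, 'KFold', 5);   % 5-fold cross-validation
knn_error = kfoldLoss(cv_knn);              % cross-validated misclassification rate
fprintf('Cross-validated KNN error with ndim = %d: %.4f\n', ndim, knn_error);
% The same pattern applies to SVM, e.g. fitcsvm (binary) or fitcecoc (multiclass)
You can repeat this for a few candidate values of ndim and keep the one that gives the best cross-validated accuracy.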
