Is there an alternative to biplot for plotting Principal Component Analysis (PCA) scores (like a frequency/density score plot)?

He
I ran Principal Component Analysis (PCA) on a large dataset (a matrix of roughly 2,500,000 x 9). This produces the coefficient, score and latent outputs:
[coeff_PCA_MWD_Blast,score_PCA_MWD_Blast,latent_PCA_MWD_Blast] = pca(numall);
To gain a better understanding of my results, I tried to plot this data in a biplot:
biplot(coeff_PCA_MWD_Blast(:,1:2),'scores',score_PCA_MWD_Blast(:,1:2),'varlabels',{'Depth', 'Penetration Rate', 'Percussive Pressure','Feed Pressure','Damper Pressure','Rotation Speed','Rotation Pressure','Water Flow','Water Pressure'});
Unfortunately, this fills up my PC's memory very quickly, and MATLAB is unable to produce the figure. Is there a workaround, so that I can plot my PCA coefficients and scores in the same plot?
kind regards,
Jeroen

Answers (1)

Ayush Aniket on 21 Jan 2025 at 5:47
When dealing with large datasets, plotting all the data points can indeed exhaust your system's memory. Here are a few strategies to work around this issue:
1. Subsampling: Randomly sample a subset of your data to plot. This reduces memory usage while still providing a representative visualization.
% Define the number of samples you want to plot
numSamples = 10000; % Adjust this number as needed
% Randomly select indices
idx = randperm(size(score_PCA_MWD_Blast, 1), numSamples);
% Subsample the scores
subsampledScores = score_PCA_MWD_Blast(idx, 1:2);
% Plot using the subsampled data
biplot(coeff_PCA_MWD_Blast(:, 1:2), 'scores', subsampledScores, ...
'varlabels', {'Depth', 'Penetration Rate', 'Percussive Pressure', ...
'Feed Pressure', 'Damper Pressure', 'Rotation Speed', ...
'Rotation Pressure', 'Water Flow', 'Water Pressure'});
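2. Use a Scatter Plot for Scores: If you only need to visualize the scores, a scatter plot of a random subset gives a quick look. Plotting the first 10,000 rows instead may not be representative if the rows are ordered (for example, by depth).
% Randomly select 10,000 rows (assumes the dataset has at least that many)
idx = randperm(size(score_PCA_MWD_Blast, 1), 10000);
scatter(score_PCA_MWD_Blast(idx, 1), score_PCA_MWD_Blast(idx, 2), '.');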
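Since the question asks about a frequency/density score plot, here is a minimal sketch of one possible approach: bin the first two score columns on a 2-D grid with histcounts2 and show the counts as an image, then overlay the coefficient vectors as arrows. Memory use then depends on the number of bins rather than the number of observations. The synthetic matrix X, the bin count nBins, the log color scale and the arrow scaling factor are all assumptions made for illustration; the arrow scaling is not the same normalization that biplot applies.

```matlab
% Synthetic stand-in for the real data (assumption: replace X with numall)
rng(0);
X = randn(200000, 9) * randn(9, 9);
[coeff, score] = pca(X);

% Count observations per cell on an nBins-by-nBins grid of PC1 vs PC2
nBins = 200;  % illustrative value, adjust as needed
[counts, xEdges, yEdges] = histcounts2(score(:, 1), score(:, 2), nBins);

% Bin centers, used as image coordinates
xCenters = (xEdges(1:end-1) + xEdges(2:end)) / 2;
yCenters = (yEdges(1:end-1) + yEdges(2:end)) / 2;

% counts(i, j) has x in bin i and y in bin j, so transpose for imagesc
figure;
imagesc(xCenters, yCenters, log10(counts.' + 1));
axis xy;
cb = colorbar;
ylabel(cb, 'log10(count + 1)');
xlabel('Component 1');
ylabel('Component 2');
hold on;

% Scale the loadings so the arrows fit inside the score range
s2 = score(:, 1:2);
c2 = coeff(:, 1:2);
arrowScale = 0.8 * max(abs(s2(:))) / max(abs(c2(:)));
varLabels = {'Depth', 'Penetration Rate', 'Percussive Pressure', ...
    'Feed Pressure', 'Damper Pressure', 'Rotation Speed', ...
    'Rotation Pressure', 'Water Flow', 'Water Pressure'};
origin = zeros(size(c2, 1), 1);
quiver(origin, origin, arrowScale * c2(:, 1), arrowScale * c2(:, 2), 0, 'w');
text(arrowScale * c2(:, 1), arrowScale * c2(:, 2), varLabels, 'Color', 'w');
hold off;
```

The log scale is only there so that sparse outer regions remain visible next to the dense core; plotting counts.' directly gives a linear frequency map.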
3. Use Incremental PCA: For very large datasets, consider incremental PCA, which processes the data in chunks. MATLAB's incrementalPCA function can help if you need to perform PCA without loading the entire dataset into memory. Note that this addresses the memory cost of computing the PCA rather than of plotting it, so you would still need to subsample or bin the scores for the figure. Refer to the following documentation page for details: https://www.mathworks.com/help/stats/incrementalpca.html
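To illustrate the idea of processing the data in chunks, here is a rough sketch that does not use the incrementalPCA API. It swaps in a simpler technique: accumulate the column sums and cross-products chunk by chunk, form the sample covariance matrix, and take its eigendecomposition. The synthetic matrix X, the chunkSize value and all variable names are assumptions for illustration; in practice each chunk would be read from disk or a datastore. This one-pass covariance formula can lose precision when the column means are large relative to the spread of the data.

```matlab
% Synthetic stand-in for data that would normally be read chunk by chunk
rng(1);
X = randn(50000, 9) * randn(9, 9);
[n, p] = size(X);
chunkSize = 10000;  % illustrative value

% Accumulate column sums and cross-products over the chunks
sumX = zeros(1, p);
sumXX = zeros(p, p);
for startRow = 1:chunkSize:n
    rows = startRow:min(startRow + chunkSize - 1, n);
    chunk = X(rows, :);
    sumX = sumX + sum(chunk, 1);
    sumXX = sumXX + chunk.' * chunk;
end

% Sample covariance from the accumulated statistics
mu = sumX / n;
C = (sumXX - n * (mu.' * mu)) / (n - 1);
C = (C + C.') / 2;  % symmetrize against rounding error

% Principal axes are the eigenvectors, sorted by decreasing variance
[V, D] = eig(C);
[latent, order] = sort(diag(D), 'descend');
coeff = V(:, order);

% Scores for any chunk: center with the global mean, then project
scoreChunk = (X(1:chunkSize, :) - mu) * coeff(:, 1:2);
```

The signs of the columns of coeff are arbitrary and may differ from what pca returns, because pca applies its own sign convention.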
