How can I import data from .xls for Principal Component Analysis (PCA)?

Hello,
I have a worksheet in excel that contains the following data:
17 rows x 5 columns of data. The first row contains the titles of the variables (names of 3 different people), while column 1 contains the titles for the 16 different possibilities. The internal matrix, which ends up being 16x4, contains the statistics. I import the data into MATLAB, but I don't know whether I should import it as a matrix, cell array, or dataset. From there, how do I perform PCA? I want the output to contain the names of the variables.
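One possible way to import a sheet laid out like this is readtable with the 'ReadRowNames' option, which keeps the header row as variable names and the first column as row names. The sketch below is only an illustration: the file name pca_demo.xlsx, the names PersonA to PersonD and Option1 to Option16, and the random values are all made up, and the demo workbook is written first only so that the example runs on its own. The same readtable call should accept a .xls file.

```matlab
% Build a small demo workbook (hypothetical names, random values).
rng(0);
rowNames = arrayfun(@(k) sprintf('Option%d', k), (1:16)', 'UniformOutput', false);
demo = array2table(rand(16, 4), ...
    'VariableNames', {'PersonA', 'PersonB', 'PersonC', 'PersonD'}, ...
    'RowNames', rowNames);
demoFile = fullfile(tempdir, 'pca_demo.xlsx');   % hypothetical file name
writetable(demo, demoFile, 'WriteRowNames', true);

% Import: header row -> variable names, first column -> row names.
T = readtable(demoFile, 'ReadRowNames', true);

X        = table2array(T);               % 16x4 numeric matrix for pca/princomp
varNames = T.Properties.VariableNames;   % column titles, kept for labelling
obsNames = T.Properties.RowNames;        % row titles, kept for labelling
```

Keeping the names in varNames and obsNames next to the plain matrix X means the numeric input stays a matrix while the labels remain available for tables and plots.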
So far, I've performed this:
>> coeff = princomp(Cocktail1)
coeff =
0.0453 0.9976 0.0500 -0.0130
0.4413 -0.0662 0.8831 -0.1453
0.5300 -0.0067 -0.1275 0.8383
0.7227 -0.0172 -0.4489 -0.5253
However, the problem is that it does not show me the titles of the variables.
Thank you.

Accepted Answer

Shashank Prasanna on 1 Aug 2014
Michelle, your approach is correct. princomp (or pca, which is the newer and recommended function) accepts only matrices as inputs and returns the principal component coefficients as the first output.
It is important to note, however, that pca transforms your data to a new coordinate space. This means your original variable names don't make much sense for your transformed data. The second output of pca is the transformed data, which is a linear combination of your original data.
If you simply want to assign some names to them, I recommend using a table as follows:
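As a rough illustration of that second output, the sketch below runs pca on a random 16x4 matrix (made-up data standing in for the spreadsheet values) and checks that the scores equal the mean-centered data multiplied by the coefficients. It assumes the Statistics and Machine Learning Toolbox is available and that pca is used with its default centering.

```matlab
rng(0);
X = rand(16, 4);                        % stand-in for the 16x4 data matrix

% coeff: 4x4 coefficients, score: 16x4 transformed data, latent: PC variances
[coeff, score, latent] = pca(X);

% By default pca centers each column before projecting, so the scores are
% the centered data expressed in the principal component basis.
Xcentered  = bsxfun(@minus, X, mean(X, 1));
scoreCheck = Xcentered * coeff;

maxDiff = max(abs(score(:) - scoreCheck(:)));   % expected to be near zero
```

Each column of score is therefore a weighted sum of all the original (centered) variables, which is why a single original variable name does not describe a whole component.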
T = array2table(coeff,'VariableNames',{'Name1' 'Name2' 'Name3' 'Name4'})
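A possible variation, offered as a sketch rather than as part of the original answer: since each row of coeff corresponds to an original variable and each column to a principal component, the original names can go on the rows and generic PC labels on the columns. The variable names and the random data below are hypothetical.

```matlab
rng(1);
X = rand(16, 4);                                          % made-up data
varNames = {'PersonA', 'PersonB', 'PersonC', 'PersonD'};  % hypothetical names

coeff = pca(X);

% Label rows with the original variables and columns with PC1, PC2, ...
pcNames = arrayfun(@(k) sprintf('PC%d', k), 1:size(coeff, 2), ...
    'UniformOutput', false);
T = array2table(coeff, 'RowNames', varNames, 'VariableNames', pcNames);
disp(T)

% Individual loadings can then be looked up by name.
loadingA1 = T{'PersonA', 'PC1'};
```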
  2 Comments
Michelle on 1 Aug 2014
Thank you for your answer! I understand that pca transforms the data onto a new coordinate space. But now, I want to plot the original data onto this new coordinate space which is where I am having trouble. Do you know how to plot the original data onto the biplot?
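One way this might be done is with biplot, which draws each original variable as a vector and, when the 'Scores' option is given, each observation as a point in the space of the chosen components. The sketch below uses random data and hypothetical variable names, and plots only the first two components.

```matlab
rng(2);
X = rand(16, 4);                                          % made-up data
varNames = {'PersonA', 'PersonB', 'PersonC', 'PersonD'};  % hypothetical names

[coeff, score] = pca(X);

% Vectors: original variables (labelled). Points: the 16 observations.
figure;
h = biplot(coeff(:, 1:2), 'Scores', score(:, 1:2), 'VarLabels', varNames);
xlabel('Component 1');
ylabel('Component 2');
```

The row titles from the spreadsheet could be attached to the points as well through the 'ObsLabels' option of biplot.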
Yaser Khojah on 19 Apr 2019
Dear Shashank Prasanna, I need help with PCA.
I'm looking for a way to relate the PCA results to the original data to find which variables have the highest impact. I understand there are:
1) COEFF: Each column contains the coefficients for one PC, and the columns are ordered by descending component variance
2) Score: Each column corresponds to a PC and each row to an observation
3) latent: the PC variances. I assume the first one is related to the first PC and so on
I have spent a lot of time trying to find a way to link the PCs to the original data's variables. Is there any way to look at a PC and find its related variables in the original data?
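One common heuristic, sketched here under stated assumptions and not taken from the thread, is to read the coefficients as loadings: within a column of coeff, the rows with the largest absolute values are the original variables that contribute most to that PC, and the fifth output of pca (explained) gives the percentage of variance each PC accounts for. The data and names below are made up.

```matlab
rng(3);
X = rand(16, 4);                                          % made-up data
varNames = {'PersonA', 'PersonB', 'PersonC', 'PersonD'};  % hypothetical names

[coeff, ~, latent, ~, explained] = pca(X);

% For each PC (column), find the original variable with the largest
% absolute coefficient.
[~, idx]    = max(abs(coeff), [], 1);
dominantVar = varNames(idx);

for k = 1:numel(dominantVar)
    fprintf('PC%d: %.1f%% of variance, largest loading on %s\n', ...
        k, explained(k), dominantVar{k});
end
```

If the variables are on very different scales, the loadings are dominated by the high-variance variables, so standardizing the columns (for example with zscore) before calling pca may be worth considering.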
