MATLAB Answers


How can I import data from an .xls file for Principal Component Analysis (PCA)?

Asked by Michelle on 31 Jul 2014
Latest activity Commented on by Yaser Khojah on 19 Apr 2019
I have a worksheet in Excel that contains the following data:
17 rows x 5 columns. The first row contains the titles of 3 of the variables (the names of 3 different people), and column 1 contains the titles of the 16 different possibilities. The inner block, a 16x4 matrix, contains the statistics. I can import the data into MATLAB, but I don't know whether to import it as a matrix, a cell array, or a dataset. From there, how do I perform PCA? I want the result to include the variable names.
So far, I've done this:
>> coeff = princomp(Cocktail1)
coeff =
0.0453 0.9976 0.0500 -0.0130
0.4413 -0.0662 0.8831 -0.1453
0.5300 -0.0067 -0.1275 0.8383
0.7227 -0.0172 -0.4489 -0.5253
However, the problem is that the output does not show the titles of the variables.
Thank you.


If your version of MATLAB supports them, I would suggest using Tables.
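Building on the suggestion to use tables, here is a minimal sketch, assuming R2016b or later (for compose) and the Statistics and Machine Learning Toolbox (for pca). Because the original workbook isn't available, it first writes a made-up 16x4 sheet to a temporary file; the file name, the row titles (Case1, Case2, ...), and the variable names (Alice, Bob, ...) are all hypothetical. With a real file you would start at the readtable call.

```matlab
% Hypothetical file name and made-up numbers, for illustration only.
fname = fullfile(tempdir, 'pca_demo.xlsx');

% Build a demo sheet shaped like the one in the question: a header row
% of variable names and a first column of row titles (16 x 4 data block).
rng(0);                                    % reproducible random data
X0 = rand(16, 4);
T0 = array2table(X0, ...
    'VariableNames', {'Alice', 'Bob', 'Carol', 'Dave'}, ...
    'RowNames', compose('Case%d', (1:16)'));
writetable(T0, fname, 'WriteRowNames', true);

% Import as a table: header row -> variable names, column 1 -> row names.
T = readtable(fname, 'ReadRowNames', true);
delete(fname);                             % remove the demo file

% pca accepts only a numeric matrix, so extract the data block.
X = table2array(T);
[coeff, score, latent] = pca(X);

% Label the coefficients: rows are original variables, columns are PCs.
pcNames = compose('PC%d', 1:size(coeff, 2));
C = array2table(coeff, 'RowNames', T.Properties.VariableNames, ...
    'VariableNames', pcNames)
```

Keeping the data in a table means the names travel with the numbers; table2array is only needed at the point where pca wants a plain matrix.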


1 Answer

Answer by Shashank Prasanna on 1 Aug 2014
 Accepted Answer

Michelle, your approach is correct. princomp (or pca, the newer function that is recommended instead) accepts only numeric matrices as input and returns the principal component coefficients as its first output.
It is important to note, however, that pca transforms your data to a new coordinate space. This means your original variable names don't make much sense for the transformed data. The second output of pca, score, is the transformed data: each of its columns is a linear combination of your (centered) original variables.
If you simply want to assign some names to them, I recommend using a table as follows:
T = array2table(coeff,'VariableNames',{'Name1' 'Name2' 'Name3' 'Name4'})
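One detail worth checking with the table above: in coeff, row i holds the weights of original variable i and column j holds principal component j, so the people's names fit best as row names, with PC labels on the columns. The sketch below assumes the Statistics and Machine Learning Toolbox and R2016b or later (for implicit expansion), and uses random data with hypothetical names (Alice, Bob, ...). It also confirms numerically that score is the centered data multiplied by coeff.

```matlab
% Made-up data and hypothetical variable names, for illustration only.
rng(1);
X = randn(16, 4);
varNames = {'Alice', 'Bob', 'Carol', 'Dave'};

[coeff, score] = pca(X);   % by default pca centers each column of X

% score is the centered data expressed in the PC basis:
% score(:, j) = (X - mean(X)) * coeff(:, j)
Xc = X - mean(X, 1);       % implicit expansion needs R2016b or later
maxDiff = max(max(abs(score - Xc * coeff)))

% Row i of coeff belongs to original variable i and column j to PC j,
% so the original names label the rows, not the columns.
pcNames = {'PC1', 'PC2', 'PC3', 'PC4'};
T = array2table(coeff, 'RowNames', varNames, 'VariableNames', pcNames)
```

maxDiff should come out at rounding-error size, which is a quick way to see that the transformed data is a fixed linear combination of the centered original columns.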


Thank you for your answer! I understand that pca transforms the data onto a new coordinate space. But now I want to plot the original data in this new coordinate space, which is where I am having trouble. Do you know how to plot the original data on the biplot?
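A possible approach, assuming the Statistics and Machine Learning Toolbox: pass the first two columns of score to biplot together with the first two columns of coeff, so the observations (the rows of the original data) appear as points in the same PC1/PC2 space as the variable vectors. The data and the names (Alice, Bob, ...) below are made up for illustration.

```matlab
% Made-up data and hypothetical variable names, for illustration only.
rng(3);
X = rand(16, 4);
varNames = {'Alice', 'Bob', 'Carol', 'Dave'};

[coeff, score] = pca(X);

% Vectors: how each original variable loads on PC1 and PC2.
% Points: each observation (row of X) projected into the same space.
figure;
h = biplot(coeff(:, 1:2), 'Scores', score(:, 1:2), 'VarLabels', varNames);
```

According to the biplot documentation, the scores are rescaled to fit the plot and a sign convention may flip some component directions, so the point positions are best read relative to each other rather than as raw score values.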
Dear Shashank Prasanna, I need help with PCA.
I'm looking for a way to relate the PCA results to the original data to find which variables have the highest impact. As I understand it, there are:
1) coeff: each column contains the coefficients for one PC, and the columns are in descending order of component variance
2) score: each column corresponds to a PC, and the rows correspond to observations
3) latent: the PC variances. I assume the first one corresponds to the first PC, and so on
I have spent a lot of time trying to find a way to link the PCs to the original data's variables. Is there any way to look at a PC and find its related variables in the original data?
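One common way to make that link, sketched here under the assumption of the Statistics and Machine Learning Toolbox and with made-up data and hypothetical names: coeff(i, j) is the weight of original variable i in component j, so sorting a column of coeff by absolute value ranks the variables for that PC. The fifth pca output, explained, gives the percentage of total variance each PC accounts for, and corr between the original columns and the scores gives another view of the same relationship.

```matlab
% Made-up data and hypothetical variable names, for illustration only.
rng(4);
X = rand(16, 4);
varNames = {'Alice', 'Bob', 'Carol', 'Dave'};

% explained (5th output) is the percentage of total variance per PC.
[coeff, score, latent, ~, explained] = pca(X);

% coeff(i, j) is the weight of variable i in PC j: the larger |coeff(i, j)|,
% the more variable i contributes to that component.
for j = 1:2
    [~, order] = sort(abs(coeff(:, j)), 'descend');
    fprintf('PC%d (%.1f%% of variance): %s\n', j, explained(j), ...
        strjoin(varNames(order), ', '));
end

% Correlation between each original variable (rows) and each PC (columns).
R = corr(X, score(:, 1:2))
```

If the variables are measured on very different scales, standardizing X first (for example with zscore) keeps a single large-scale variable from dominating this ranking.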
