MATLAB Answers

Please provide in-depth details on the PCA MATLAB function

Hello Team,
I have a Cancer and Benign dataset that does not cluster when I use the peak-value feature extracted from the signals, as shown in the figure below.
But after performing PCA, the two groups separate out well. Could you please explain the reason behind this, i.e. how the pca function in MATLAB actually calculates its values?
I have gone through the PCA function but am not getting much insight into how the correlated data gets separated using PCA.
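In broad terms, with the default options (data centered, SVD algorithm), `pca` subtracts the column means, finds the directions of maximum variance (the eigenvectors of the covariance matrix, returned in `coeff`), and projects the centered data onto them to get `score`; `latent` holds the variance along each direction and `explained` the percentage of total variance. MATLAB computes this through an SVD of the centered data, which is mathematically equivalent to the eigendecomposition below. This is a minimal sketch on made-up correlated data (the 50-by-8 matrix `X` is a hypothetical stand-in for the real features) that reproduces the outputs with `cov` and `eig` and compares them with `pca`. Eigenvectors are only defined up to sign, so directions and scores are compared in absolute value.

```matlab
% Hypothetical stand-in data: 50 observations x 8 correlated features
rng(0);                                  % make the random data reproducible
X = randn(50, 8) * randn(8);             % mixing the columns induces correlation

% Built-in PCA with default options (centered data, SVD algorithm)
[coeff, score, latent, ~, explained, mu] = pca(X);

% The same quantities computed by hand
muManual = mean(X, 1);                   % column means that pca subtracts
Xc = X - muManual;                       % centered data (implicit expansion, R2016b+)
C = cov(Xc);                             % 8x8 sample covariance, normalized by n-1
[V, D] = eig(C);                         % eigenvectors = principal directions
[latentManual, order] = sort(diag(D), 'descend');   % largest variance first
V = V(:, order);                         % reorder directions to match
scoreManual = Xc * V;                    % project centered data onto the directions
explainedManual = 100 * latentManual / sum(latentManual);  % percent of total variance

% Compare (columns of V may have the opposite sign to coeff, hence abs)
fprintf('max |mu difference|        = %g\n', max(abs(mu - muManual)));
fprintf('max |latent difference|    = %g\n', max(abs(latent - latentManual)));
fprintf('max |explained difference| = %g\n', max(abs(explained - explainedManual)));
fprintf('max |coeff difference|     = %g\n', max(abs(abs(coeff(:)) - abs(V(:)))));
fprintf('max |score difference|     = %g\n', max(abs(abs(score(:)) - abs(scoreManual(:)))));
```

All differences should be at the level of floating-point round-off, which shows that the scores are nothing more than the centered data rotated onto the directions of largest variance.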
[Figure: Before PCA]
[Figure: After PCA]



Accepted Answer

Neuropragmatist on 10 Sep 2019
How many features do you give pca when you pass it the dataset? Just the XY values you plotted in the first graph? Or do you also give it some other measurements? Do you give it the group identity (i.e. the blue/orange colours)?
Maybe you could also show us the code you used to get the above result.


Ashwini Amin on 11 Sep 2019
My dataset has 8 features.
Below is the PCA code:
[coeffC, scoreC, latentC,tsquaredC,explainedC]=pca(Cancer);
[coeffN, scoreN, latentN,tsquaredN,explainedN]=pca(Normal);
This code gives me the PCA components.
Neuropragmatist on 12 Sep 2019
OK, so I have multiple points/questions:
1) Because you give multiple variables to pca, each principal component is a weighted combination of all of them (the weights are the columns of coeff), so there is no simple way to say what a resulting component represents. That makes it hard to answer why PCA clusters your two datasets the way it does. Usually we use PCA to find new ways to cluster the data through dimensionality reduction, not to find ways to cluster the data based on already existing dimensions.
2) This makes me wonder what your overall goal is here. What are you trying to get from the PCA?
3) You are running PCA on your cancer and normal groups separately, but then plotting the results in one graph. That doesn't make a lot of sense to me (but I'm willing to be corrected), because the principal components found for one group may, and probably will, be completely unrelated to those found for the other group. For example, feature 1 might dominate the first component in the cancer dataset while feature 5 dominates it in the normal dataset, in which case the axes of your graph will not mean the same thing for the two groups.
4) Related to this, what are you plotting in your first graph? Do you expect some simple relationship between those variables and the PCA results? Because that generally won't be the case.
5) I really don't mean this in a derisory way, but maybe you need to understand the basics of PCA before embarking on this analysis. Perhaps you should consult someone who has either done what you are trying to do already or who knows PCA very well.
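To illustrate point 3, here is a minimal sketch on two made-up groups (the 40-by-8 matrices `A` and `B` are hypothetical, not the real data). By default `pca` subtracts each input's own column means, so when the groups are analysed separately both sets of scores end up centered at the origin and any difference between the group means is discarded. The principal directions in `coeff` are also generally different for the two groups, so PC1 in one analysis is not the same axis as PC1 in the other.

```matlab
% Hypothetical groups with different means and different correlation structure
rng(1);                                  % reproducible random numbers
A = randn(40, 8) * randn(8) + 3;         % "group A": every feature shifted by +3
B = randn(40, 8) * randn(8);             % "group B": no shift

% Raw group means differ (A has the +3 shift, B does not)
disp(mean(A(:, 1:2)) - mean(B(:, 1:2)))

% Separate PCA, as in the code from the question
[coeffA, scoreA] = pca(A);
[coeffB, scoreB] = pca(B);

% 1) Each group is centered on its own mean, so the mean shift disappears
disp(mean(scoreA(:, 1:2)))               % approximately [0 0]
disp(mean(scoreB(:, 1:2)))               % approximately [0 0]

% 2) The axes differ: compare the first principal direction of each group
cosAngle = abs(coeffA(:, 1)' * coeffB(:, 1));   % 1 = same direction, 0 = orthogonal
fprintf('|cos(angle)| between PC1 of A and PC1 of B: %.2f\n', cosAngle);
```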
Ashwini Amin on 13 Sep 2019
Thank you Neuropragmatist for your response.
Basically, I have statistical features extracted from the Cancer and Normal data, as shown in figure 1, and they are multidimensional. I want to reduce the dimensionality using PCA, so I calculated the PC scores of Cancer and Normal separately and plotted them together to see whether they cluster together or separately. Basically, I wanted to see whether the two datasets are distinct.
Can you please advise whether I am doing PCA wrong?
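One common way to do what is described here, sketched under assumptions, is to stack both groups into one matrix, run `pca` once, and then colour the scores by group, so that both groups are projected onto the same components. `Cancer` and `Normal` below are random placeholders (assumed to share the same 8 columns in the same order); the `zscore` step is an optional assumption that only makes sense if the features are measured on different scales, and `gscatter` is just one way to plot the result.

```matlab
% Hypothetical stand-ins for the two groups (replace with the real matrices):
% rows are observations, columns are the same 8 features in the same order.
rng(2);
Cancer = randn(30, 8) * randn(8) + 2;
Normal = randn(35, 8) * randn(8);

% Stack both groups and keep a label for each row
X = [Cancer; Normal];
group = [repmat({'Cancer'}, size(Cancer, 1), 1); ...
         repmat({'Normal'}, size(Normal, 1), 1)];

% Optional (assumption): standardize features that are on different scales
Xs = zscore(X);

% One PCA on the pooled data, so both groups share the same components
[coeff, score, ~, ~, explained] = pca(Xs);

% Variance kept by the first two components (helps decide how many to retain)
fprintf('PC1 and PC2 explain %.1f%% of the variance\n', sum(explained(1:2)));

% Loadings: row i of coeff is feature i, column j is component j
disp(coeff(:, 1:2))

% Plot the first two component scores, coloured by group
gscatter(score(:, 1), score(:, 2), group);
xlabel('PC1'); ylabel('PC2');
```

Because the components are fitted on the pooled data, PC1 means the same thing for every point in the plot, and `coeff(:, 1)` shows how strongly each of the 8 features contributes to it. Note that PCA never sees the group labels, so any separation in this plot comes only from the structure of the features themselves.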

