Issues in PCA transformation

4 views (last 30 days)
Tze Hei Cho
Tze Hei Cho on 10 Aug 2020
Commented: Tze Hei Cho on 14 Aug 2020
While conducting a PCA transformation of a testing set based on a training set (MATLAB R2019b on a Windows 10 machine), I found that the matrix after the PCA transformation differs across three conditions that I originally expected to be identical. I started testing the behavior of the PCA transformation after noticing that the classification accuracy of an SVM was not exactly 50% when the testing set was composed of two identical copies of the same data, one labeled 1 and the other labeled 0. That raised my suspicion, and I later traced the problem to the PCA transformation I was performing.
% Generate the training set, normalizing every observation (row)
X = rand(100, 50000);
train = zeros(100, 50000);
for i = 1:100
    train(i,:) = normalize(X(i,:));
end
% Generate the testing set, normalizing every observation (row)
x = rand(10, 50000);
test = zeros(10, 50000);
for i = 1:10
    test(i,:) = normalize(x(i,:));
end
% Compute the PCA coefficients from the training set
[coeff, ~, latent] = pca(train);
% Record where differences occur
allDiff = [];   % component counts at which one, two and three differ
tempDiff = [];  % component counts at which two and three differ
% Sweep the number of PCA components included in the transformation
for count = 1:size(coeff, 2)
    matrix = coeff(:, 1:count);
    % Cases
    one = test * matrix;            % test set multiplied on its own
    temp = [test; test] * matrix;   % two stacked copies multiplied together
    two = temp(1:10, :);
    three = temp(11:20, :);
    % Check whether the three expectedly identical matrices really are identical
    if ~isequal(one, two, three)    % difference among one, two and three
        allDiff = [allDiff, count]; %#ok<AGROW>
    end
    if ~isequal(two, three)         % difference between two and three
        tempDiff = [tempDiff, count]; %#ok<AGROW>
    end
end
The differences recorded in allDiff start at 2 PCA components, while those in tempDiff start at 20 PCA components. Occasionally, some component counts return identical matrices for one, two and three.
Is this issue related to rounding error in the matrix multiplication? And more importantly, which matrix is the correct one after the PCA transformation? (I guess it is one.) Thanks.
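The behavior described above is consistent with floating-point rounding: matrix multiplication sums products in an order that can depend on how the data are blocked, and floating-point addition is not associative. A minimal sketch in Python/NumPy (MATLAB's double-precision arithmetic behaves the same way) illustrating the effect:

```python
import numpy as np

# Floating-point addition is not associative: summing the same terms
# in a different order can change the last few bits of the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)        # False: the two groupings differ
print(abs(left - right))    # a difference on the order of 1e-16

# The same effect at matrix scale: a dot product summed in chunks can
# deviate slightly from the same dot product summed all at once, which
# is what happens when a BLAS library blocks a large multiplication.
rng = np.random.default_rng(0)
v = rng.random(50000)
w = rng.random(50000)
full = v @ w                                            # single dot product
split = v[:25000] @ w[:25000] + v[25000:] @ w[25000:]   # two halves, then summed
print(abs(full - split))    # tiny, but often nonzero
```

Stacking [test; test] changes the shape of the multiplication, which can change how the BLAS library partitions the work, so identical input rows can come out with last-bit differences.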

Answers (1)

Image Analyst
Image Analyst on 10 Aug 2020
Since you're using random numbers, why do you think that exactly 50% of your points should fall into each of two classes? Your numbers are continuously valued. It's not like they're in two distinct, well separated clusters. So of course there may not be exactly 50% in each class.
  10 Comments
Image Analyst
Image Analyst on 14 Aug 2020
Either I don't know what you're doing or you don't, because something doesn't make sense to me. With the pca() function, you pass it data and it gives you back the data in the new PC coordinate system. It figures out what the transform is, not you. So I don't understand it when you say "the same PCA coeff is used in all pca transformations." If you start with different sets of data, you will not end up with the same coefficients. They may be close but they will not be the same. It almost sounds like you're transforming your data, like rotating your coordinate system, getting new coordinates, and expecting/hoping that pca() will give you the same transform you used.
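For reference, projecting held-out data into the PC space of a training set normally involves centering with the training mean before multiplying by the coefficients, because pca() centers its input by default. A minimal Python/NumPy sketch of this idea (the SVD-based construction of coeff here is an assumption that mirrors pca()'s default centered behavior; the names train, test, coeff follow the MATLAB code above):

```python
import numpy as np

# Sketch: project held-out data into the PC basis of the training set.
# pca() in MATLAB centers the data by default, so the test data should be
# centered with the *training* mean before multiplying by coeff.
rng = np.random.default_rng(2)
train = rng.random((100, 50))
test = rng.random((10, 50))

mu = train.mean(axis=0)                       # training-set mean
U, S, Vt = np.linalg.svd(train - mu, full_matrices=False)
coeff = Vt.T                                  # principal axes as columns, like pca()'s coeff

n = 5                                         # number of components to keep
scores = (test - mu) @ coeff[:, :n]           # test data in the training PC space
print(scores.shape)                           # (10, 5)
```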
Tze Hei Cho
Tze Hei Cho on 14 Aug 2020
What I am trying to do here is transform the testing data into the PC space of the training data. The input to the pca() function is therefore always the same training data, with coeff as the output. A number of PCs (n) in coeff is selected such that they explain a given level of variance. I then use this coeff to transform the testing data into the PC space of the training data by different methods (i.e. row by row, standalone, or two copies stacked together as [testing; testing]) through matrix multiplication (i.e. data*coeff(:,1:n)). The results of these different methods of matrix multiplication differ on a scale of 1e-14 or less, which I now believe is a floating-point arithmetic issue.
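Given differences on the order of 1e-14, a tolerance-based comparison is more appropriate than isequal (in MATLAB, something like max(abs(one - two), [], 'all') < tol). A minimal Python/NumPy sketch of the idea (the 1e-14 noise magnitude is an assumption matching the scale reported above):

```python
import numpy as np

# Two results that agree up to floating-point noise (~1e-14) should be
# compared with a tolerance, not with exact (bitwise) equality.
rng = np.random.default_rng(1)
one = rng.random((10, 20))
two = one + rng.uniform(-1e-14, 1e-14, size=one.shape)  # simulated rounding noise

print(np.array_equal(one, two))           # False: exact comparison fails
print(np.allclose(one, two, atol=1e-12))  # True: tolerant comparison passes
```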
