Sign difference between coeff=pca(X) and [~,~,v] = svd(X)

I am experiencing sign differences between my computations for principal component analysis using the pca function and the svd function. Consider this matrix:
X = magic(4)
X =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
coeff = pca(X) % pca returns only three coefficient columns here, since the centered data has rank 3
coeff =
0.5000 0.6708 0.4458
-0.5000 -0.2236 0.3573
-0.5000 0.2236 0.6229
0.5000 -0.6708 0.5344
Now when I do svd, I get the following results.
x0 = bsxfun(@minus,X,mean(X,1)); %first center the data
[~,~,v] = svd(x0)
v =
-0.5000 0.6708 -0.4458 -0.3182
0.5000 -0.2236 -0.3573 -0.7565
0.5000 0.2236 -0.6229 0.5586
-0.5000 -0.6708 -0.5344 0.1202
So I get essentially the negatives of the first and third principal components. Does anyone know why this is?

Accepted Answer

Jonathan Chien
Jonathan Chien on 30 Aug 2020
Edited: Jonathan Chien on 18 Aug 2021
What we're seeing here is a sign indeterminacy that is intrinsic to methods such as singular value decomposition and eigendecomposition. Briefly, the signs of a given pair of corresponding singular vectors are mathematically indeterminate: both they and their reflections satisfy the factorization A = UΣV*. The sign is therefore assigned essentially arbitrarily by whatever algorithm carries out the SVD, and different SVD implementations can produce different results from the same input. In many applications, e.g. finding subspaces, this doesn't matter, but in other cases it can pose nontrivial problems (for more on this, see this paper).
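To see why both sign choices are valid: if D is a diagonal matrix with entries ±1, then (U*D)*S*(V*D)' = U*(D*S*D)*V' = U*S*V', because diagonal matrices commute and D^2 = I. A quick numeric sketch of this (not code from pca.m, just an illustration):

```matlab
A = magic(4);
[U, S, V] = svd(A);
D = diag([1 -1 1 -1]);           % an arbitrary choice of per-column sign flips
Uf = U * D;                      % flip matching columns of U ...
Vf = V * D;                      % ... and of V
norm(A - Uf * S * Vf', 'fro')    % ~eps: the flipped factorization still reconstructs A
```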
Within the pca.m function, MATLAB therefore enforces a convention: the largest element (by absolute value) in each right singular vector/eigenvector is made positive (other conventions could resolve the ambiguity equally well). The code below, taken from lines 429 to 437 of pca.m in R2019b, accomplishes this. The built-in svd function, by contrast, makes no such correction; from the svd documentation: "Different machines and releases of MATLAB can produce different singular vectors that are still numerically accurate. Corresponding columns in U and V can flip their signs, since this does not affect the value of the expression A = U*S*V'." Hence the difference you observed.
If you apply this convention to the matrix V returned by svd, you should get results identical to the coeff matrix returned by pca, sign included. Two caveats: for the factorization A = U*S*V' to continue to hold, the signs computed from V must also be applied to the corresponding left singular vectors in U; and since the principal components (which MATLAB calls "scores") can be computed either as A*V or as U*S, apply the convention before computing them, or apply it directly to the scores afterward (as is done below).
% Enforce a sign convention on the coefficients -- the largest element in
% each column will have a positive sign.
[~,maxind] = max(abs(coeff), [], 1);
[d1, d2] = size(coeff);
colsign = sign(coeff(maxind + (0:d1:(d2-1)*d1)));
coeff = bsxfun(@times, coeff, colsign);
if nargout > 1
    score = bsxfun(@times, score, colsign); % apply the same signs to the scores
end
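For example, applying that same convention to the v returned by svd in the question should make its first three columns match coeff from pca, signs included (a sketch using the question's setup, not part of pca.m):

```matlab
X = magic(4);
x0 = bsxfun(@minus, X, mean(X, 1));  % center the data, as in the question
[u, s, v] = svd(x0);

% Enforce pca.m's sign convention on the right singular vectors.
[~, maxind] = max(abs(v), [], 1);
[d1, d2] = size(v);
colsign = sign(v(maxind + (0:d1:(d2-1)*d1)));
v = bsxfun(@times, v, colsign);
u = bsxfun(@times, u, colsign);      % flip u as well so u*s*v' still equals x0

% v(:,1:3) should now equal coeff = pca(X), including signs.
```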

More Answers (2)

Christine Tobler
Christine Tobler on 1 Sep 2020
I'm not sure why this happens. As mentioned by others, it doesn't matter what sign these outputs have. You could try stepping through the code of pca.m in the debugger, to see what they do differently before / after the call to the SVD.
Note that the signs returned by the SVD can also change based on which machine MATLAB runs on / which MATLAB version is used (and some other machine configuration-related settings can influence this, e.g. number of threads).

John D'Errico
John D'Errico on 26 Aug 2016
So? Eigenvectors are only unique up to a sign difference. WTP?
  2 Comments
njj1
njj1 on 26 Aug 2016
I guess my real question is why this occurs. It seems reasonable to expect that if two different algorithms use the same method (as these two both use SVD to find the principal components), they would get the same answer, even down to the sign. Otherwise, I think you are probably correct, so long as I am consistent in whichever result I use for further data analysis.
John D'Errico
John D'Errico on 27 Aug 2016
Probably correct? I know I am correct in this respect. (You are not even close to the first person to trip over this issue. Not even the 1000th.)
There may still be many subtle differences between the codes, even if both of them use svd at the core. If the input matrices differ, even down in the least significant bits, then you may get different results in those signs.

