Differential Equations and Linear Algebra, 7.2b: Singular Value Decomposition, SVD
From the series: Differential Equations and Linear Algebra
Gilbert Strang, Massachusetts Institute of Technology (MIT)
The SVD factors each matrix A into an orthogonal matrix U times a diagonal matrix Σ (the singular value) times another orthogonal matrix VT: rotation times stretch times rotation.
Published: 27 Jan 2016
The previous video was about positive definite matrices. This video is also linear algebra, a very interesting way to break up a matrix called the singular value decomposition. And everybody says SVD for singular value decomposition. And what is that factoring? What are the three pieces of the SVD?
So this is the fact is every matrix, rectangular, every matrix factors into-- these are the three pieces. U sigma V transpose. People use those letters for the three factors.
The factor U is an orthogonal matrix, an orthogonal matrix. The factor sigma in the middle is a diagonal matrix. The factor V transpose on the right is also an orthogonal matrix. So I have orthogonal, diagonal, orthogonal, or physically, rotation, stretching, rotation. Now we have seen three factors for a matrix, V, lambda, V inverse. What's the difference? What's the difference between this SVD, this, and the V, lambda, V transpose, V inverse, V lambda, V inverse for diagonalizing other matrices?
So the lambda is diagonal and the sigma is diagonal, but they're different. The key point is I now have two different matrices, not just V and V inverse, but two different matrices. But the new great advantage is they are orthogonal matrices, both of them. So by going to-- and I can do it for rectangular matrices also. Eigenvalues really worked for square matrices.
Now we really are-- we have two. We have an input matrix and an output matrix. In those spaces m and n can have different dimensions. So by allowing two separate bases, we get rectangular matrices, and we get orthogonal factors with, again, a diagonal. And this is called-- these numbers sigma instead of eigenvalues, are called singular values. So these are the singular values. These are the singular vectors, the right singular vectors and the left singular vectors. That's the statement of the factorization.
But we have to think a little bit, what are those factors? What are the-- can we see why this works? So I want that. And let me do, as you see this coming, I'll look at A transpose A. I like A transpose A. So A transpose will be, I transpose this. V sigma transpose U transpose, right? That's A transpose. Then I multiply by A U sigma V transpose. And what do I have? Well, I've got six matrices.
But U transpose U in here is the identity, because U is an orthogonal matrix. So I really have just the V on one side, a sigma transpose sigma, that'll be diagonal, and a V transpose the right. This I recognize. This I recognize. Here is a single V, a diagonal matrix, a V transpose. What I'm showing you here, what we reached is the eigenvalue, the diagonalization, the usual eigenvalues are in here and the eigenvectors are in here. But the matrix is A transpose A.
Once again, A was rectangular and completely general and we couldn't see perfect results. But when we went to A transpose A, that gave us a positive semidefinite matrix, symmetric for sure. Its eigenvectors will be orthogonal. That's how I know this V matrix, the eigenvectors for this symmetric matrix, are orthogonal and the eigenvalues are positive. And they're the squares of the singular value. So this is telling me the lambdas for A transpose A are the sigma squareds for s-- for A. For A itself. Lambda is the same. Lambda for A transpose A is sigma squared for the matrix A.
Well that tells me V, that tells me sigma, and U disappeared here because U transpose U was the identity. It just went away. How would I get hold of U? Well, here's one way to see it. I multiply A times A transpose in that order, in that order. So now I have U sigma V transpose times the transpose, which is the V sigma transpose U transpose-- I'm having a lot of fun here with transposes.
But V transpose V is now the identity in the middle. So what do I learn here? I learn that U is the eigenvector matrix for AA transpose. So these have the same eigenvalues, A times B has the same eigenvalues as B times A in this case, it comes out here. Same eigenvalues. This has eigenvectors V, this has eigenvectors U, and those are the V and the U in the singular value decomposition.
Well, I have to show you an example I have to show you an example and an application, and that's it. So here's an example. Suppose A, I'll make it a square matrix, 2, 2, minus 1, 1, not symmetric. Certainly not positive definite. I wouldn't use the word because that matrix is not symmetric. But it's got an SVD, three factors.
And I work them out. This is the orthogonal matrix. I have to divide by square root of 5 to make it unit vectors. Oops, that's not going to work. How about that? The two columns are orthogonal and that's a perfectly good U. And then in the sigma, I got, well that's a-- oh, I did want 1 and 1. I did want 1 and 1, yes. So I have a singular matrix, determinant 0, singular matrix. So my eigenvalues will be 0 and it turns out square root of 10 is the other eigenvalue for that-- other singular value for this guy.
And now I'll put in the V transpose matrix, which is 1, 1, and 1, minus 1 is it? And those have length square root of 2, which I have to divide by. Well, I didn't do that so smoothly, but the result is clear. U, sigma, V transpose, so here's the sigma. And the singular values of this matrix are square root of 10 and then 0 because it's a singular matrix. And the eigenvectors, well the singular vectors of the matrix are the left singular vectors and the right singular vectors. That looks good to me.
And now the application to finish. A first application is, well, very important. All the time in this century, we're getting matrices with data in them. Maybe in life sciences, we test a bunch of sample people for genes. So I have a-- my data comes somehoe-- I have a gene expression matrix. I have samples, people, people 1, 2, 3 in those columns. And I have in the rows, let me say four rows, I have genes, gene expressions.
That would be completely normal. A rectangular matrix, because the number of people and the number of genes is not the same. And in reality, those are both very, very big numbers, so I have a large matrix. And out of it, I want to-- and each number in the matrix is telling me how much the gene is expressed by that person. We may be searching for genes causing some disease. So we take several people, some well, some with the disease, we check on the genes. We get a big matrix and we look to understand something about of it.
What can we understand? What are we looking for? We're looking for the correlation, the connection, between some combination maybe of genes and some-- we're looking for a gene people connection here. But it's not going to be person number one. We're not looking for one person. We're going to find a mixture of those people, so we're going to have sort of an eigensample, eigenpeople. Oh, that's a terrible-- eigenperson would be better. So I think I'm seeing an eigenperson. Let me see where I'm going to put this.
So yeah, I think my matrix would be written-- oh, here is the main point. That just as I see in this example, it's the first vector and the first vector and the biggest sigma that are all important. Well, in that example the other sigma was 0, nothing. But in this example, I'll probably have three different sigmas. But the largest sigma, the first, the U1 and the V1, it's that combination that I want. I want U1 sigma 1 V1 transpose, the first eigenvector of A transpose A and of AA transpose.
And the first singular, the biggest singular value, that's the information. That's the best sort of put together person, eigenperson, combination of these people and the best combination of genes. It has the-- in statistics, I would say the greatest variance. In ordinary English, I would say the most information. The most information in this big matrix is in this very special matrix with only rank one, only a single column repeated. A single row repeated, and a number sigma 1, the number that tells me that. Because remember, U is a unit vector. V is a unit vector. It's that number sigma 1 that's selling me.
So it's like that unit vector times that number, key number, times that unit vector, that's this. I'm talking here about principle component analysis. I'm looking for the principle component, this part. Principle component analysis. A big application in applied statistics. You know, in large scale drug tests, statisticians really have a central place here. And this is on the research side, to find the-- get the information out of a big sample.
So U1 is sort of a combination of people. V1 is a combination of genes. Sigma 1 is the biggest number I can get. So that's PCA, all coming from the singular value decomposition. Thank you.