Given a rank-deficient design matrix, fitlm does not use pinv or lsqr to find the minimum-norm solution. Why not?

A minimum-norm solution can be found even when the design matrix is rank deficient. Why does fitlm not incorporate this? Is there another built-in function for this purpose? Given a rank-deficient system, it appears that one must compute everything (coefficients, CIs, etc.) 'by hand' using matrix formulas. I'm surprised that MATLAB does not take care of this automatically.

Answers (2)

John D'Errico on 22 Dec 2015
Edited: John D'Errico on 22 Dec 2015
Typically, it is columns that are deleted from a singular problem, in the sense that each column corresponds to one variable in a linear least squares problem of the form A*x=b.
Deleting a column is equivalent to forcing that variable to zero. Removing a row is equivalent to deleting a data point, or giving it zero weight.
The use of PINV or LSQR raises different issues compared to a backslash or QR solution. On a singular system, you cannot intelligently compute confidence intervals, since the singularity means that some linear combination of the columns is zero. So there will be infinitely many solutions, all of which are equally valid. In that case, a confidence interval is MEANINGLESS. I repeat: if your problem is singular, then you cannot compute confidence intervals.
If the problem is not singular, then there is no need for tools like PINV, since it will yield the same results as QR. And while you could compute confidence intervals from a pinv-based solution (for a non-singular problem), they would be the same as what you get from a QR solution. LSQR, as an iterative scheme, offers no way to get confidence intervals directly from the computation; you could still do it with some additional effort, but that is far more work than needed.
As an example, consider this simple problem, with a matrix A that has a replicate column.
b = randn(100,1);      % random response
A = rand(100,3);
A = A(:,[1 2 3 3]);    % replicate column 3, making A rank deficient
A\b
Warning: Rank deficient, rank = 3, tol = 1.348639e-13.
ans =
0.0454490954375159
0.27593332891598
-0.279873674791995
0
pinv(A)*b
ans =
0.0454490954375159
0.27593332891598
-0.139936837395998
-0.139936837395997
lsqr(A,b)
lsqr converged at iteration 3 to a solution with relative residual 0.99.
ans =
0.0454490954375158
0.27593332891598
-0.139936837395998
-0.139936837395998
As you can see, pinv and lsqr both yield the minimum norm solution. But just as clearly, since the 3rd and 4th columns of A are identical, we could have chosen any combination of coefficients for those last two columns that adds up to -0.279873674791995. ANY such combination, subject to the constraint
x(3) + x(4) = -0.279873674791995
is equally valid as a solution. A confidence interval has no meaning for this problem. At best, you could compute a confidence interval on the sum of those variables, thus x(3)+x(4), as the sketch below shows.
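One way to do that (a hedged sketch, not something fitlm does internally): since columns 3 and 4 of A are identical, the model depends only on x(3)+x(4). Drop the duplicate column and refit; the third coefficient then estimates that sum, and its confidence interval is well defined.
mdl = fitlm(A(:,1:3), b, 'Intercept', false);  % drop the duplicate column 4
coefCI(mdl)                                    % row 3 is a valid CI for x(3)+x(4)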
So if you want to know why FITLM does not use PINV or LSQR, it is because they would be useless on singular problems when it tried to compute those additional pieces of information. As well, PINV will in general be slower for large problems, since an SVD is more computationally intensive. Why throw away CPU cycles when the extra effort will gain you nothing? An efficient algorithm was chosen that can yield confidence interval estimates on non-singular problems. On singular problems, it gives a warning of the observed singularity, as well as (I expect) infinitely wide intervals (or NaNs) for any confidence estimates.
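You can inspect that behavior directly (a sketch; the exact treatment of the redundant coefficient may vary by release):
mdl = fitlm(A, b, 'Intercept', false)  % warns that the design matrix is rank deficient
mdl.Coefficients                       % inspect the estimates and standard errors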
  3 Comments
Isabel Chen on 22 Dec 2015
Actually, I'm not sure whether the diagonal entries of pinv(X'*X) can be used as a reasonable estimate for the standard errors of the regression coefficients. Any thoughts?
John D'Errico on 22 Dec 2015
Edited: John D'Errico on 22 Dec 2015
No. It is simply not true that all you need is a standard error estimate. I think you misunderstand what those confidence intervals mean, at least what they would mean in the case of a rank-deficient matrix. You are applying formulas without understanding what they mean, and then hoping you can simply replace the inv operator with pinv. That is not a bad instinct here; it shows that you are thinking about the problem. However, substituting pinv for inv is not appropriate.
Consider the case that I showed. I'll make it even simpler.
u = randn(100,1);
A = [u, u];        % two identical columns, so rank(A) is 1
b = u;
Now, can we solve the problem
A*x = b
where x is a 2x1 vector?
Clearly the problem has no error at all. But what is the solution? The model is effectively:
c1*u + c2*u = u
Can we infer the value of both c1 AND c2 uniquely? Of course not. In fact, they can take on any values at all. You can choose any arbitrary value for c1, which then exactly determines the value of c2. There is absolutely no information available about their individual values, only that the sum of those two coefficients is 1. For instance:
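(A quick illustrative check, using the variables defined above.)
c1 = 3.7;               % pick any value at all for c1 ...
c2 = 1 - c1;            % ... and c2 is forced, yet the fit is still exact:
norm(A*[c1; c2] - b)    % returns 0 for every such pair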
If you do try to use the classic linear algebra formula for asymptotic confidence intervals around c1 and c2, you will find that the matrix is singular. Since it has no inverse, the standard error estimate you are looking for is essentially infinite. That reflects the effective lack of knowledge as to the actual values of those parameters.
As to your other question: NO, the diagonal values of pinv(A'*A) simply cannot be used as standard errors for a singular problem. If the problem is non-singular, then there is effectively no difference between pinv(A'*A) and inv(A'*A), so of course you got reasonable results for a non-singular problem. But using pinv there for a singular problem will give you completely wrong results. Sorry, but it will give you garbage.
The idea of using the diagonal values of inv(A'*A) (or pinv) makes the most sense when the off-diagonal terms in that matrix are small compared to the diagonal elements. Think of those off-diagonal terms as covariances. When they are large, that means you have significant correlations between coefficients. By assuming the off-diagonal terms are essentially zero, you implicitly assume a completely uncorrelated set of coefficients. That is good for a well-conditioned problem, but is simply wrong here. For a rank deficient matrix, you end up with a perfectly correlated set of parameters!
At best, all that you can learn is information about the uncertainty in the SUM of those two parameters. I did say this above. We cannot use mathematics to infer information about the uncertainty in the parameters when there is no information to be had.
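A small numerical check of that point (an illustrative sketch, using A = [u,u] from above):
rank(A'*A)          % 1, not 2: the normal-equations matrix is singular
diag(pinv(A'*A))    % finite numbers, but meaningless as variances here,
                    % because c1 and c2 are perfectly correlated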



Vidya Viswanathan on 22 Dec 2015
The function "fitlm" does support rank-deficient design matrices. When I say it supports, I mean it does not throw an error when the design matrix is rank-deficient. However, when you look at the implementation of the function "fitlm", you observe that the linearly independent rows/ columns are first extracted from the design matrix using QR decomposition and then linear regression is performed on the resultant. This seems logical because the presence of a linearly dependent row/ column does not provide additional information that is required for regression.
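Roughly how such an extraction can work, for a design matrix X (a hedged sketch of the general technique, not fitlm's actual source): QR with column pivoting pushes dependent columns to the end, where tiny diagonal entries of R expose them.
[Q, R, p] = qr(X, 'vector');             % column-pivoted QR, permutation as a vector
tol = max(size(X)) * eps(abs(R(1,1)));   % tolerance in the spirit of backslash
keep = p(abs(diag(R)) > tol);            % pivot columns judged independent
Xind = X(:, sort(keep));                 % reduced, full-column-rank design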
For example, assume that your design matrix is of dimensions 13 x 4, with the values of the first row being twice those of the second row, i.e.:
X =
     2    58    30   104
     1    29    15    52
    11    56     8    20
    11    31     8    47
     7    52     6    33
    11    55     9    22
     3    71    17     6
     1    31    22    44
     2    54    18    22
    21    47     4    26
     1    40    23    34
    11    66     9    12
    10    68     8    12
Further, discard the first row of X (the linearly dependent one) and store the resulting 12 x 4 matrix as X1, i.e.:
X1 =
     1    29    15    52
    11    56     8    20
    11    31     8    47
     7    52     6    33
    11    55     9    22
     3    71    17     6
     1    31    22    44
     2    54    18    22
    21    47     4    26
     1    40    23    34
    11    66     9    12
    10    68     8    12
You can observe that equivalent results (in terms of regression coefficients) are obtained for both X and X1. This shows that the linearly dependent row does not provide any additional information for the regression. A quick check appears below.
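(An illustrative sketch; y here is a hypothetical exact response chosen so that the deleted data point is consistent with the rest, i.e. y(1) = 2*y(2). For a noisy response, deleting a data point would of course perturb the fit.)
y = X * [1; 2; 3; 4];                   % hypothetical exact response
mdl  = fitlm(X, y);                     % fit on all 13 rows
mdl1 = fitlm(X(2:end,:), y(2:end));     % fit with the dependent row removed
[mdl.Coefficients.Estimate, mdl1.Coefficients.Estimate]   % the two columns match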
  1 Comment
Isabel Chen on 22 Dec 2015
Thanks Vidya. A non-square matrix is considered full rank if rank = min(nrow, ncol). In your example, I'm guessing that removing one row didn't change the rank of the matrix: it looks like both X and X1 have rank 4, so both are full rank, resulting in the same regression estimates. However, when the design matrix is truly rank deficient, one gets the following warning message:
Warning: Regression design matrix is rank deficient to within machine precision.
> In TermsRegression>TermsRegression.checkDesignRank at 98
In LinearModel.LinearModel>LinearModel.fit at 868
In fitlm at 117
Perhaps from a modeling perspective it is better to report the rank deficiency rather than compute the minimum-norm solution. Is this the reason?
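(To check the rank guess above, a quick sketch using the X and X1 from the answer:)
rank(X)     % 4: X is full column rank despite the dependent first row
rank(X1)    % 4: removing that row leaves the rank unchanged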

