Fitting data with a second-order polynomial

hi,
I am trying to solve this question.
A=[0 0 1 ; 1 1 1 ; 4 2 1 ; 9 3 1]
b=[0;2;3;2]
The solution in the book shows that:
B=A'*A
a=B/(A'*b)
which gives us the 3 required values a1, a2, and a3.
I don't know how this is done. All I know is that to solve a matrix equation like A*X = B, we use X = inverse of A * B.
But in this case we can't take the inverse of A, as it is not a square matrix. So how do we find the solution using B = A'*A
and then a = B/(A'*b)?
Also, for the plotting, the book uses the following code:
plot(A(:,2),b,'o')
My 2nd question is: why is only one column used for plotting?
thanks

Accepted Answer

John D'Errico on 30 Mar 2018
Edited: John D'Errico on 30 Mar 2018
There is a lot here to say. I'll try to do so in a fairly intuitive way, while going a bit light on rigor. Assume you have a problem of the form:
A*x = y
where A is a non-square matrix (with more rows than columns), so the problem is overdetermined. That just means you have more equations than variables to estimate. It also means you will essentially never find an exact solution. All that you can really hope to find is a solution that minimizes the sum of squares of the residuals.
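For concreteness, here is the setup from the question: four data points but only three polynomial coefficients, so a 4-by-3 system.
% Fit y = a1*x^2 + a2*x + a3 through (0,0), (1,2), (2,3), (3,2).
A = [0 0 1; 1 1 1; 4 2 1; 9 3 1];   % columns are [x.^2, x, ones]
y = [0; 2; 3; 2];                    % the b from the question
size(A)       % 4-by-3: four equations, three unknowns
rank([A y])   % 4, while rank(A) is only 3, so no exact solution exists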
A glib way of looking at this is if you start with the problem
A*x = y
then you might try to multiply both sides of the equation by the transpose of A. That would yield the related system:
A'*A*x = A'*y
But if we recognize that A'*A is a square matrix, then we might see that now we can solve for x simply.
(A'*A)*x = A'*y
So since we understand how to solve square matrix problems, we can solve for x as
x = (A'*A)\(A'*y)
Some might want to write this as
x = inv(A'*A)*(A'*y)
While neither form is really a good idea numerically, you can see at least some intuition for the solution. What that does not do is imply the result would be the optimal least squares vector x. We will be able to compute x, but why would that vector be "good" in any way? Is there a better way to understand where this equation comes from? By the way, that last form is often called the Normal equations, at least in some circles.
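For instance, with the data from the question (a sketch shown only for intuition; keep reading before you use this form in practice):
A = [0 0 1; 1 1 1; 4 2 1; 9 3 1];   % data from the question
y = [0; 2; 3; 2];
B = A'*A;       % 3-by-3, square and symmetric
x = B\(A'*y)    % about [-0.75; 2.95; -0.05] for this data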
Can we find an intelligent way to motivate that form, in terms of regression? Let's start with the expression:
norm(A*x-y)
We would like to find the vector x that minimizes the above norm. Remember that this norm acts on the residual vector, thus
R = A*x-y
The norm of R squares the elements, sums those squares, and then takes the sqrt. We can ignore the sqrt, since the vector x that minimizes the norm also minimizes the square of the norm. Using a dot product, we can write the square of the norm as:
(A*x-y)'*(A*x-y)
Expand that
x'*A'*A*x - x'*A'*y - y'*A*x + y'*y
Now, we need to look at two of these terms: (x'*A'*y) and (y'*A*x). They are both scalars, and each is the transpose of the other. (I recall this bothered me when I first learned about the normal equations, long ago, in a galaxy far far away. But just remember that you can safely transpose a scalar and change nothing.)
What is the transpose of a term that is a product of terms?
(y'*A*x)' = x'*A'*y
You just reverse the sequence, and transpose each sub-element in the product. So we can transpose one of them, and they combine perfectly.
x'*A'*A*x - 2*y'*A*x + y'*y
This is called a quadratic form. Think of it as sort of a quadratic polynomial, but using matrices. How can we find the minimum of a quadratic? We differentiate, set to zero, and solve. Amazingly, this actually works for matrices! We can indeed differentiate with respect to the vector x. The result will be
2*A'*A*x - 2*A'*y = 0
We can divide through by 2, and then we see the result is the same as what we got before just by multiplying through by the matrix A'.
A'*A*x = A'*y
So we did do something reasonable before, even though there was no intelligent motivation behind it at the time. Here we see that it gives us a least squares solution. Now we can solve for x, using the same square matrix solvers that you already understand.
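If you want to convince yourself numerically, here is a quick sketch using the data from the question. It checks the quadratic form expansion at an arbitrary x, and then checks that the solution of the normal equations zeros the gradient.
A = [0 0 1; 1 1 1; 4 2 1; 9 3 1];    % data from the question
y = [0; 2; 3; 2];
x = [1; -2; 0.5];                     % any trial vector
lhs = (A*x - y)'*(A*x - y);           % squared norm of the residual
rhs = x'*(A'*A)*x - 2*y'*A*x + y'*y;  % the expanded quadratic form
abs(lhs - rhs)                        % zero, up to roundoff
xls = (A'*A)\(A'*y);                  % solve the normal equations
norm(2*(A'*A)*xls - 2*A'*y)           % gradient is ~0 at the minimum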
The only problem is the above form is actually a really bad way to solve the problem, in terms of linear algebra.
DRAT! So I just spent how much time deriving a bad way to solve this problem? Well, you did ask where it comes from, so I'll blame it all on you.
In fact, I am often wont to rant and rave on this site about how people should NOT use that solution! I point out that books teach it, and never seem to tell you that it is a bad idea. Effectively the Normal equations are a bad meme that propagates itself. Student learns it from teacher. Student graduates, becomes a teacher, maybe even writes a book, teaching the same bad thing. It never ends.
The problem is it squares the condition number of A. And if A is at all poorly conditioned as a matrix, then A'*A is way worse. I know that may not make a lot of sense right now, but it is important. And there are much better ways to solve this problem, and one of them is built into the guts of backslash.
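You can see the squaring directly on the question's matrix (harmless at this size, but fatal for larger, nastier fitting problems):
A = [0 0 1; 1 1 1; 4 2 1; 9 3 1];   % data from the question
cond(A)       % roughly 19: a small, benign matrix
cond(A'*A)    % roughly 19^2 = 363: the condition number got squared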
Better in MATLAB is to simply use backslash. (There are multiple good alternatives in MATLAB, like LSQMINNORM, PINV, LSQR, REGRESS, etc. So don't use the Normal equations. And PLEASE DON'T TEACH THEM TO OTHERS EITHER.) Thus
x = A\y
This works because the nice people who built MATLAB for you understand the linear algebra involved, and they know how to solve that problem efficiently and accurately. So part of me wants to tell you that hey, backslash was intelligently written to solve the problem in a good way. Well, it does that. Trust me. ;-)
Do I need to explain how backslash works? Or why the normal equations are actually bad numerically? It's not that hard to explain, but I'm lazy as hell some days. Or can I stop here, as long as you accept that the correct solution in MATLAB is x = A\y?
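So for the original question, the recommended solve is just:
A = [0 0 1; 1 1 1; 4 2 1; 9 3 1];   % data from the question
y = [0; 2; 3; 2];
x = A\y    % least squares coefficients: about [-0.75; 2.95; -0.05]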
  3 Comments
Steven Lord on 30 Mar 2018
And if you are interested in a more in-depth explanation, read through the Linear Equations chapter in Numerical Computing with MATLAB, written by Cleve Moler (the first of the "nice people who built MATLAB for you to use"). It discusses such considerations as pivoting, roundoff, and condition numbers.
John D'Errico on 30 Mar 2018
Yes. I used very broad strokes to paint this picture. Cleve will have done a much better job.


More Answers (1)

Walter Roberson on 29 Mar 2018
A\b does the least squares approximation even for non-square A.
I have no idea why they are doing those things with A'*A and A'*b. They don't even work as written, because the sizes are wrong. If you use B\(A'*b) then you get an answer, but that is because you effectively multiply and then divide by A', with a net result of A\b, except that you have introduced unnecessary numeric error.
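A quick sketch of that equivalence, using the question's data (for this small, well-behaved A the extra error is negligible, but the detour through A'*A buys you nothing):
A = [0 0 1; 1 1 1; 4 2 1; 9 3 1];
b = [0; 2; 3; 2];
B = A'*A;
norm(B\(A'*b) - A\b)   % agrees with plain backslash up to roundoff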
The plotting in the code must be for a different situation, one in which linear coefficients are being looked for. Notice that a(1)*x+a(2) is for a straight line, not a quadratic.
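As for why only one column appears in the plot call: A(:,2) is simply the column of x values, since the columns of A are [x.^2, x, ones]. So plot(A(:,2),b,'o') just plots the raw data points. A sketch of the full picture, overlaying the least squares quadratic:
A = [0 0 1; 1 1 1; 4 2 1; 9 3 1];
b = [0; 2; 3; 2];
a = A\b;                          % fitted coefficients [a1; a2; a3]
x = A(:,2);                       % the original x data: 0, 1, 2, 3
xf = linspace(min(x), max(x));    % fine grid for a smooth curve
plot(x, b, 'o', xf, polyval(a, xf), '-')   % data and fitted curve
xlabel('x'), ylabel('y')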
  3 Comments
Walter Roberson on 29 Mar 2018
A\b does the least squares approximation even for non-square A.
John D'Errico on 30 Mar 2018
Give me a minute. I can explain all.

