I am getting the wrong answer even though my code looks right to me. Please help!

--------------Main program---------------------------------
X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters
% Some gradient descent settings
iterations = 1500;
alpha = 0.01;
fprintf('\nTesting the cost function ...\n')
% compute and display initial cost
J = computeCost(X, y, theta);
fprintf('With theta = [0 ; 0]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 32.07\n');
--------------Calculating function J---------------------------------
function J = computeCost(X, y, theta)
m = length(y);
h = X * theta;
J = (1/(2*m))*(sum(h-y).^2);
end
-------------------Here is my result----------------------------------------
Testing the cost function ...
With theta = [0 ; 0]
Cost computed = 1653.631660   <-- my result
Expected cost value (approx) 32.07   <-- this is the right answer
With theta = [-1 ; 2]
Cost computed = 4359.141958   <-- my result
Expected cost value (approx) 54.24   <-- this is the right answer
------------------------------------------------------------

17 Comments

What value should we use for data ?
What is iterations doing for you here? It looks to me as if you are only taking a single step instead of iterating.
@walter Roberson: Yes, I am taking a single step of the gradient descent method for machine learning. This method requires calculating the cost function J. However, my result for the cost function J is wrong, so I cannot move on to the next step until I figure this out.
@Image Analyst: m is number of training examples.
data is defined in another script that I took from the files for the machine learning course.
We cannot test without those values.
Your theta is 0, so X * theta is 0, so h is 0, so sum((h-y).^2) is the same as sum(y.^2) .
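The placement of the parentheses matters here. As a minimal sketch on a made-up three-point data set (the values below are hypothetical, not the course data): sum(h-y).^2 adds up the residuals and then squares the total, whereas sum((h-y).^2) squares each residual before adding them, which is the usual sum-of-squared-errors form.

```matlab
% Hypothetical toy data (not the course data set)
X = [1 1; 1 2; 1 3];   % first column of ones multiplies the intercept
y = [1; 2; 4];
theta = [0; 0];
m = length(y);
h = X * theta;         % all zeros, because theta is zero

% As posted: sum the residuals first, then square the single total
J_posted = (1/(2*m)) * (sum(h - y).^2);

% Square each residual first, then sum
J_squared = (1/(2*m)) * sum((h - y).^2);

fprintf('sum(h-y).^2   gives %f\n', J_posted);
fprintf('sum((h-y).^2) gives %f\n', J_squared);
```

On this toy data the two expressions give different values, so it may be worth checking which of the two the exercise intends.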
When I asked what they are, I meant their values. I still don't know what they are. Can you attach a script where you define them so we have actual values to work with?
@image Analyst, @walter Roberson: Here are my scripts. You can run some tests on them, and then please share your results.
Well, there isn't much we can say other than with that code and those initial values, the cost is what you are finding.
In order to get a result of about 32.07, then you would have to have the relationship that theta(2) is one of the two values
0.61594281439169177418320648558984 - 0.12255202333390524277555822446629*theta(1)
0.81525281578218301504626472390766 - 0.12255202333390524277555822446629*theta(1)
@walter Roberson: Do you mean Htheta(X)? Htheta(X) = theta(0)+theta(1)*X. By the way, can I have your code for this exercise?
I do not know where you are getting that about Htheta . It is not in the code you posted.
The code I used was to stop inside the cost function and do
theta = sym('T', [2 1]);
h = X * theta;
J = (1/(2*m))*(sum(h-y).^2);
double(solve(J==32.07,theta(2)))
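As a rough illustration of why solve returns two relationships: with the cost written as posted, J depends on theta only through the single quantity sum(X*theta - y), so J == target reduces to m*theta(1) + sum(x)*theta(2) - sum(y) = +/- sqrt(2*m*target). That is two straight lines with the same slope -m/sum(x), which is consistent with the identical coefficient on theta(1) in both expressions above. The sketch below reproduces this in closed form, without the Symbolic Math Toolbox, on a hypothetical three-point data set with a hypothetical target cost of 6; none of these numbers come from the course data.

```matlab
% Hypothetical toy data (not the course data set)
x = [1; 2; 3];
y = [1; 2; 4];
m = length(y);
X = [ones(m, 1), x];
target = 6;            % hypothetical target cost
theta1 = 0;            % any fixed intercept value works here

% Cost as posted: J = (1/(2*m)) * (sum(X*theta - y)).^2
% J == target  <=>  m*theta1 + sum(x)*theta2 - sum(y) = +/- sqrt(2*m*target)
r = sqrt(2 * m * target);
theta2 = ([-r; r] + sum(y) - m * theta1) / sum(x);   % the two roots

% Check both roots against the posted cost expression
for k = 1:2
    theta = [theta1; theta2(k)];
    J = (1/(2*m)) * (sum(X * theta - y).^2);
    fprintf('theta(2) = %f gives J = %f\n', theta2(k), J);
end
```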
Here are the equations that I get from
There is a conflict there. h0(x) is defined in terms of theta_0 and theta_1 but the batch update of theta involves theta_1 and upwards, with theta_0 not existing. This starts to matter because theta is initialized to 0, leading to an h0 calculation equal to 0 instead of a linear combination of x.
There is only one variable, X, and the equation is h(x) = theta_0*X0 + theta_1*X1 + ... + theta_n*Xn = transpose(theta)*X (instead of using a for loop, we can use vectorization). The initial setting theta = zeros(2,1) means theta_0 = 0 and theta_1 = 0.
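To make the vectorization point concrete, here is a small sketch comparing a row-by-row loop with the single matrix product X * theta. The data values and the variable names h_loop and h_vec are made up for illustration; only the column of ones and the form h = X * theta follow the code in the question.

```matlab
% Hypothetical toy data (not the course data set)
x = [1; 2; 3];
m = length(x);
X = [ones(m, 1), x];   % X0 = 1 for every training example
theta = [-1; 2];

% Loop version: h(i) = theta' * x_i, one training example at a time
h_loop = zeros(m, 1);
for i = 1:m
    h_loop(i) = theta' * X(i, :)';
end

% Vectorized version: one matrix-vector product for all examples
h_vec = X * theta;

disp([h_loop, h_vec]);   % the two columns should match
```

Because the first column of X is all ones, row i of X * theta is theta(1) + theta(2)*x(i), which is how the intercept enters the product.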
No, h0(x) is defined in terms of the model theta_0 + theta_1*x_1 at the bottom of page 5. This conflicts with your implementation of h = X * theta in that the model does not multiply theta_0 by any X component.
But with you initializing theta to [0 0] then it is temporarily irrelevant, as you are going to have h0(x) being 0 for all x.
Using theta_0 conflicts with the notation of the definition of J in the middle of page 5, which clearly shows Sigma i = 1 to something, with i being used as the index of some variables. It is a conflict because i = 1 to something tells us that the lowest index is 1, not 0 -- unless, that is, for some hidden reason you are intended to ignore the first x entry, which would be unusual. But if the index is intended to start from 0 then the upper bound would have to be length minus 1, but that is not the case: the upper bound is the length itself. i = 1 to the length uses all of the values in a natural way -- but it does tell us that 1 is the first index, so theta_0 does not exist.
This conflict with the definition of h0 requires clarification from your instructor -- because if you interpret in the natural way that you did, you cannot possibly get the results that they want you to get.
According to the lecturer, X0 = 1 for all cases (so my code is X = [ones(m, 1), data(:,1)]), and the reason I use h = X*theta is that I followed this YouTube video: https://www.youtube.com/watch?v=QHpKxM5Bho0. What he did is exactly what I did, yet he still got the right answer.
I think we can temporarily ignore theta = zeros(2,1). I also have the other case, theta = [-1;2], and I still don't get the right answer.
No, the description of the exercise is broken and needs clarification from the person who assigned it to resolve what the indices are.
You are assigning ones for the entire first column, which is a lot more than assigning a 1 as the single first entry.
The X values are:
X = [
1.00 6.11
1.00 5.53
1.00 8.52
1.00 7.00
1.00 5.86
1.00 8.38
1.00 7.48
1.00 8.58
1.00 6.49
1.00 5.05
1.00 5.71
1.00 14.16
1.00 5.73
1.00 8.41
1.00 5.64
1.00 5.38
1.00 6.37
1.00 5.13
1.00 6.43
1.00 7.07
1.00 6.19
1.00 20.27
1.00 5.49
1.00 6.33
1.00 5.56
1.00 18.95
1.00 12.83
1.00 10.96
1.00 13.18
1.00 22.20
1.00 5.25
1.00 6.59
1.00 9.25
1.00 5.89
1.00 8.21
1.00 7.93
1.00 8.10
1.00 5.61
1.00 12.84
1.00 6.35
1.00 5.41
1.00 6.88
1.00 11.71
1.00 5.77
1.00 7.82
1.00 7.09
1.00 5.07
1.00 5.80
1.00 11.70
1.00 5.54
1.00 7.54
1.00 5.31
1.00 7.42
1.00 7.60
1.00 6.33
1.00 6.36
1.00 6.27
1.00 5.64
1.00 9.31
1.00 9.45
1.00 8.83
1.00 5.18
1.00 21.28
1.00 14.91
1.00 18.96
1.00 7.22
1.00 8.30
1.00 10.24
1.00 5.50
1.00 20.34
1.00 10.14
1.00 7.33
1.00 6.01
1.00 7.23
1.00 5.03
1.00 6.55
1.00 7.54
1.00 5.04
1.00 10.27
1.00 5.11
1.00 5.73
1.00 5.19
1.00 6.36
1.00 9.77
1.00 6.52
1.00 8.52
1.00 9.18
1.00 6.00
1.00 5.52
1.00 5.06
1.00 5.71
1.00 7.64
1.00 5.87
1.00 5.31
1.00 8.29
1.00 13.39
1.00 5.44];
The Y values are:
Y = [
17.59
9.13
13.66
11.85
6.82
11.89
4.35
12.00
6.60
3.82
3.25
15.51
3.16
7.23
0.72
3.51
5.30
0.56
3.65
5.39
3.14
21.77
4.26
5.19
3.08
22.64
13.50
7.05
14.69
24.15
-1.22
6.00
12.13
1.85
6.54
4.56
4.12
3.39
10.12
5.50
0.56
3.91
5.39
2.44
6.73
1.05
5.13
1.84
8.00
1.02
6.75
1.84
4.29
5.00
1.42
-1.42
2.48
4.60
3.96
5.41
5.17
-0.74
17.93
12.05
17.05
4.89
5.74
7.78
1.02
20.99
6.68
4.03
1.28
3.34
-2.68
0.30
3.88
5.70
6.75
2.06
0.48
0.20
0.68
7.54
5.34
4.24
6.80
0.93
0.15
2.82
1.85
4.30
7.20
1.99
0.14
9.06
0.62];
That is the data we already had.


Answers (1)

Asked: 8 Feb 2020

Commented: 17 Apr 2020
