How do I compute the R-square statistic for ROBUSTFIT using Statistics Toolbox 7.0 (R2008b)?

The R-square statistic is returned in the 'stats' return argument of REGRESS when I perform linear regression on my data. I wish to compute the corresponding statistic when performing robust regression.

Accepted Answer

MathWorks Support Team on 14 Dec 2018
R-square is a concept generally associated with least squares. In least squares, R-square is the square of the correlation between the observed values and the fitted values. It may be possible to compute the fitted values from the robust fit and use this correlation definition. A consequence is that R-square for the robust fit will always be less than or equal to R-square for the least squares fit: since the least squares fit maximizes R-square, if the robust fit is different, its R-square must fall below that maximum. Thus, it can be a useful measure of how different the two fits are.
% Generate data with an outlier, then fit with both methods
x = (1:10)';
y = 10 - 2*x + randn(10,1);
y(10) = 0;                                   % introduce an outlier
[b_ls,~,~,~,stats_linreg] = regress(y,[ones(size(x)) x]);
[b_rob,stats_rob] = robustfit(x,y);
% Plot the data and both fitted lines
scatter(x,y,'filled'); grid on; hold on
plot(x,b_ls(1)+b_ls(2)*x,'r','LineWidth',2);
plot(x,b_rob(1)+b_rob(2)*x,'g','LineWidth',2)
legend('Data','Ordinary Least Squares','Robust Regression')
% R-square from REGRESS, and the correlation-based value for the robust fit
rsquare_linreg = stats_linreg(1)
rsquare_robustfit = corr(y,b_rob(1)+b_rob(2)*x)^2
In least squares, the following decomposition also produces R^2:
sse = error sum of squares
ssr = sum of squares of fitted values around their mean
sst = sum of squares of observed values around their mean
sst = sse + ssr
R^2 = 1-sse/sst = 1-sse/(sse+ssr)
The following code demonstrates how one may compute a possible R-square value for the robust fit:
sse = stats_rob.dfe * stats_rob.robust_s^2;   % error sum of squares from the robust sigma estimate
phat = b_rob(1) + b_rob(2)*x;                 % fitted values from the robust fit
ssr = norm(phat-mean(phat))^2;                % sum of squares of fitted values around their mean
possible_rsquare_robustfit = 1 - sse / (sse + ssr)
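As a cross-check of the identity above, here is a small pure-Python sketch (not MATLAB; standard library only, with made-up data) showing that for an ordinary least-squares line with an intercept, 1 - sse/(sse+ssr) matches the squared correlation between observed and fitted values:

```python
# Verify that 1 - sse/(sse+ssr) equals corr(y, yhat)^2 for an OLS line.
x = [float(i) for i in range(1, 11)]
noise = [0.3, -0.5, 0.1, 0.4, -0.2, 0.2, -0.3, 0.5, -0.1, 0.0]
y = [10 - 2*xi + d for xi, d in zip(x, noise)]

# Ordinary least-squares slope and intercept from the closed-form solution
n = len(x)
mx, my = sum(x)/n, sum(y)/n
b1 = sum((xi-mx)*(yi-my) for xi, yi in zip(x, y)) / sum((xi-mx)**2 for xi in x)
b0 = my - b1*mx
yhat = [b0 + b1*xi for xi in x]

# R^2 from the sum-of-squares decomposition
# (with an intercept, mean(yhat) == mean(y), so ssr is around my)
sse = sum((yi-yh)**2 for yi, yh in zip(y, yhat))
ssr = sum((yh-my)**2 for yh in yhat)
r2_decomp = 1 - sse/(sse+ssr)

# R^2 as the squared correlation between observed and fitted values
mh = sum(yhat)/n
cov = sum((yi-my)*(yh-mh) for yi, yh in zip(y, yhat))
r2_corr = cov**2 / (sum((yi-my)**2 for yi in y) *
                    sum((yh-mh)**2 for yh in yhat))

print(abs(r2_decomp - r2_corr) < 1e-12)
```

For a robust fit the two definitions generally diverge, which is exactly why the answer above calls its result only a "possible" R-square.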
Martin Kahn on 6 Mar 2018
Hi Nikhil,
I know this conversation happened a rather long time ago, but I just bumped into the exact same issue, got super confused, and ran a quick experiment. My guess is that you get a better R-squared because MATLAB also uses the weighted data points to calculate the error.
Essentially, I just ran the "fit" command with robust turned either 'on' or 'off', output the built-in R-squared, and also calculated it myself. For both robust on and off I shuffled the data 500 times and looked at the histogram. As you can see in the histogram, the robust fit doesn't seem to calculate the R-squared in the same way as the normal one. I might be super wrong here, of course.


More Answers (2)

Victor on 16 Jan 2019
Hi,
I have the same issue: I can't get the same r^2 from a robust (bisquare) linear fit whether I'm using the MATLAB "fit" function (Curve Fitting Toolbox), fitlm (Statistics and Machine Learning Toolbox), or calculating it myself as r2 = 1-sse/sst, which is also different from corr(y,y_estimate)^2.
This is how I calculate sse, sst and ssr using the weights output by the fitlm function:
[mdl] = fitlm(x,y,'RobustOpts','on');
w = mdl.Robust.Weights;
y_estimate = mdl.Coefficients.Estimate(2)*x + mdl.Coefficients.Estimate(1);
sse = sum( w .* (y - y_estimate).^2 );      % sum of squares due to error (residuals)
sst = sum( w .* (y - mean(y)).^2 );         % total sum of squares
ssr = sum( w .* (y_estimate-mean(y)).^2 );  % sum of squares of regression
Note that if I use the following lines (classical linear regression), I get the same r2 from the fitlm function and from the above formulas.
[mdl] = fitlm(x,y,'RobustOpts','off');
w = ones(size(x));
Confusing.
Victor on 12 Apr 2019
I ended up calculating r2 myself. At least I know how it's done and I can use it to compare different fits. I created a small function to do that. The third input, w, is optional and represents the weights of the points. If it is not specified, the weights are set to 1.
function r2 = computeR2(y,y_estimate,w)
if nargin==2, w = ones(size(y)); end
sse = sum( w .* (y - y_estimate).^2 );      % sum of squares due to error (residuals)
sst = sum( w .* (y - mean(y)).^2 );         % total sum of squares
ssr = sum( w .* (y_estimate-mean(y)).^2 ); % sum of squares of regression (not used in r2)
r2 = 1-sse/sst;
end
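For readers outside MATLAB, the same weighted R^2 can be sketched in pure Python (standard library only; the function name and the unweighted mean(y) deliberately mirror Victor's MATLAB code, and are not from any particular library):

```python
# Weighted R^2 of the form 1 - sse/sst, mirroring Victor's computeR2.
def compute_r2(y, y_estimate, w=None):
    """R^2 = 1 - sse/sst with optional per-point weights w (default 1)."""
    if w is None:
        w = [1.0] * len(y)
    mean_y = sum(y) / len(y)  # unweighted mean, as in the MATLAB version
    sse = sum(wi * (yi - ei)**2 for wi, yi, ei in zip(w, y, y_estimate))
    sst = sum(wi * (yi - mean_y)**2 for wi, yi in zip(w, y))
    return 1 - sse / sst

# A perfect fit gives R^2 = 1; a weight of zero drops that point's error.
print(compute_r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```

Note that mean_y is left unweighted here only to match Victor's formulas exactly; a weighted mean would be another defensible choice.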



Shi Chen on 2 Dec 2020
Hello!
Actually, I was confused by their results at first, so I tried to figure out what a robust fit really does, and guess what? The robust fit gives smaller weights to the outliers, which results in a better R-square value. Here is a reference page (in Chinese):
https://baike.baidu.com/item/%E7%A8%B3%E5%81%A5%E5%9B%9E%E5%BD%92

Release

R2008b
