How to remove outliers before prediction

Dear all, I want to predict the current End value from the current Start value, using historical data as shown below. I am using the code below, but I want to remove outliers where the Start (or End) value is >= 0.28. If you have a better idea, for example removing suitable outliers when R2 is < 0.9 so that it becomes 0.98 or more, please share it. Please suggest how I can remove the outlier(s).
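% Clear the workspace first; calling "clear all" after defining data would delete it
clc;
clear;
% Historical data, columns: [Start End]
data = [0.25 0.256
0.24 0.24
0.29 0.33
0.224 0.24
0.26 0.27
0.24 0.26
0.26 0.31
0.29 0.34];
scatter(data(:, 1), data(:, 2));
% Straight-line fit: End = polystartend(1)*Start + polystartend(2)
polystartend = polyfit(data(:, 1), data(:, 2), 1);
todaystart = 21;
% No trailing semicolon, so the predicted End value is displayed
todayend = polyval(polystartend, todaystart)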
Many many thanks in advance,

Answers (1)

the cyclist on 26 Feb 2015
Edited: the cyclist on 26 Feb 2015
Here is a technical way to remove the outliers based on your suggestion:
removeIdx = any(data >= 0.28,2);
data(removeIdx,:) = [];
The identification of outliers is a rich and complex subject. Iglewicz and Hoaglin have written a 90-page book on the subject.
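To make the two lines above concrete, here is a minimal sketch applied to the data from the question. any(..., 2) works row by row, so a row is dropped if either its Start or its End value is >= 0.28. Note that this also removes the row [0.26 0.31], because its End value exceeds the threshold. The names cleaned and polyCleaned, and the value todaystart = 0.25, are assumptions made for illustration only; they are not from the original post.

```matlab
% Data from the question, columns: [Start End]
data = [0.25  0.256
        0.24  0.24
        0.29  0.33
        0.224 0.24
        0.26  0.27
        0.24  0.26
        0.26  0.31
        0.29  0.34];

% Logical column vector: true where Start OR End is >= 0.28
removeIdx = any(data >= 0.28, 2);

% Keep a copy of the remaining rows instead of overwriting data
cleaned = data(~removeIdx, :);

% Refit the straight line on the remaining rows only
polyCleaned = polyfit(cleaned(:, 1), cleaned(:, 2), 1);

% Hypothetical Start value, chosen inside the range of the remaining data
todaystart = 0.25;
todayend = polyval(polyCleaned, todaystart)
```

Keeping the original data untouched makes it easy to compare the fit before and after the rows are dropped.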

8 Comments

Sir, many thanks. Suppose our R2 (R-squared) is poor: how can we adjust the data (remove the data points that make R2 worse) and recalculate todayend? For example, if one or more data points push R2 below 0.9, how can we bring R2 close to 0.9~0.999 and recalculate todayend?
Yours is a very, very bad way to think about removing outliers. Just because a data point makes the fit worse does not mean that is a "bad" data point to be ignored. In general, you should only remove outliers if you believe they were actually mismeasured (e.g. due to an equipment failure) or had some other problem.
Well you can just remove all data points except 2 and your line fit will fit the data absolutely perfectly with an R of 1. But what good is that?
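A quick way to see the point about keeping only two points is to try it. The sketch below picks two arbitrary rows from the question's data (the choice of rows is an assumption for illustration), fits a line, and computes R2 from the residuals. A straight line always passes exactly through two points with different Start values, so the residuals vanish and R2 comes out as 1 up to rounding, whatever the two points are.

```matlab
% Two arbitrary rows from the question's data, columns: [Start End]
twoPoints = [0.25 0.256
             0.24 0.24];
x = twoPoints(:, 1);
y = twoPoints(:, 2);

% Straight-line fit through exactly two points
p = polyfit(x, y, 1);
yfit = polyval(p, x);

% R2 = 1 - (residual sum of squares) / (total sum of squares)
SSres = sum((y - yfit).^2);
SStot = sum((y - mean(y)).^2);
R2 = 1 - SSres / SStot
```

A perfect R2 here says nothing about how well the line would predict a new End value, which is why chasing R2 by deleting points is misleading.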
Sir, how do I get the R2 value(s)?
Did you look up correlation in the help? You might find this:
R = corrcoef(x,y)
It gives me:
R =
1.0000 0.9337
0.9337 1.0000
I don't understand which one is R2. I searched on Google but did not find a suitable explanation.
Maybe...
R = corrcoef(data(:,1), data(:,2))
R2 = R(1,2)^2
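To spell out why R(1,2) is the right entry: corrcoef returns a symmetric matrix whose diagonal is 1 (each variable correlated with itself) and whose off-diagonal entries R(1,2) = R(2,1) hold the correlation between Start and End. For a straight-line fit with an intercept, the square of that correlation equals the coefficient of determination of the fit. The sketch below computes R2 both ways so the two can be compared; the names R2_corr and R2_fit are my own. With the R(1,2) = 0.9337 reported above, R2 is about 0.872.

```matlab
% Data from the question, columns: [Start End]
data = [0.25  0.256
        0.24  0.24
        0.29  0.33
        0.224 0.24
        0.26  0.27
        0.24  0.26
        0.26  0.31
        0.29  0.34];
x = data(:, 1);
y = data(:, 2);

% Route 1: square of the off-diagonal correlation coefficient
R = corrcoef(x, y);
R2_corr = R(1, 2)^2;

% Route 2: from the residuals of the straight-line fit
p = polyfit(x, y, 1);
yfit = polyval(p, x);
R2_fit = 1 - sum((y - yfit).^2) / sum((y - mean(y)).^2);

% Display both values side by side
[R2_corr, R2_fit]
```

The two routes agree only for a first-degree polyfit with an intercept; for higher-degree fits, the residual-based formula is the one to use.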


Asked: 26 Feb 2015
Commented: 28 Feb 2015
