# How to fit a line through this data "smartly" ?

11 views (last 30 days)
Luca Amerio on 13 Jan 2019
Commented: Image Analyst on 13 Jan 2019
I have the x-y data shown in the figure below (for the curious, it is the logarithm of the amplitude of a Hilbert transform). I'm trying to write an algorithm to automatically get the slope of the initial portion of the plot, i.e. before the noise overcomes the signal.
What's causing the task to be even more challenging, are the small saw-tooth jumps that you can see in the zoomed-in portion on the right.
I tried several approaches, but could not find a reliable way to automatically fit a line only of the left-hand side linear portion.
Do you have any suggestion?
PS: I know it would help, but I don't (and cannot have) the curve fitting toolbox.
Luca Amerio on 13 Jan 2019
The data should now be attached.
Yes, the data will always start with the sloping portion as this is the logarithm of the magnitude of the hilbert transform of a decay, so the initial portion will always be the one with the highest S/N ratio

Star Strider on 13 Jan 2019
Edited: Star Strider on 13 Jan 2019
One approach (using the Signal Processing Toolbox medfilt1 function):
X = D.X;
Y = D.Y;
Ts = mean(diff(X));
YF = medfilt1(Y, 2500);
gt0 = find(dYdX > -0.001, 1, 'first');
b = polyfit(X(1:gt0),Y(1:gt0),1);
Yf = polyval(b, X(1:gt0));
figure
plot(X, Y)
hold on
plot(X(1:gt0), Yf,, ':r', 'LineWidth',2)
hold off
text(10, -5, sprintf('y = %6.3f x %6.3f', b))
This code does a median filter (medfilt1) on ‘Y’, then uses the gradient function to calculate the derivative. It then uses a relatively unsophisticated threshold (the find call) to determine the end of the relatively linear section of the data, does a linear regression on that section, and plots the result.
I cannot determine how well it will work with any other data you have, so you will likely need to experiment with it. EDIT — (13 Jan 2019 at 18:05)
EDIT #2 — (13 jan 2019 at 18:50)
Another option, using the Signal Processing Toolbox findchangepts function:
X = D.X;
Y = D.Y;
ipt = findchangepts(X, 'Statistic','std');
b = polyfit(X(1:ipt), Y(1:ipt), 1);
Yf = polyval(b, X(1:ipt));
figure
plot(X, Y)
hold on
plot(X(1:ipt), Yf, ':r', 'LineWidth',2)
hold off
text(10, -5, sprintf('y = %6.3f x %6.3f', b))
This may be more robust. It produces similar statistics and plot.
Image Analyst on 13 Jan 2019
You said in your comment to me that you're getting a new curve from the server every 10 minutes. When you run an experiment there are some parameters you want to keep constant, and others that you allow to vary. Have you thought about whether, theoretically, it is better to compute the slope between fixed limits, or whether you want to allow the right-most index of the fitting data to vary in a way that depends on the variable data itself? I suppose arguments could be made either way, depending on what the overall situation is (which we don't know but you do).

John D'Errico on 13 Jan 2019
Edited: John D'Errico on 13 Jan 2019
A serious problem is your data also starts with a strongly nonlinear portion. You essentially need to trim it at both ends. I'd suggest the trick is to use a Savitsky-Golay style filter, combined with a simple call to median. Median will implicitly trim those ends for you, and because we use a filter of a fairly short span, that allows median to do some smoothing too. And the linear fits are short enough they will give you sufficient points inside that quasi-linear region.
The step is constant at 0.002.
dx = 0.002;
nfilt = 20;
M = [dx*(0:nfilt-1)',ones(nfilt,1)];
Mp = pinv(M);
slopeFilter = Mp(1,:);
allSlopes = conv(Y,slopeFilter,'valid');
histogram(allSlopes,1000) linearSegmentSlope = median(allSlopes)
linearSegmentSlope =
0.15569
As you see, the histogram has a very high spike, as we would expect. A simple median will give us a good estimate now. You could change the filter length, and the result should be fairly robust to that value, because the essentially linear segment isquite long.
Luca Amerio on 13 Jan 2019
I'm really impressed by your solution. Would you a link to understand a bit better what "a Savitsky-Golay style filter, combined with a simple call to median" is and what is that M matrix you compute in your code?
At the moment it looks like black magic to me!

Image Analyst on 13 Jan 2019
I plotted the data then it looked like the squirrely stuff stopped around element 2300 or so, and started to get fairly linear after that. So I found the slopes from element 2300 to the end of the data and plotted it.
fontSize = 15;
x = s.X;
y = s.Y;
subplot(1, 2, 1);
plot(x, y, 'b-');
grid on;
title('Original Data', 'FontSize', fontSize);
xlabel('x', 'FontSize', fontSize);
ylabel('y', 'FontSize', fontSize);
startingElement = 2310;
for k = 2310 : 100 : length(x) - 1
thisX = x((startingElement-10):k);
thisY = y((startingElement-10):k);
coefficients = polyfit(thisX, thisY, 1);
slope(k) = coefficients(1);
end
slope(1) = slope(2);
subplot(1, 2, 2);
plot(x2, y2, 'b-');
grid on;
caption = sprintf('Slope from element %d to x', startingElement);
title(caption, 'FontSize', fontSize);
xlabel('x', 'FontSize', fontSize);
ylabel('Slope', 'FontSize', fontSize); As you can see there is sort of a flat place for slopes, where they're fairly constant, but not quite. So it kind of becomes a judgment call as to what slope you want to use, or where you want to stipulate that the slopes are starting to move away from a "constant" value. There is no clear winner - it's not like the slopes are totally constant and then abruptly switch to a different value. There is more of a gradual change of slope as you progress into the noisy area. Since there is no clear winner, you can just pick wherever you want to stop. How about around x=30? Anything wrong with that value of -0.165?
Luca Amerio on 13 Jan 2019
Absolutelly not. Nothing wrong with x=30.
I know there is a certain degree of arbitrariness in this. Since this will need to be repeated on a server automatically every 10 minutes, however, I was trying to get something robust that does not neet human intervention with ginput or "looking at the data".
I've to be honest saying that I'm a bit surprised by the change in slope in your second plot from 10 to 15 and back to 20. Looking at the data I can't really tell why fitting data up to 15 is different from fitting it to 10 or 20. The data seams pretty linear in that portion...
I will investigate this
Thank you for the help anyway

R2017b

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!