I want to compare fit() using option 'sin1' with the same equation defined using 'fittype'.
35 views (last 30 days)
Show older comments
I have used fit() with the option 'sin1' to fit a noisy set of data that very approximately look like a sine wave. To understand how 'fittype' works, I would like to fit the curve 'a*sin(b*x+c)' using 'fittype'. This should give the same answer. It does if the StartPoint is the answer; it does not if I omit StartPoint or put in some random values.
How do I make 'fittype' robust to a wide range of (noisy) data and different starting points. The standard option using fit() and 'sin1' does this.
I attach some simple sample code.
% I am trying to use the fit function to fit sine and/or cosine curves to a
% cyclic dataset that is very noisy. I am most interested in the frequency
% but the noise often prevents the FFT function from giving the frequency
% that I am interested in.
% This code uses the fit function with the predefined curve 'sin1' to
% calculate the best fit. This works very well. I then try and do the
% same thing using my own curve, 'a*sin(b*x+c)', i.e. the same curve, but I
% fail. I only succeed when the StartPoint is very close to the correct
% solution.
% I feel that I must be doing something wrong, but I cannot work out what
% it is.
xdata = (-19.5:19.5)'; % I think that any evenly spaced data will do
rdata = [-3.8699, -11.9740, -1.0781, 9.4637, ... % Real data from my application
18.2971, 3.7348, -11.5774, 0.3397, 8.4653, 7.7785, -1.4079, -13.4483, -7.6968, ...
5.1592, 6.4323, -8.2318, -14.8538, 0.8788, 10.9452, 5.6580, -9.0037, -10.0607, ...
-0.5338, 8.2230, 3.8553, 1.3426, -1.0028, -1.8683, 6.7254, 10.0907, 3.2068, ...
-7.2804, -4.3917, 6.5188, 5.7636, -3.0115, -13.8272, -12.7043, 5.0029, 9.9405]';
p = polyfit(xdata, rdata, 3); % Subtracting a cubic curve to level the ydata
py = polyval(p, xdata);
ydata = rdata - py;
[f, gof, output] = fit(xdata, ydata, 'sin1'); % Using the standard MATLAB curve, 'sin1'
fprintf(' ''sin1'': f.a: %-6.2f f.b: %-6.2f f.c: %-6.2f gof.sse: %-6.1f\n', ...
f.a1, f.b1, f.c1, (gof.sse/numel(xdata)));
xdata10 = xdata(1):0.1:xdata(end); % Defining extra xdata to give a smoother line
fy = feval(f, xdata10);
mysin = fittype('a*sin(b*x+c)', ... % My attempt to do the same thing
'dependent', {'y'}, 'independent', {'x'}, 'coefficients', {'a','b','c'});
[f2, gof2, output2] = fit(xdata, ydata, mysin, 'StartPoint', [1.0, 1.0, 1.0]);
fprintf(' mysin: f.a: %-6.2f f.b: %-6.2f f.c: %-6.2f gof.sse: %-6.1f\n', ...
f2.a, f2.b, f2.c, (gof2.sse/numel(xdata)));
fy2 = feval(f2, xdata10);
plot(xdata, ydata, 'xb:')
hold on
plot(xdata10, fy, ' m--')
plot(xdata10, fy2, ' r-')
legend('Real data', '''sin1'' fit', 'My fit');
hold off
2 Comments
dpb
on 20 Feb 2023
Edited: dpb
on 20 Feb 2023
xdata = (-19.5:19.5)'; % I think that any evenly spaced data will do
rdata = [-3.8699, -11.9740, -1.0781, 9.4637, ... % Real data from my application
18.2971, 3.7348, -11.5774, 0.3397, 8.4653, 7.7785, -1.4079, -13.4483, -7.6968, ...
5.1592, 6.4323, -8.2318, -14.8538, 0.8788, 10.9452, 5.6580, -9.0037, -10.0607, ...
-0.5338, 8.2230, 3.8553, 1.3426, -1.0028, -1.8683, 6.7254, 10.0907, 3.2068, ...
-7.2804, -4.3917, 6.5188, 5.7636, -3.0115, -13.8272, -12.7043, 5.0029, 9.9405]';
p = polyfit(xdata, rdata, 3); % Subtracting a cubic curve to level the ydata
py = polyval(p, xdata);
ydata = rdata - py;
plot(xdata,rdata)
hold on
plot(xdata,py)
p
The data are symmetric; the polynomial does nothing useful, at least for the sample dataset.
However, just as a minor simplification, MATLAB supplies the equivalent with detrend so one can write the above as
ydata=detrend(rdata,3);
if there are datasets for which it might be helpful.
As for the Q? posed, as the doc notes, sinN models are very sensitive to starting points and the MATLAB builtin solver uses an (AFAICT undocumented) optimization technique internally to get an initial starting point estimate and bounds the coefficients >0 whereas the custom model uses simply a randomized starting point on the [0,1] interval and no bounds on the coefficients.
To reproduce that result, you would also have to use a fit options object and starting points at least reasonably close to those the MATLAB heuristic model uses (which, of course, is undocumented/proprietary).
So, the Q? in return is, if the builtin model produces the desired results, why not use it instead? Other than simply knowing what the builtin algorithm is, which may be of interest and perhaps even needed (to publish?).
I couldn't in a quick look, find anything more about the algorithm TMW uses...whether there's any other information publicly available or not, I don't know.
Accepted Answer
Matt J
on 20 Feb 2023
Edited: Matt J
on 20 Feb 2023
How do I make 'fittype' robust to a wide range of (noisy) data and different starting points. The standard option using fit() and 'sin1' does this.
When you specify 'sin1', the solver knows what kind of fit it is looking for and can choose an appropriate strategy for forming the initial StatPoint. Conversely, when you specify the fittype as a custom model, the burden falls upon you to do that.
Choosing a StartPoint randomly is not a shrewd strategy. You know for example, that
a=(max(ydata)-min(ydata))/2
will be a better choice for a than an arbitrary selection. You can also then evaluate curve at particular x,y points to solve approximately for the other parameters. For example, since Y(0)=a*sin(c), you could do,
Y0=interp1(xdata,ydata,0);
c=asin(Y0/a);
6 Comments
Matt J
on 21 Feb 2023
I am still working on getting a robust value of sp3 (or 'c') that is not overly dependent on the first value of my data.
The method in my comment above does so.
More Answers (0)
See Also
Categories
Find more on Linear and Nonlinear Regression in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!