findchangepts failing based on vector length and name-value inputs

Why does findchangepts have issues finding a solution for lengthy input vectors when "MaxNumChanges" is set to 1, specifically for std and rms statistics? Is there something internal to the minimization procedure that actually changes when "MaxNumChanges" is manually set to 1 compared to omitting it and the function setting it to 1 by default? This issue has been raised previously but I didn't think the question author or answer really got closer to identifying the issue or providing a useful workaround.
My expectation is that these two lines of code would operate identically... but they don't?
  • findchangepts(x,'Statistic','std') - works fine
  • findchangepts(x,'Statistic','std','MaxNumChanges',1) - seems to fail and needs to be manually terminated
It is understandable that a solution might take longer to resolve when 'MaxNumChanges' is >1, but even for that input, selecting std or rms, or having too long an input vector, seems to make failure to converge much more likely. I don't think that switching to ischange is necessary, as was suggested in the previous question linked above. Instead, I have found that some of the following steps can help avoid this when the data is appropriate.
Some Workarounds:
  1. Use 'mean' or 'linear' statistics if the data will allow it as they seem less likely to fail
  2. Downsample data to reduce number of points
  3. Limit the search region so there are fewer points. Decide the regions manually or use 'mean' or 'linear' points as seed changepoints if they seem reasonable
Example of issue
% Generate some simple instantaneously changing signals
a = [zeros(1,3e4), ones(1,5e4)]; % Step Function
b = 0.5 + [zeros(1,2e4), ones(1,2e4), zeros(1,2e4), ones(1,2e4)]; % Square wave
% Generate normally distributed noise to add to signal that changes
% alongside mean changes
nsig = 0.01; % Standard deviation of the baseline noise
noisea = nsig.*[randn(1,3e4), 9*randn(1,5e4)]; % Randomly distributed noise
noiseb = nsig*(2*b).^2.*randn(1,8e4); % Also x9 for upper steps
% Combine to make signals
A = a + noisea; B = b + noiseb;
% Use the default setting to find first changepoint using any statistic
figure; findchangepts(A,'Statistic','std');
figure; findchangepts(B,'Statistic','std'); % Only Finds first jump but works
%% Now adding in 'MaxNumChanges' = 1 for findchangepts using signal A.....
% When using 'mean' or 'linear', it might be a few ms slower than when
% omitting 'MaxNumChanges', but still only around 60 ms for me
figure; findchangepts(A,'Statistic','mean','MaxNumChanges',1);
% When using 'std' or 'rms', both failed to finish computing in over 10
% seconds before manual termination. It only took 60 ms on my machine
% when omitting 'MaxNumChanges'
% figure; findchangepts(A,'Statistic','std','MaxNumChanges',1);
%% Error persists when searching for more changepoints
% Finds all 3 changes and plots in ~100 ms for both linear and mean
figure; findchangepts(B,'Statistic','mean','MaxNumChanges',3);
% Again fails to finish within 10 s for std and rms
% figure; findchangepts(B,'Statistic','std','MaxNumChanges',3);
Example of Applying Workarounds
% Use approximately the same signals but make the mean changes gradual
% while keeping the instant jumps in noise amplitude
a2 = a; a2(30000:30200) = linspace(0,1,201);
b2 = b; b2ramp = linspace(0.5,1.5,101); % Ramp for gradual b2 changes
b2(2e4:20100) = b2ramp; b2(6e4:60100) = b2ramp;
b2(4e4:40100) = fliplr(b2ramp);
% And add the same noise onto these signals
A2 = a2 + noisea; B2 = b2 + noiseb;
%% Workaround 1 - avoid using std or rms if multiple change points
% This was already shown to work above but prove that it works again for
% B2. The changepoint using 'mean' is about 50 points into the 100 point
% gradual change.
figure; findchangepts(B2,'Statistic','mean','MaxNumChanges',3);
%% Workaround 2 - downsample the number of points
% Downsampling by 100 should make the vectors small enough that std and rms work
Ad2 = downsample(A2,100); Bd2 = downsample(B2,100); % Now only 800 points
% Now there are no issues getting solutions for std or rms for any number
% of 'MaxNumChanges'
figure; findchangepts(Bd2,'Statistic','std','MaxNumChanges',3);
%% Workaround 3 - seed guesses using the mean or linear
% The minimization of 'mean' and 'linear' results in the changepoint
% halfway up the gradual change from the lower to upper values. That would
% not be accurate for the std change but searching the 1000 points
% surrounding the 'mean' changepoint would reduce the search vector to a small
% enough range that there are no issues using 'std'. This method is very
% dependent on the data signal.
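Workaround 3 can be sketched roughly as follows for A2. This is only an illustration under assumptions: the half-width halfWin and the variable names iMean, lo, hi, iLocal, and iStd are hypothetical choices of mine, the signal is regenerated so the snippet runs on its own, and whether the windowed 'std' result is meaningful depends heavily on the data.

```matlab
% Sketch of Workaround 3: seed with the fast 'mean' changepoint, then run
% 'std' on a short window around it. Window size is an assumed value.
rng(0);                                   % fixed seed only for repeatability
nsig = 0.01;
a2 = [zeros(1,3e4), ones(1,5e4)];
a2(30000:30200) = linspace(0,1,201);      % gradual mean change
A2 = a2 + nsig.*[randn(1,3e4), 9*randn(1,5e4)];

halfWin = 500;                            % hypothetical: ~1000-point window
iMean = findchangepts(A2,'Statistic','mean');   % default call, one changepoint
if isempty(iMean)
    error('No ''mean'' changepoint found to seed the local search.');
end

% Clamp the window to the valid index range of A2
lo = max(1, iMean - halfWin);
hi = min(numel(A2), iMean + halfWin - 1);

% Run the slower 'std' search only on the short window
iLocal = findchangepts(A2(lo:hi),'Statistic','std','MaxNumChanges',1);
iStd = lo + iLocal - 1;                   % map back to an index into A2 (empty if none found)
```

Because the window contains at most 1000 points, the 'std' call stays in the vector-length range where I have not seen the slowdown.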
Maybe the issue is solely related to the memory available on my computer, but I have experienced this on more than one computer. I am not sure at what vector length this starts to become an issue, but I have not experienced it so long as my data vector is around 2000 points or smaller. Maybe there is something quirky within the findchangepts function itself? I really have no explanation.
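One way to probe where the slowdown starts is to time both call forms at a few increasing lengths. The following is a minimal sketch, not a benchmark: the lengths and the variable names (lengths, tDefault, tExplicit) are assumptions, and the list should be extended cautiously because the explicit 'MaxNumChanges' call can take minutes on long vectors.

```matlab
% Time findchangepts with and without an explicit 'MaxNumChanges' of 1
rng(0);                                   % fixed seed only for repeatability
nsig = 0.01;
lengths = [500 1000 2000 4000];           % assumed test lengths
tDefault  = zeros(size(lengths));
tExplicit = zeros(size(lengths));

for k = 1:numel(lengths)
    h = lengths(k)/2;
    % Step in the mean with a 9x jump in noise amplitude at the midpoint
    x = [zeros(1,h), ones(1,h)] + nsig.*[randn(1,h), 9*randn(1,h)];

    % Requesting an output suppresses the plot
    tic; ipt1 = findchangepts(x,'Statistic','std');                   tDefault(k)  = toc;
    tic; ipt2 = findchangepts(x,'Statistic','std','MaxNumChanges',1); tExplicit(k) = toc;
end

disp(table(lengths(:), tDefault(:), tExplicit(:), ...
    'VariableNames', {'N','tDefault_s','tExplicit_s'}));
```

Single tic/toc measurements are noisy, so the table is only useful for spotting order-of-magnitude differences between the two columns.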

Accepted Answer

Ayush on 1 Jan 2024
Edited: Ayush on 1 Jan 2024
I understand that you are facing an issue with findchangepts when the Statistic method is specified as 'rms' or 'std' and 'MaxNumChanges' is also specified. I answered the earlier question [https://www.mathworks.com/matlabcentral/answers/788084-function-findchangept-doesn-t-work-when-option-maxnumchanges-in-added]; that issue was fixed in the R2017a release.
The issue that persists now (R2023b) is that the computation takes a very long time when the statistic method is 'rms' or 'std' and 'MaxNumChanges' is specified for a large number of data points.
% this runs in 0.148067 seconds in R2023b
figure; findchangepts(A,'Statistic','std');
% this runs in 283.095514 seconds in R2023b
figure; findchangepts(A,'Statistic','std', 'MaxNumChanges',1);
Thanks,
Ayush
  1 Comment
Emma Farnan on 6 Jan 2024
Thanks Ayush,
I suspected that inputting a 'MaxNumChanges' of 1 was not actually crashing the computation but was merely slowing it down drastically. I couldn't see any obvious reason this would happen based on the math in the More About section. Hopefully this is an issue that can be resolved now that it is officially recognized, because I do think it is a really useful function overall.
Cheers,
Emma

Release

R2023b