Calculating moving standard deviation (bounds) for time series
5 views (last 30 days)
Show older comments
I have a N x 6 matrix and of interest are columns 1 and 5. Column 1 is time data in Excel's serial date format (e.g. 4.264400001156250e+04 = 01-Oct-2016 00:00:00) and Column 5 is Measurement data.
The time data spans across M days and has interval in seconds (not regular), so the value of N is quite large (~40,000).
What I would like to do is this: For a Measurement data on day X, calculate the standard deviation(σ) for that day based on the Measurement data for past Y days. How can I script this out? I can do this in Excel, but due to the sheer number of rows, it tends to get somewhat slow.
2 Comments
John Chilleri
on 18 Jan 2017
Edited: John Chilleri
on 18 Jan 2017
Would you mind attaching the data file?
I'd prefer checking my code after running it with your data to ensure its correctness.
Also, do you want the resulting standard deviations stored as their own column-vector variable, written to a csv file, or printed to the command window?
Or if you'd prefer, I can outline the process of how to code this and let you do it yourself!
Answers (1)
John Chilleri
on 19 Jan 2017
Hello,
I created the following code which should do as you wanted. An explanation of the code is below.
%
% function standard_deviations = pickfcnname(data, Y)
%
% Inputs:
%
% data N x 6 matrix where column 1 is time, column 5 is measurements
% Y Number of days (if available) to use in std calculation
%
% Outputs:
%
% stdevs column of length N containing associated standard deviations
%
function stdevs = pickfcnname(data, Y)
%
% Initialize
N = size(data,1);
stdevs = zeros(N,1);
%
% Note: the first few standard deviations will be computed from few points
% and will thus be weird.
stdevs(1) = 0; % NaN probably more truthful
stdevs(2) = 0;
%
Dates = floor(data(:,1));
Values = data(:,5);
for i = 3:N
%
% Acquire starting point
if (Dates(1) < Dates(i)-Y)
j = 1;
while (Dates(j) ~= Dates(i)-Y)
j = j+1;
end
lbound = j;
else
lbound = 1;
end
%
% Compute standard deviation from (starting point:previous point)
stdevs(i) = std(Values(lbound:i-1));
end
end
Explanation:
The code takes in the N x 6 data and an integer Y number of days (i.e. Y = 3 uses the past 3 days to compute the standard deviation).
The code sets the first two standard deviations to 0 as there aren't enough data points to compute a standard deviation. You should also recognize that the first few points after that have standard deviations made from few points.
The code then loops from the third to final data point, setting a lower bound lbound at Y days prior if possible, 1 otherwise (for initial stds). The standard deviations are then computed using the Y days ago to previous point's measurements and stored in the output vector stdevs, which has the same number of rows as your input data.
Hope this helps!
2 Comments
John Chilleri
on 23 Jan 2017
Edited: John Chilleri
on 23 Jan 2017
Yes, you can definitely start at 3 days delayed, which would require a check for which point indicates the fourth day.
As it is now, it computes the standard deviation based on what's available, so on day 3 it uses days 1 and 2 only since those are all that are available.
If you want to make it so no standard deviations are computed until day 4, we can change that pretty easily:
start = 1;
while (Dates(start) < Dates(1)+3)
start = start + 1;
end
for i = start:N
% etc
The above finds the first data point corresponding to day 4 and labels that as start, where the for loop can begin.
I believe that would do the job!
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!