Repeating a time series from yyy-MM-dd to yyyy-MM-dd-hh-mm

I have a matrix where the first column is a time series with dates down to one minute resolution, in the format: yyyy-MM-dd-hh-mm. And in the columns next to it I want to add several types of measured data.
The problem is that some of this data has a lower resolution, for example only measured once a day in the format: yyyy-MM-dd.
What I want to do is insert this data in the high resolution matrix and having the daily values repeated across all hours and minutes of the correct day so that there are no gaps between the values.
Example: I have carbon dioxide data from 2011 measured daily. I want the measured value for 2011-01-01 to be repeated in the bigger matrix all the way from 2011-01-01-00-00 to 2011-01-01-23-59 where then 2011-01-02 starts and so on..
I imagine this shouldn’t be too complicated to do but I’m having trouble figuring out how to do it. Any suggestions?
Thanks.

 Accepted Answer

Actually, for this I would use date vectors. You can then use ismember with the 'rows' option to locate the matching rows of minute and daily data (using only the year month day columns of the date vectors):
%demo data, yours does not overlap:
DailyDatenum = [734504:734868; 1:365]';
MinuteDatenum = [734502:1/24/60:734870]';
Dailydv = datevec(DailyDatenum(:, 1));
Minutedv = datevec(MinuteDatenum(:, 1));
[ispresent, row] = ismember(Minutedv(:, 1:3), Dailydv(:, 1:3), 'rows');
MinuteDatenum(ispresent, 2) = DailyDatenum(row(ispresent), 2);

6 Comments

Oh I just realized that the minute time series I attached was for 2000 instead of 2011, that was a mistake! But when I changed that this method worked beautifully and did just what I had hoped, thank you so much! However I have two additional questions:
1. I also have more data that I want to fit in the same time series, some of it is in really strange intervals like every 7 min for a while and then a gap for a few days and then 15 min intervals and so on… really random stuff. Since this method breaks down the time so elegantly in date vectors, could I use it to automatically fit that data in the same way and fill in the gaps between the data? And what would I have to change in that case, would:
ismember(Minutedv(:, 1:5), Dailydv(:, 1:5), 'rows')
do the same thing but also include hours and minutes? Or is there something else I should adjust?
2. Could I use this method but instead of having it repeat the data linearly interpolate between the data gaps? Like in the code Star Strider suggested, but with the simplicity of using date vectors?
For 1., I'm not sure I understand what you want to do. Do you just want to fill missing data with whatever previous value is available?
2. For interpolating, you probably need a different method.
Ok, maybe I will drop the interpolation part then. But what I mean by my first point is simply that I have several sets of data all measured during 2011, and there is very often gaps in the data.
What I want to do is place the measured data, whenever it is possible, in the correct time position (according to when it was measured) in the big time series (the one referred to as MinuteDatenum here) and then where there are gaps – fill them in with the closest existing value in the series.
Do you understand what I mean? Or would it be easier if I attached a .mat file with the values?
That's a lot more complicated than your initial question. You can use ismember on the first 5 columns of the date vectors to insert your randomly spaced data into the minute data, but that won't fill the gaps. To find the beginnings and ends of gaps, a combination of diff and find should do the trick. You can then fill them using interpolation of the last value.
%assuming
%randomdv a n*6 matrix of date vectors, not necessarily ordered, e.g:
randomdv = [2011 01 01 12 01 00
2011 01 01 12 04 00
2011 01 01 12 05 00
2011 01 01 12 30 00
2011 01 01 12 21 00
2011 01 01 12 20 00];
%randomdata a n*1 vector of corresponding data, e.g:
randomdata = [1 4 5 30 21 20]';
%minutedv a m*6 matrix of date vectors in minute increment, not necessarily ordered, e.g:
minutedv = [repmat([2011 01 01 12], 30, 1) (1:30)'];
filloption = 'interpolate'; %or 'fillwithlast'
minutedata = nan(size(minutedv, 1), 1); %output
[ispresent, row] = ismember(minutedv(:, 1:5), randomdv(:, 1:5), 'rows');
minutedata(ispresent) = randomdata(row(ispresent));
%at this point we just copied randomdata in the corresponding row of minutedata
%find the NaN runs:
nantransitions = find(diff([0; isnan(minutedata); 0]));
runstarts = nantransitions(1:2:end);
runends = nantransitions(2:2:end)-1;
for run = 1:numel(runstarts)
runstart = runstarts(run);
runend = runends(run);
switch filloption
case 'fillwithlast'
filldata = minutedata(runstart - 1);
case 'interpolate'
filldata = linspace(minutedata(runstart-1), minutedata(runend+1), runend-runstart+3);
filldata = filldata(2:end-1);
otherwise
error('fill method not defined');
end
minutedata(runstart:runend) = filldata;
end
What an amazing answer, thank you so much this is something I never would have figured out myself!
The code works great when I use your demo data, but when I try it with my real data it only works until the point where you have written “at this point we just copied..”, when I run the loop after that it gives me the error ”Attempted to access minutedata(0); index must be a positive integer or logical” .
This is because the first element in my nantransitions is 1 (which makes runstarts(1)=1 ) and therefore filldata = minutedata(runstart - 1) becomes impossible.
Maybe this is very easy to resolve but it’s a bit difficult for me to figure out how I should tweak the code to avoid this because I don’t fully understand what the whole nantransition part is about. Do you have any suggestions in mind about what to do when the first element of nantransitions =1?
Yes, I should have mentioned that the code assumes that there's no missing data at the beginning (for 'fillwithlast' and 'interpolate') and the end (for 'interpolate' only) of minutedata. Because, if there is, there's nothing to replicate / interpolate between.
You can deal with these special case anywhere within the for loop, in the case statements for example if you want different behaviour:
case 'fillwithlast'
if runstart == 0
%missing data at the beginning
filldata = minutedata(runend+1); %not sure what you want to do. This will fail if everything is NaN since runend+1 is then past the end
else
filldata = minutedata(runstart - 1);
end
case 'interpolate'
if runstart == 0
filldata = minutedata(runend+1); %maybe?
elseif runend == numel(minutedata)
filldata = minutedata(runstart-1); %maybe?
else
filldata = linspace(minutedata(runstart-1), minutedata(runend+1), runend-runstart+3);
filldata = filldata(2:end-1);
end
nantransitions is a temporary variable to help in finding the beginning and end of each run of NaN. Since isnan returns a vector of 0 and 1, (e.g. [0 0 1 1 1 0 1 1 ]), you know that you go from non-nan to nan when the diff is 1 and from nan to non-nan when the diff is -1. The rest of the diff is always 0. Therefore, you've got a transition whenever the diff is non-zero. I just flank it with 0 to make sure that the first transition is always non-nan to nan and the last nan to non-nan (and to make sure that nan at the beginning or end are caught). Hence odd transitions are non-nan to nan and even transitions are nan to non-nan.

Sign in to comment.

More Answers (0)

Products

Asked:

on 17 Feb 2015

Commented:

on 20 Feb 2015

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!