Time series, resample, axes, datetime

Can anyone point me in the right direction, please?
I have energy consumption data for different households covering more than 3 years, sampled every two seconds (43200 samples per day). How do I set the x axis to 24 hours (00:00 - 23:59) and plot each day's data on one graph?
I have two variables, power and time. The files are saved by date, and each has 43200 values of power and of time. I also have two variables called start_date and end_date. I would like to select any range of dates and get those dates' data plotted in one graph. Later I will need to resample the time series, color each day's plot, etc. I have attached a figure to explain more or less what I'm trying to do.
I really appreciate your assistance with this. Thanks.

 Accepted Answer

dpb on 24 Feb 2015
Edited: dpb on 24 Feb 2015
Use date numbers. Here's an example for the axes:
>> dn=datenum(2000,1,1,0,0,[0:2:86400-1].'); % a day at 2-sec interval
>> t=dn-dn(1); % make 0-based
>> p=normpdf(t,0.5,.1); % some data to plot...
>> plot(t,p)
>> set(gca,'xtick',[0:2:24]/24) % set tick marks at 2-hr
>> datetick('x','HH:MM','keeplimits','keepticks') % format the time axis
>> set(gca,'xminortick','on') % minor ticks for odd hrs
>>
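normpdf in the example comes from the Statistics and Machine Learning Toolbox (formerly the Statistics Toolbox). If that toolbox isn't installed, the same Gaussian test curve can be computed in base MATLAB. A minimal sketch, reusing the example's mean of 0.5 and standard deviation of 0.1 (both as fractions of a day); the plot/set/datetick lines above then apply unchanged:

```matlab
% Same one-day, 2-second, 0-based time vector as in the example
dn = datenum(2000,1,1,0,0,(0:2:86400-1).');
t  = dn - dn(1);                      % fraction of a day: 0 ... 86398/86400

% Gaussian test curve equivalent to normpdf(t,0.5,0.1), base MATLAB only
mu = 0.5; sigma = 0.1;
p  = exp(-(t - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));
```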
For the selection, convert the requested time ranges into date numbers too, and use logical operations to select the values between them (inclusive or not, as needed).
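A minimal sketch of that selection, assuming one day of 2-second samples starting on a hypothetical date and a placeholder power vector; the 08:00 to 09:00 window is arbitrary. The lower bound is inclusive and the upper bound exclusive, so adjacent windows don't double-count the boundary sample:

```matlab
% Hypothetical day of 2-second samples (43200 points) with placeholder data
dn    = datenum(2013,6,2,0,0,(0:2:86400-1).');
power = rand(size(dn));                 % stand-in for the measured power

% Requested window, converted to date numbers the same way
tStart = datenum(2013,6,2,8,0,0);
tEnd   = datenum(2013,6,2,9,0,0);

% Logical addressing: inclusive start, exclusive end
idx      = dn >= tStart & dn < tEnd;
selTime  = dn(idx);
selPower = power(idx);
```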
NB: When generating a series of date numbers, always use integer multiples of the smallest time unit (seconds, above) rather than computing the fraction of an hour or day and using floating-point deltas. The latter leads to floating-point comparisons that fail to match because of rounding, whereas the rounding is consistent inside datenum and friends if you use integer deltas. As the example above shows, the functions are "smart enough" to wrap the times correctly (seconds past 59 roll over into minutes and hours) when building the vector.
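The same idea can be pushed one step further: keep the time of day as integer seconds, which doubles represent exactly, do the comparisons on those, and convert to fractional days only for plotting. This is a sketch of an alternative, not the method in the answer above; the variable names are illustrative:

```matlab
% Time of day as exact integer seconds, one day at 2-second spacing
sod = (0:2:86400-1).';                  % 0, 2, ..., 86398

% Comparisons on integers are exact: 12:00:00 is second 43200
k = find(sod == 12*3600);

% Convert to a fraction of a day only when needed, e.g. for datetick
t = sod / 86400;

% Going back from fractional days, round to whole seconds before comparing
secBack = round(t * 86400);
```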
In your case you'll be reading a time vector, presumably as strings; either convert it, or ignore it and just build the time history as shown.
The example does not use the newer timeseries class (presumably the datetime class introduced in R2014b); I don't have it in the version installed here. If you choose to use it, some of the machinations are a little different, but the ideas are the same. One thing you do have to be aware of is that the two use entirely different sets of format abbreviations.

15 Comments

ccs on 25 Feb 2015
Edited: ccs on 25 Feb 2015
Thank you so much. I have not tried it yet, but this is definitely in line with what I'm looking for. I'm going to start working on it.
Thanks for taking the time.
ccs on 26 Feb 2015
Edited: dpb on 26 Feb 2015
It is not working. The time vector is already in date numbers at a 2-second interval. I cannot ignore it, as I need to automate this for any chosen date range. I also want to reduce the number of data points plotted (from 43200 per day to, say, 2000). This line does not show the hours:
>>datetick('x','HH:MM','keeplimits','keepticks')
dpb on 26 Feb 2015
Edited: dpb on 26 Feb 2015
What, specifically, does "not working" mean? Show what you did and any errors, what you got and what you expected. A small subset of actual data would be helpful no doubt.
By "ignoring" I meant simply that if you do have the data for every 2 seconds and the time column is in some funky text format, you can generate, as shown, the same date numbers you would get from doing the actual conversion. Again, to do any selection, convert to date numbers and use logical addressing on the value(s) or range of values desired.
Again, it's hard to know where you went wrong without seeing what you actually have tried.
ADDENDUM
The example I gave above does, indeed, show an x axis essentially identical to your example plot (other than also showing '00:00' at both ends, which is simple enough to clean up):
tix=get(gca,'xticklabel'); % retrieve the tick labels
tix(1,:)=blanks(5);
tix(end,:)=tix(1,:); % clear the first/last
set(gca,'xticklabel',tix) % rewrite w/o first/last values
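In releases from R2014b on, get(gca,'xticklabel') returns a cell array rather than a char matrix, so the blanks(5) assignment above no longer fits. A sketch that works with either form: convert with cellstr (which leaves a cell array of strings unchanged) and blank the ends. The labels here are generated with datestr as a stand-in for what get() returns, so the snippet runs without a figure:

```matlab
% Tick positions every 2 hours as fractions of a day (note the parentheses)
ticks = (0:2:24)/24;

% Stand-in for get(gca,'XTickLabel'): '00:00','02:00',...,'00:00'
tix = datestr(ticks(:), 'HH:MM');       % column input avoids date-vector ambiguity

% cellstr converts a char matrix and passes a cell array of strings through
tix = cellstr(tix);
tix([1 end]) = {''};                    % blank the first and last labels

% With an axes present: set(gca,'XTick',ticks,'XTickLabel',tix)
```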
I'm guessing that if you have something else showing up, maybe you've used the new timeseries class... As noted, since I don't have it, the above uses the "classic" date numbers; with the newer class the time format abbreviations aren't the same, and some methods are implemented that are supposed to handle time axes more nearly automagically.
ccs on 27 Feb 2015
Edited: ccs on 27 Feb 2015
Thanks again for the response. What I mean is, I tried running your lines first, and it doesn't display the x axis as in my example. It only plots a normal distribution curve with the first x-axis mark "00:00", and that's it. My other question: how can I replace the date in the first line with my start_date (to be automated)? Below is how I ran it in my program; it displays similar axes without any data plotted.
orient landscape;
time=DAQ.time;
power=DAQ.MAIN.power;
dn=datenum(start_date,[0:2:86400-1].');
tn=dn-dn(1);
plot(tn,power(:,1));
set(gca,'xtick',0:2:24/24)
datetick('x','HH:MM','keeplimits','keepticks')
set(gca,'xminortick','on')
print -append -dpsc -noui -r600 'my_pdf.ps'
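One more thing worth checking in the code above: in `set(gca,'xtick',0:2:24/24)` the division binds tighter than the colon, so MATLAB evaluates `0:2:(24/24)`, i.e. `0:2:1`, which is the single value 0. That by itself leaves only one tick, labeled "00:00", on the axis. The example puts the range in brackets before dividing:

```matlab
% The colon operator has lower precedence than arithmetic operators
bad  = 0:2:24/24;      % evaluated as 0:2:1, i.e. the single value 0
good = (0:2:24)/24;    % 13 ticks, one every 2 hours, as fractions of a day
```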
What happens if you run my example exactly as written? It certainly works here with R2012b and, as far as I know, should also work in later versions.
What, specifically, was start_date above? And, just for grins, what does
datestr([dn(1):dn(5)].')
return?
And, it would be interesting to know what
xlim
returns.
I ran your example now and it works :). start_date is a variable that stores the entered start date of the time range:
>>start_date='2013-06-02';
In your example,
datestr([dn(1):dn(5)].') returns
01-Jan-2000
and xlim
ans =
0 1
How, then, can I replace that with my start date and my time and power variables? The time and power variables are vectors of 43200 elements each.
Just for more clarity, I already have the same start_date and end_date driving other data plots and analyses, if this helps...
OK, now we're "cookin' with gas," as my grandpa used to say when things started going well... (OK, you had to have used lump coal first to really appreciate the meaning :) )...
Anyway, to use my technique above of generating the time history at two-second intervals, you need the full six-element date vector so the increment goes in the seconds (last) position of that vector.
If you look at what your code produces with just the string, you'll see
>> dn=datenum([datevec(start_date,'yyyy-mm-dd')],[0:2:86400-1].');
>> dn=datenum(start_date,[0:2:86400-1].');
>> whos dn
Name Size Bytes Class Attributes
dn 1x1 8 double
>>
there's only a single value created, because using the function datevec inside the argument list that way doesn't, by MATLAB syntax, return the full six-element vector but only the first (year) value; the others are discarded. That's why you only got the one label on the graph.
What I'd do here instead, for computational efficiency (since, as you say, you've got the full data set in every case), is not recompute the date numbers for every case from the input; you already know each is a 2-second interval beginning at offset zero from the start of the file's date. So the question, as you say, is how to get that initial point.
Simply take your start date and convert it only and add to the previous tn array and you're done.
>> start_date
start_date =
2013-06-02
>> dn0=datenum(start_date,'yyyy-mm-dd')
dn0 =
735387
>> datestr(dn0,'dd-mmm-yyyy HH:MM:SS.FFF')
ans =
02-Jun-2013 00:00:00.000
>>
So, the "real" date number for the file is
dn=dn0+tn;
and you can do range selection and so on with those values once you get tStart and tEnd and convert them to date numbers. But for plotting, keep the tn vector as is, and you save recomputing the same numbers over and over and over...
Make sense?
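Putting those steps together as a sketch: the 0-based offsets are computed once, only the file's date string is converted per file, and the two are added. start_date uses the example value from this thread; the plotting line is left as a comment because it needs the real power data:

```matlab
% Computed once: 0-based offsets (fraction of a day) at 2-second spacing
t = datenum(2000,1,1,0,0,(0:2:86400-1).') - datenum(2000,1,1);

% Per file: convert only the date string
start_date = '2013-06-02';
dn0 = datenum(start_date, 'yyyy-mm-dd');    % 735387, as shown above

% Absolute date numbers for that file's samples, used for selection
dn = dn0 + t;

% For a 00:00-24:00 axis, plot against t itself, e.g.
% plot(t, power(:,1)); set(gca,'XTick',(0:2:24)/24); datetick('x','HH:MM','keeplimits','keepticks')
```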
Oh no, I still don't get it! I'm not sure I understand "Simply take your start date and convert it only and add to the previous tn array and you're done." What happens to the first line? If I discard it, where will tn come from?
>>dn=datenum(start_date,[0:2:86400-1].');
>>dn0=datenum(start_date,'yyyy-mm-dd')
>>datestr(dn0,'dd-mmm-yyyy HH:MM:SS.FFF')
>>dn=dn0+tn;
>>plot(tn,power(:,1));
>>set(gca,'xtick',0:2:24/24)
>>datetick('x','HH:MM','keeplimits','keepticks')
>>set(gca,'xminortick','on')
dpb on 27 Feb 2015
Edited: dpb on 28 Feb 2015
OK, what I'm saying is: go ahead and compute one time vector at two-second intervals first and keep it around; that was the first step I had originally:
>> dn=datenum(2000,1,1,0,0,[0:2:86400-1].'); % a day at 2-sec interval
>> t=dn-dn(1); % make 0-based
(I see looking back I had used t alone instead of tn, same idea)
The day and year here are absolutely immaterial: since there are always 24 hr/day, it doesn't matter which one you pick. The key point is that by using datenum to generate the time series, you avoid the floating-point roundoff that would bite when comparing requested dates/times if you used the computed fraction of a day in a colon expression.
Alternatively, you can read the date/time from the file and convert every single time, but since the times are known and fixed a priori (other than the starting date), that's doing a lot of work over and over that could be done only once. That's just an efficiency thingie with me...
Now, when you process each file, read its start date and convert it to a date number as above. For that file, the specific date number of each sample is that integer date plus the fractional days of the t vector.
When you want to look for a value or range of values, compare against those if the user's input is a full date string; if it's a range of times within the day, you only need the fractional-day part.
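For the second case (a range of times within the day, applied to every day), a sketch assuming dn holds absolute date numbers spanning several days. The fractional part is rounded back to whole seconds of the day before comparing, because subtracting the large integer day leaves some roundoff in the fraction:

```matlab
% Hypothetical two days of 2-second samples
dn = [datenum(2013,6,2,0,0,(0:2:86400-1).'); ...
      datenum(2013,6,3,0,0,(0:2:86400-1).')];

% Time of day as whole seconds; round() removes the roundoff left after
% subtracting the integer day from a large date number
sod = round((dn - floor(dn)) * 86400);

% Every sample from 08:00:00 (inclusive) to 09:00:00 (exclusive), on every day
idx = sod >= 8*3600 & sod < 9*3600;
```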
A specific example of what you want to do would help in expanding on this with more code beyond the generalities; as it is, I don't know precisely which steps to demonstrate.
Hi dpb, I think I understand where the difference lies between your example code and what I want. Firstly, I understand your reason for efficiency. I know your t is my tn (I think this line explains why your plot works: >>p=normpdf(t,0.5,.1);). I also now get that the date in dn doesn't matter, as the x axis is labeled in 24 hours. The problem is that I don't see how this code can ever connect to plotting different "power" values for each corresponding date (for the range of days selected between start_date and end_date) over that t (x) axis. I think my explanation of what I'm trying to do has not been clear all along.
Let me try again: I have measured daily power consumption data, a column vector saved as the variable power, and another vector, time, with the time of every recorded measurement. I have up to 43200 measurements every day, i.e. one every two seconds (not all data are measured exactly every two seconds, and some days are even missing data), over 3 years of measurements.
My task is to create a timeseries that I'm able to select any range of dates and plot their power consumption IN ONE PLOT. Because it is a lot of data, I will also need to resample it to choose the number of data points I want to plot per day. The plot example I gave shows 2 years of data (some other data, of course), but it is exactly the kind of thing I'm trying to do.
I PRAY YOU DON'T GET TIRED OF MY CRAZY QUESTIONS. Please help; I'm an intern and need to present my progress tomorrow.
"...task is to create a timeseries that I'm able to select any range of dates and plot their power consumption IN ONE PLOT."
OK, my misunderstanding; I thought your intent was to plot the various days on the single plot as multiple lines overlaid to see the typical daily pattern, not as a single time series.
Similar, then, except that you simply concatenate the times and data from each day into one long vector/array. You don't say how the date is encoded in the file names, but take whatever format that is and convert it to date numbers for the selection logic that decides which data you need to read. Or, alternatively, read each file once and build a database of the date number corresponding to each file's date to use for selecting them. If the date stamps on the files correspond to the actual dates, you could get clever and use the OS dir command to select them, but the TMW-implemented dir function in MATLAB doesn't have all the niceties built in, unfortunately.
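As a sketch of the file-selection step, assuming (hypothetically) that each day's file is named after its date as yyyy-mm-dd.mat, the list of files for a requested range can be generated from date numbers. The naming pattern and extension are placeholders to adapt to the real files:

```matlab
% User-supplied range, in the format used for start_date in this thread
start_date = '2013-06-02';
end_date   = '2013-06-05';

% One date number per whole day in the range, inclusive
days = datenum(start_date,'yyyy-mm-dd') : datenum(end_date,'yyyy-mm-dd');

% Hypothetical naming convention yyyy-mm-dd.mat (column input to datestr
% avoids it being mistaken for a date vector)
names = strcat(cellstr(datestr(days(:), 'yyyy-mm-dd')), '.mat');

% Keep only files that actually exist (some days may be missing)
have  = cellfun(@(f) exist(f, 'file') == 2, names);
files = names(have);
```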
But here are a couple of hints. To select a range of dates between a start and an end date, take the two dates from the user and do the selection:
d1=datenum(dStart,'formatStyleForInputDate');
d2=datenum(dEnd,'formatStyleForInputDate');
Presuming you've built the aforementioned time vector, then
idx=iswithin(dn,d1,d2);
wantedData=dataArray(idx,:);
will select those values that are within that range. Here, again, iswithin is my helper function:
function flg=iswithin(x,lo,hi)
% returns T for values within range of input
% SYNTAX:
% [log] = iswithin(x,lo,hi)
% returns T for x between lo and hi values, inclusive
flg= (x>=lo) & (x<=hi);
It returns the logical addressing vector for the elements matching the condition.
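A usage sketch, with iswithin written as an anonymous function so the snippet is self-contained (same test as the helper above). One thing to watch: a date-only string converts to midnight at the start of that day, so an inclusive upper bound of datenum(dEnd) picks up only the first sample of the end day. Adding one day and making the upper bound exclusive selects the whole end day:

```matlab
% Same test as the iswithin helper, as an anonymous function
iswithin = @(x,lo,hi) (x >= lo) & (x <= hi);

% Hypothetical data: three consecutive days at 2-second spacing
secs = (0:2:86400-1).';
dn   = [datenum(2013,6,2,0,0,secs); datenum(2013,6,3,0,0,secs); datenum(2013,6,4,0,0,secs)];

d1 = datenum('2013-06-02','yyyy-mm-dd');
d2 = datenum('2013-06-03','yyyy-mm-dd');   % midnight at the START of 3 June

idxInclusive = iswithin(dn, d1, d2);       % all of 2 June plus 3 June 00:00:00 only
idxWholeDays = dn >= d1 & dn < d2 + 1;     % all of 2 June and all of 3 June
```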
To do the decimation, ask for (or otherwise decide on) the increment needed and convert it to a number of two-second intervals; 1 min would obviously be every 30 values. Then the decision is whether you just select points, or take averages, or the max, or whatever... Points are easy: simply use the interval determined above as the step in the colon expression:
wantedData=wantedData(1:Interval:end,:);
to select those points by row, all columns.
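For the roughly 2000 points per day mentioned earlier, the step can be computed rather than hard-coded. A sketch, assuming wantedData has one row per 2-second sample (placeholder data here):

```matlab
% Placeholder: one day of 2-second samples, two data columns
wantedData = rand(43200, 2);

% Target about 2000 points per day -> step of ceil(43200/2000) = 22 samples (44 s)
targetPoints = 2000;
Interval = ceil(size(wantedData,1) / targetPoints);

% Pick every Interval-th row, all columns
decimated = wantedData(1:Interval:end, :);
```

Using ceil keeps the count at or below the target (1964 points here); a fixed Interval of 30 gives the 1-minute spacing (1440 points per day).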
A neat "trick" in MATLAB for things like averaging over N terms: since arrays are stored in column-major order, you can reshape an array as
wantedDataAvg=reshape(mean(reshape(wantedData,N,[]),1), ...
                      [],size(wantedData,2));
This works by reshaping the data array into one with N rows (the length over which you want to average), averaging down each column, and then rearranging that row vector back into the original number of columns, where each element is now the average over the desired length. The number of rows has to be an exact multiple of N for the reshape to work.
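A small self-contained check of the reshape averaging, with the rows first trimmed to a multiple of N (a step added here, not in the answer) since reshape requires it. The numbers are made up so the result can be verified by hand:

```matlab
% Toy data: 7 samples, 2 columns; average over N = 2 consecutive samples
wantedData = [1 10; 2 20; 3 30; 4 40; 5 50; 6 60; 7 70];
N = 2;

% Drop trailing rows that don't fill a complete block of N
nRows   = N * floor(size(wantedData,1) / N);
trimmed = wantedData(1:nRows, :);

% Column-major reshape: each column of the N-by-[] array is one block of N rows
% from a single original column; mean(...,1) averages each block, and the
% final reshape restores one column per original column
wantedDataAvg = reshape(mean(reshape(trimmed, N, []), 1), [], size(trimmed,2));
% wantedDataAvg is [1.5 15; 3.5 35; 5.5 55]
```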
Hopefully those are some clues... Unfortunately, we're leaving town tomorrow for a few days, so it'll be whatever I can do today until sometime next week. Maybe somebody else can pick this up, or if you get stuck on another specific issue, post it as another question.
Good luck; it seems complicated but you'll get the hang of it here...
"OK, my misunderstanding; I thought your intent was to plot the various days on the single plot as multiple lines overlaid to see the typical daily pattern, not as a single time series."
Wow... I think I've panicked already, or English is also a problem. That is exactly what I intend to do. I don't want to concatenate them.
Well, in that case we're back to the earlier "every day's the same" situation, except for the base time. So the selection is basically only on the data files, and the simple way there would be to process all the dates once, building a date-number time series that spans the overall time of the available data, and use it to look up the actual data.
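For the overlaid-days plot, one approach (a sketch; the variable layout and the NaN filling for missing samples are assumptions, motivated by the note that some readings are missing) is to place each day's readings into a 43200-row matrix by their second of the day, one column per day. plot then draws each column as its own line in its own color from the axes color order, and NaN gaps are simply left blank:

```matlab
% Hypothetical input for two days: absolute date numbers and power readings,
% with some samples missing on the second day
tDay1 = datenum(2013,6,2,0,0,(0:2:86400-1).');
tDay2 = datenum(2013,6,3,0,0,(0:2:86400-1).');
tDay2(100:200) = [];
time  = {tDay1, tDay2};
power = {ones(size(tDay1)), 2*ones(size(tDay2))};

nSlots = 43200;                              % 2-second slots per day
P = NaN(nSlots, numel(time));                % one column per day; NaN = no data
for k = 1:numel(time)
    % Second of the day, rounded to remove date-number roundoff
    sod  = round((time{k} - floor(time{k})) * 86400);
    % 2-second slot containing that second, clamped in case of 24:00:00
    slot = min(floor(sod / 2) + 1, nSlots);
    P(slot, k) = power{k};                   % a later reading in the same slot wins
end

t = (0:nSlots-1).' * 2 / 86400;              % common 0-based time-of-day axis
```

Then plot(t, P) followed by the set(gca,'xtick',[0:2:24]/24) and datetick lines from the answer gives one line per day on a 00:00 to 24:00 axis; the decimation or averaging shown earlier can be applied to P before plotting.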
We're heading out; I think if you just take the basic pieces we've talked about and start, you'll get there...



Asked: ccs on 24 Feb 2015
Commented: dpb on 2 Mar 2015
