Time series, resample, axes, datetime

Question

Can anyone point me in the right direction, please?

I have energy consumption data covering more than 3 years, sampled every two seconds (43200 samples per day), for different households. How do I set the x axis to 24 hours (00:00 - 23:59) and plot each day's data on one graph?

I have two variables (power and time). Files are saved by date, and each has 43200 values for power and for time. I also have two variables called start_date and end_date. I would like to select any range of dates and get the data for those dates plotted in one graph. Later I will need to resample the time series, color each day's plot, etc. I have attached a figure to explain more or less what I'm trying to do.

I highly appreciate your assistance in this. Thanks

Answer 1

dpb on 24 Feb 2015

Edited: dpb on 24 Feb 2015

Use date numbers. An example for the axes:

>> dn=datenum(2000,1,1,0,0,[0:2:86400-1].');  % a day at 2-sec interval
>> t=dn-dn(1);                                  % make 0-based
>> p=normpdf(t,0.5,.1);                         % some data to plot...
>> plot(t,p)
>> set(gca,'xtick',[0:2:24]/24)                 % set tick marks at 2-hr
>> datetick('x','HH:MM','keeplimits','keepticks') % format the time axis
>> set(gca,'xminortick','on')                   % minor ticks for odd hrs
>>

For the selection, convert the requested time ranges into date numbers, too, and use the logical operations to select between, inclusive or not, etc., ...

NB: When generating a series of date numbers, always use integer multiples of the smallest time granule (seconds, above) rather than dividing out the fraction of an hour or day and using floating point deltas. The latter ends up with floating point comparisons that do NOT match owing to that rounding, whereas the rounding will be consistent inside datenum and friends if you use the integer deltas. As can be seen in the above example, the functions are "smart enough" to know how to wrap times when building the vector.
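To see the rounding point above for yourself, here is a minimal sketch that builds the same day both ways and counts the elements that are not bit-for-bit equal. The variable names are mine, and the number of mismatches may vary by platform, so none is quoted here.

```matlab
% One day at 2-second steps, built two ways.
secs   = (0:2:86400-1).';                 % integer seconds, 43200 values
dnInt  = datenum(2000,1,1,0,0,secs);      % integer-seconds approach (as in the answer)
dnFrac = datenum(2000,1,1) + (0:numel(secs)-1).'*(2/86400);  % fractional-day deltas

% Exact comparison: any nonzero count means "==" lookups can silently fail.
nMismatch = nnz(dnInt ~= dnFrac);
fprintf('%d of %d values differ in the last bits\n', nMismatch, numel(dnInt));

% The two series still agree to well within half a second.
agreeWithinTol = all(abs(dnInt - dnFrac) < 0.5/86400);
```

If exact equality cannot be guaranteed, comparing with a tolerance smaller than half the sampling interval, as in the last line, is a common defensive alternative.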

In your case, you'll be reading a time vector, presumably as string; simply convert it or ignore it and just build the time history as shown.

The example does not use the newer datetime class (introduced in R2014b); I don't have it in the version installed here. If you choose to use it, some of the machinations are a little different but the ideas are the same. One thing you do have to be aware of is that the two use different sets of formatting abbreviations.

dpb on 26 Feb 2015

Edited: dpb on 26 Feb 2015

What, specifically, does "not working" mean? Show what you did and any errors, what you got and what you expected. A small subset of actual data would be helpful no doubt.

By "ignoring" I meant simply that if you do have the data for every 2 seconds and the time column is in some funky text format, you can generate the same date numbers as shown above that you would get from doing the actual conversion. Again, to do any selection, convert to date numbers and use logical addressing on the value(s) or range of values desired.

Again, it's hard to know where you went wrong without seeing what you actually have tried.

ADDENDUM

The example I gave above does, indeed, show an x axis essentially identical to your example plot (other than also showing '00:00' at the two ends which is simple enough to clean up).

tix=get(gca,'xticklabel');  % retrieve the tick labels
tix(1,:)=blanks(5);
tix(end,:)=tix(1,:);        % clear the first/last
set(gca,'xticklabel',tix)   % rewrite w/o first/last values

I'm guessing that if you have something else showing up you may have used the new datetime class...as noted, since I don't have it the above uses the "date number classic", but if you've got the newer one the time abbreviations aren't the same and there are some methods implemented that are supposed to handle the time axes more nearly automagically.

dpb on 27 Feb 2015

OK, now we're "cookin' with gas" as my grandpa used to say when things started going well...(ok, you had to have used lump coal first before you can really appreciate the meaning :) )....

Anyway, ok, to use my technique above of generating the time history by two-second intervals you need the full six-vector date vector so the increment is in the seconds (last) position in that vector.

If you look at the result of your code that passes just the date string, you'll see

>> dn=datenum([datevec(start_date,'yyyy-mm-dd')],[0:2:86400-1].');
>> dn=datenum(start_date,[0:2:86400-1].');
>> whos dn
Name      Size            Bytes  Class     Attributes
dn        1x1                 8  double

>>

there's only a single value created, because datenum is not being given the six separate year, month, day, hour, minute, second arguments; the seconds vector isn't in the sixth (seconds) position, so it isn't expanded into a time series. That's why you only got the one label on the graph.

What I'd do here instead, for computational efficiency, since as you say you've got the full data set for every case, is not recompute the date numbers for every case from the input; you already know each file is at 2-second intervals beginning at offset zero from the beginning of the file date. So the question is, as you say, how to get that initial point?

Simply take your start date and convert it only and add to the previous tn array and you're done.

>> start_date
start_date =
2013-06-02
>> dn0=datenum(start_date,'yyyy-mm-dd')
dn0 =
    735387
>> datestr(dn0,'dd-mmm-yyyy HH:MM:SS.FFF')
ans =
02-Jun-2013 00:00:00.000
>>

So, the "real" date number for the file is

dn=dn0+tn;

and you can do selection for ranges and so on with that time when you get tStart and tEnd values and convert those to datenumbers. But, for plotting, keep the tn vector as is and you save recomputing the same numbers over and over and over...
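Putting those pieces together for one file, a minimal sketch might look like the following. The power values are a made-up placeholder, the 06:00 to 18:00 window is just an example request, and the half-second tolerance is my own defensive assumption rather than something the approach strictly requires.

```matlab
% Reusable time-of-day vector: fractions of a day at 2-second steps.
tn = datenum(2000,1,1,0,0,(0:2:86400-1).') - datenum(2000,1,1);

% Per-file step: convert only the start date and add the offsets.
start_date = '2013-06-02';
dn0 = datenum(start_date,'yyyy-mm-dd');   % integer day number
dn  = dn0 + tn;                           % "real" date numbers for this file

% Placeholder data standing in for the measured power vector.
power = 100 + 50*sin(2*pi*tn);

% Select an example window, 06:00 to 18:00 inclusive, on that day.
d1  = datenum('2013-06-02 06:00:00','yyyy-mm-dd HH:MM:SS');
d2  = datenum('2013-06-02 18:00:00','yyyy-mm-dd HH:MM:SS');
tol = 0.5/86400;                          % half a second, guards the end points
idx = (dn >= d1 - tol) & (dn <= d2 + tol);
selPower = power(idx);
```

For plotting, tn(idx) rather than dn(idx) goes on the x axis, so every day lands on the same 0 to 1 scale.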

Make sense?

dpb on 27 Feb 2015

Edited: dpb on 28 Feb 2015

OK, what I'm saying is just go ahead and compute one time vector at two second intervals first and keep it around...that was the first step I had originally--

>> dn=datenum(2000,1,1,0,0,[0:2:86400-1].');  % a day at 2-sec interval
>> t=dn-dn(1);                                  % make 0-based

(I see looking back I had used t alone instead of tn, same idea)

The day of year and year here is absolutely immaterial since there are always 24 hr/day, etc., it doesn't matter which one you pick. The key point is that by using datenum to generate the time series you will avoid floating point roundoff when you get to the point of comparing requested dates/times that occurs if you were to use the computed fraction of a day in a colon expression.

Alternatively, you can read the date/time from the file and convert every single time, but since you know the fixed times a priori other than the beginning date, that's doing a lot of work over and over that you can do only once instead. That's just an efficiency thingie with me...

Now, when you process each file, read that start date and as above convert it to a date number. For that file the specific datenumber for the day is that integer date plus the fractional days of the t vector.

When you want to look for a value or range of values, compare to those date numbers if the input from the user is a full date string; if it's a range of times within the day then you only need the fractional-day part.

A specific example of what you're wanting to do could help in expanding a little more code beyond the generalities; as is I don't know what steps precisely to demonstrate.

ccs on 2 Mar 2015

Hi dpb, I think I understand where the difference between your example code and what I want is. Firstly, I understand your reason for efficiency. I know your t is my tn (I think this line explains why your plot works >>p=normpdf(t,0.5,.1);). I also now get that the date in dn doesn't matter, as it just labels the x axis with 24 hours. The problem is that I don't see how this code connects to plotting the different "power" values for each corresponding date (for the range of days selected between start_date and end_date) over that t (x) axis. I think my explanation of what I'm trying to do has not been clear all along.

Let me try again: I have measured daily power consumption data, a column vector saved as the variable power, and another vector, time, for every measurement recorded. I have up to 43200 measurements every day, which means one every two seconds (not all data are measured at exactly two-second intervals, and some days are even missing data). The measurements span over 3 years.

My task is to create a time series from which I'm able to select any range of dates and plot their power consumption IN ONE PLOT. Because it is a lot of data, I will also need to resample the data to choose the number of data points I want to plot per day. The plot example I gave shows data for 2 years (some other, different data of course) but it is exactly what I'm trying to do.

I PRAY YOU DON'T GET TIRED OF MY CRAZY QUESTIONS. Please help, I'm an intern and need to present my progress tomorrow.

dpb on 2 Mar 2015

"...task is to create a timeseries that Im able to select any range of dates and plot their power consumption IN ONE PLOT."

OK, my misunderstanding; I thought your intent was to plot the various days on the single plot as multiple lines overlaid to see the typical daily pattern, not as a single time series.

Similar then, except you simply concatenate the times and data from each day into a long vector/array. You don't say how the date is encoded into the file names, but take whatever format that is and convert them to date numbers for the selection logic of which data you need to read. Or, alternatively, read each file and build a lookup table of the date number that corresponds to each file to use to select them. If the date stamp on the files corresponds to the actual date you could get clever and use the OS directory listing command to select them, but the TMW-implemented dir function in Matlab doesn't have all the niceties built into it, unfortunately.

But, a couple of hints...to select a range of dates between a start/end date the idea is to take the two dates from the user and do the selection...

d1=datenum(dStart,'formatStyleForInputDate');
d2=datenum(dEnd,'formatStyleForInputDate');

Presuming you've built the aforementioned time vector, then

idx=iswithin(dn,d1,d2);
wantedData=dataArray(idx,:);

will select those values that are within those ranges. Here, again, iswithin is my helper function

function flg=iswithin(x,lo,hi)
% returns T for values within range of input
% SYNTAX:
%  [log] = iswithin(x,lo,hi)
%      returns T for x between lo and hi values, inclusive
flg= (x>=lo) & (x<=hi);

that returns the logical addressing vector for those elements matching the condition.
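The same range test can pick which daily files to read. The sketch below assumes, purely for illustration, that files are named like 2013-06-02.mat; the actual naming scheme isn't given in the thread, so the file list and the format string are hypothetical and would need adjusting.

```matlab
% Hypothetical file list; in practice this might come from dir('*.mat').
fileNames = {'2013-06-01.mat','2013-06-02.mat','2013-06-03.mat','2013-06-04.mat'};

% Strip the extension and convert each name to an integer date number.
[~, baseNames] = cellfun(@fileparts, fileNames, 'UniformOutput', false);
fileDn = datenum(baseNames, 'yyyy-mm-dd');
fileDn = fileDn(:);                       % force a column, whatever shape came back

% Requested range from the user.
start_date = '2013-06-02';
end_date   = '2013-06-03';
d1 = datenum(start_date, 'yyyy-mm-dd');
d2 = datenum(end_date,   'yyyy-mm-dd');

% Same inclusive test as iswithin, written inline so the sketch is self-contained.
idx = (fileDn >= d1) & (fileDn <= d2);
wantedFiles = fileNames(idx);             % files to load for this request
```

Because whole-day date numbers are integers, the comparison here is exact and needs no tolerance.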

To do the decimation, ask for or otherwise decide on the increment needed and convert that number to the number of two-second intervals. 1-min would obviously be every 30 values. So then the decision is whether you just select the points or averages or max or what...points are easy, simply use the above determined interval as the number in the colon expression and

wantedData=wantedData(1:Interval:end,:);

to select those points by row, all columns.

A neat "trick" in Matlab to do things like average over N terms is, since memory is in column-major storage, you can reshape an array as

wantedDataAvg=reshape(mean(reshape(wantedData,N,[])), ...
         [],size(wantedData,2));

This works by turning the data array into an array whose number of rows is the length over which you want to average (the number of rows in wantedData must be a multiple of N), averaging each column, and then rearranging that row vector back to the original number of columns, where each column now holds the averages over the desired length.
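A small worked check of the block-average idea, on a made-up 5-by-2 array with N = 2. Trimming the leftover row so the row count is a multiple of N is my own addition; how to treat a partial last block is a choice the thread doesn't address.

```matlab
% Toy data: 5 samples (rows) of 2 channels (columns).
wantedData = [1 10; 3 30; 5 50; 7 70; 9 90];
N = 2;                                     % average over every N rows

% Drop trailing rows that do not fill a complete block of N.
nKeep   = floor(size(wantedData,1)/N)*N;
trimmed = wantedData(1:nKeep,:);

% Column-major reshape puts each block of N consecutive rows in its own column.
blockMeans = mean(reshape(trimmed, N, []), 1);   % 1-by-(blocks*columns)

% Fold the row of means back into one column per original channel.
wantedDataAvg = reshape(blockMeans, [], size(trimmed,2));
```

Here the first channel averages to 2 and 6 and the second to 20 and 60, so wantedDataAvg is [2 20; 6 60]. Passing the dimension argument 1 to mean keeps the column-wise behaviour even when N is 1.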

Hopefully those are some clues...unfortunately, we're going to be leaving town tomorrow for a few days so it'll be what can do today until next week sometime for more...maybe somebody else can pick up or if you get stuck on another specific issue, post that as another question.

Good luck; it seems complicated but you'll get the hang of it here...

ccs on 2 Mar 2015

"OK, my misunderstanding; I thought your intent was to plot the various days on the single plot as multiple lines overlaid to see the typical daily pattern, not as a single time series."

Wow...I think I have panicked already, or my English is also a problem. That is exactly what I intend to do. I don't want to concatenate them.

dpb on 2 Mar 2015

Well, in that case we're back to the previous idea of "every day's the same" except for the base time. So the selection is basically only on the data files, and there the simple way would be to process all the dates once, building a date number series that spans the overall time of the available data, and use that to look up the actual data.
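As a rough sketch of that overlay, the loop below draws one line per selected day against the shared 24-hour vector. The power values are synthetic placeholders standing in for whatever is loaded from each day's file, the step of 30 (one point per minute) is an arbitrary choice, and the sketch assumes every day has the full 43200 samples, which the real data may not.

```matlab
% Shared time-of-day axis: fractions of a day at 2-second steps.
t = datenum(2000,1,1,0,0,(0:2:86400-1).') - datenum(2000,1,1);

% Days to overlay, from the requested range.
start_date = '2013-06-02';
end_date   = '2013-06-04';
dayList = datenum(start_date,'yyyy-mm-dd'):datenum(end_date,'yyyy-mm-dd');

step   = 30;                               % keep every 30th sample (1-minute spacing)
colors = lines(numel(dayList));            % one color per day

figure
hold on
for k = 1:numel(dayList)
    % Placeholder: replace with the power vector read from the file for dayList(k).
    power = 1 + 0.5*sin(2*pi*(t - 0.1*k));
    plot(t(1:step:end), power(1:step:end), 'Color', colors(k,:));
end
hold off

% 24-hour axis with ticks every 2 hours, as in the original answer.
set(gca,'xlim',[0 1],'xtick',(0:2:24)/24)
datetick('x','HH:MM','keeplimits','keepticks')
xlabel('Time of day'), ylabel('Power')
legend(cellstr(datestr(dayList(:),'yyyy-mm-dd')))
nLines = numel(findobj(gca,'Type','line'));   % number of day curves drawn
```

Days with gaps or irregular sampling would need their own recorded time vector, converted to a fraction of the day, in place of the shared t.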

We're heading out; I think if you just take the basic pieces we've talked about and start, you'll get there.
