Need help writing code to attach date (1-365) from 30 years of data to the actual data needed to be processed. Trying to group into months.

4 views (last 30 days)
Hi! So I have a 3 column matrix that is around 800,000 rows in length. The data is attached to the numerical day of the year 1-365 (over a period of around 30 years worth of data, hence the rows). I am trying to write a code that will allow me to group these dates (1-365) to the data that is attached. There is not a specific occurance of dates as some is 15 minute data and some is hourly. It would be fantastic if someone would be able to help me. It would need to group the data by 1-365 and then pull dates January for column 1 where the number is 1-31 ,February 32-59, ect. The end goal is to be able to take monthly averages of this data set that has been provided to me. Please let me know if you need more information.
  1 Comment
Lindsey Nixon
Lindsey Nixon on 7 Jul 2020
I am hoping there is a way to attach a "time" by use of the first column to then group these into monthly values based on what number is located in the first column.

Sign in to comment.

Accepted Answer

Sindar
Sindar on 7 Jul 2020
Edited: Sindar on 8 Jul 2020
edit: final code for this case:
"What I am working with is a 842,382x3 matrix where the first column is a repetition of numbers 1-365 (year 1), 1-365 (year 2), counting up through the years every time 1-365 and then starting again. The columns 2 and 3 are holding data that is directly related to the day of the year.
I am looking to find a way to put the occurance of 1-365 the first time to be (1992) and then the second time it runs through 1-365 be (1993)."
data = my_matrix;
rows = size(my_matrix,1);
% set the date of the first entry
start_year = 1992;
start_month = 1;
start_day = 1;
% find rows where a year ends (where the day # is larger than the next row)
% also, add the last row
idx = [find(diff(data(:,1))<0) ; rows];
% compute the year for each row
% basically, if the first year ends on row-9, then make 9 copies of year=1
% if the next year ends on row-12, add [12-9]=3 copies of year=2, etc.
y = repelem((1:length(idx))',diff([0;idx]));
% visual check: print the day column and the determined year side-by-side
tmp_check = [data(:,1) y]
% create a date-time array from the year and day arrays
t = datetime(start_year-1+y,start_month,start_day-1+data(:,1)');
% extract the month from the date-time
m = month(t);
% create a timetable from your data and the datetimes
TT = array2timetable(data(:,2:3),'RowTimes',t);
% group data by month and take the average
G = groupsummary(TT,'Time','month','mean');
---------------------------------------------------------------------------------
What form does the time stamp data take? Is it in an 800,000 x 1 array? Are the values just the integers 1-365, or is there extra info?
Let's say each row is a different day, with no gaps. You'd use this:
% create some example data
rows = 8e5;
data = rand(rows,3);
% set the date of the first entry
start_year = 1970;
start_month = 1;
start_day = 1;
% create a date-time array counting up from the first day
t = datetime(start_year,start_month,start_day-1+(1:rows));
% create a timetable from your data and the datetimes
TT = array2timetable(data,'RowTimes',t);
% group data by month and take the average by month-year (e.g., Jan 2001, Feb 2001, ..., Jan 2002)
G = groupsummary(TT,'Time','month','mean');
% or, take the average by month name (e.g., Jan, Feb, Mar, ... , Dec)
G = groupsummary(TT,'Time','monthname','mean');
If parts of this aren't quite what you need, ask here and/or check the documentation for datetime (esp. if you have an array of date data), timetables, and groupsummary
  15 Comments
Sindar
Sindar on 8 Jul 2020
Ah, silly error. I didn't think it would matter whether y was a row or column vector. Change:
y = repelem((1:length(idx)),diff([0;idx]));
% visual check: print the day column and the determined year side-by-side
tmp_check = [data(:,1) y']
to
y = repelem((1:length(idx))',diff([0;idx]));
% visual check: print the day column and the determined year side-by-side
tmp_check = [data(:,1) y]
(i.e. move the apostrophe from y' to (1:length(idx))' )
Lindsey Nixon
Lindsey Nixon on 8 Jul 2020
Thank you so very much. It works and is running perfectly. I really appreciate your help with this! I will be able to use it for all of my files with simply minor adjustments to the start year.

Sign in to comment.

More Answers (0)

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!