Removing a Date (Leap Year) from being Averaged

4 views (last 30 days)
Hi,.
I have a code. The data takes place over 10 years. And I need the average for each day of the year: 1-365. I think everything is fine with my code.
My biggest problem are the leap years of 2012 and 2016. I don't even want to know their average. I just want them removed.
t= readtable('SSArcGIS.xls'); % read as table
t.S1 = single(strcmp(t.ComplaintLocation,'S1'));
t.S2 = single(strcmp(t.ComplaintLocation,'S2'));
t.S3 = single(strcmp(t.ComplaintLocation,'S3'));
t.S4 = single(strcmp(t.ComplaintLocation,'S4'));
t.S5 = single(strcmp(t.ComplaintLocation,'S5'));
t.S6 = single(strcmp(t.ComplaintLocation,'S6'));
t.S7 = single(strcmp(t.ComplaintLocation,'S7'));
t.S8 = single(strcmp(t.ComplaintLocation,'S8'));
t.S9 = single(strcmp(t.ComplaintLocation,'S9'));
t.S10 = single(strcmp(t.ComplaintLocation,'S10'));
t.S11 = single(strcmp(t.ComplaintLocation,'S11'));
t.S12 = single(strcmp(t.ComplaintLocation,'S12'));
t.S13 = single(strcmp(t.ComplaintLocation,'S13'));
t.S14 = single(strcmp(t.ComplaintLocation,'S14'));
% convert to date time
t.Date = datetime(t.Date, 'Format', 'MM/dd/yyyy');
% get unique dates
T1=datetime('01/01/2010');
T2=datetime('12/31/2019');
T=T1:T2;
unique_dates = T;
for ii =1: length(unique_dates)
tmp = t(ismember(t.Date, unique_dates(ii)),:);
% get unique dates
data(ii).Date = unique_dates(ii);
data(ii).S1 = sum(tmp.S1);
data(ii).S2 = sum(tmp.S2);
data(ii).S3 = sum(tmp.S3);
data(ii).S4 = sum(tmp.S4);
data(ii).S5 = sum(tmp.S5);
data(ii).S6 = sum(tmp.S6);
data(ii).S7 = sum(tmp.S7);
data(ii).S8 = sum(tmp.S8);
data(ii).S9 = sum(tmp.S9);
data(ii).S10 = sum(tmp.S10);
data(ii).S11 = sum(tmp.S11);
data(ii).S12 = sum(tmp.S12);
data(ii).S13 = sum(tmp.S13);
data(ii).S14 = sum(tmp.S14);
end
% convert structure to table
data = struct2table(data);
temp = data;
temp.day = day(temp.Date,'dayofyear');
temp.Date = [];
% Now use varfun with @mean as your function and Day as your GroupingVariable
dailyAverage = varfun(@sum,temp,'GroupingVariables','day');
temp = data;
temp.Year = year(temp.Date);
temp.Date = [];
Any help will be appreciated.
Thanks

Accepted Answer

Peter Perkins
Peter Perkins on 14 Apr 2020
Not 100% sure what you are trying to end up with, but my guess is the average number of each type of complaint for each day of the year: 365 of them. You are on the right track using 'dayofyear', but there are easier ways to get to that point. First ,read the data into a timetable.
>> tt1 = readtimetable('SSArcGIS.xls');
>> tt1.ComplaintLocation = categorical(tt1.ComplaintLocation);
>> head(tt1)
ans =
8×1 timetable
Date ComplaintLocation
___________ _________________
02-Jan-2010 S13
02-Jan-2010 S11
02-Jan-2010 S5
04-Jan-2010 S5
05-Jan-2010 S13
06-Jan-2010 S12
06-Jan-2010 S5
06-Jan-2010 S3
You have one row per complaint, across all types in one variable. You want one variable for each type. unstack is good at doing that. In this case, you want counts, so you'll need to add a dummy variable to "count up".
>> tt1.Dummy = ones(height(tt1),1);
>> tt2 = unstack(tt1,'Dummy','ComplaintLocation');
>> head(tt2)
ans =
8×14 timetable
Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9
___________ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
02-Jan-2010 NaN NaN 1 NaN 1 NaN NaN NaN NaN 1 NaN NaN NaN NaN
04-Jan-2010 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN
05-Jan-2010 NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
06-Jan-2010 NaN NaN NaN 1 NaN NaN NaN 1 NaN 1 1 NaN NaN NaN
07-Jan-2010 1 NaN 1 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
08-Jan-2010 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN
09-Jan-2010 NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13-Jan-2010 NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN 1 NaN
Unstack uses @sum by default to "aggregate", in other words, each date might have three S1 complaints, so unstack would compute sum([1;1;1]), and fill the corresponding element of the output with 3. For empty cases, sum returns NaN. It would be easy to use the AggregationFunction parameter to fill those with 0, but it's even easier to replace the NaNs afterwards.
>> tt2 = fillmissing(tt2,'Constant',0);
>> head(tt2)
ans =
8×14 timetable
Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9
___________ __ ___ ___ ___ ___ ___ __ __ __ __ __ __ __ __
02-Jan-2010 0 0 1 0 1 0 0 0 0 1 0 0 0 0
04-Jan-2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0
05-Jan-2010 0 0 0 0 1 0 0 0 0 0 0 0 0 0
06-Jan-2010 0 0 0 1 0 0 0 1 0 1 1 0 0 0
07-Jan-2010 1 0 1 1 0 0 0 0 0 0 0 0 0 0
08-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 1 0
09-Jan-2010 0 0 0 1 0 0 0 0 0 0 0 0 0 0
13-Jan-2010 0 0 0 0 0 1 0 0 0 0 0 0 1 0
The data skips quite a few days:
>> caldiff(tt1.Date([1 end]),'days')
ans =
calendarDuration
3651d
>> height(tt2)
ans =
3122
One wonders if maybe the complaint center is closed on weekends?
>> unique(day(tt2.Date,'dow'))
ans =
1
2
3
4
5
6
7
Nope, apparently not, so all those missing days must have had no complaints. To get the true mean per day, you'll need to add them in.
>> t = datetime(2010,1,1):caldays(1):datetime(2019,12,31);
>> tt3 = retime(tt2,t,'FillWithConstant','Constant',0);
>> head(tt3)
ans =
8×14 timetable
Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9
___________ __ ___ ___ ___ ___ ___ __ __ __ __ __ __ __ __
01-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02-Jan-2010 0 0 1 0 1 0 0 0 0 1 0 0 0 0
03-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0
04-Jan-2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0
05-Jan-2010 0 0 0 0 1 0 0 0 0 0 0 0 0 0
06-Jan-2010 0 0 0 1 0 0 0 1 0 1 1 0 0 0
07-Jan-2010 1 0 1 1 0 0 0 0 0 0 0 0 0 0
08-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 1 0
>> height(tt3)
ans =
3652
Now we're in business: add day of year as a grouping variable, and compute grouped means. Once you have day of year, the actual dates are irrelevant, so convert to a table.
>> tt3.DoY = day(tt3.Date,'dayofyear');
>> tt4 = timetable2table(tt3,'ConvertRowTimes',false);
>> tt4 = varfun(@mean,tt4,'GroupingVariable','DoY');
>> head(tt4)
ans =
8×16 table
DoY GroupCount mean_S1 mean_S10 mean_S11 mean_S12 mean_S13 mean_S14 mean_S2 mean_S3 mean_S4 mean_S5 mean_S6 mean_S7 mean_S8 mean_S9
___ __________ _______ ________ ________ ________ ________ ________ _______ _______ _______ _______ _______ _______ _______ _______
1 10 0.1 0.1 0 0.4 0.1 0 0.1 0.1 0.1 0.1 0 0.1 0.1 0
2 10 0 0.2 0.4 0.5 0.6 0.2 0 0.4 0.2 0.5 0.1 0.1 0.1 0
3 10 0.6 0.7 0.1 0.7 0.4 0.2 0.3 0.4 0.2 1 0.1 0 0 0
4 10 0.1 0.2 0.2 0.6 0.4 0.2 0.2 0.3 0.2 0.9 0 0.4 0.1 0.1
5 10 0.1 0.2 0.1 0.3 0.3 0.3 0 0.3 0.2 0.3 0.4 0.1 0.4 0.1
6 10 0 0.1 0.4 0.9 0.5 0.2 0.2 0.6 0.1 0.6 0.6 0.1 0.1 0.3
7 10 0.1 0.1 0.4 0.8 0.3 0.3 0.1 0 0 0 0 0 0.2 0
8 10 0.1 0.2 0.2 0.6 0.3 0.1 0.3 0.3 0.1 0.6 0.2 0.1 0.4 0
You could also do that with groupsummary. Leap days? Just remove them.
>> tt3(tt3.DoY==366,:)
ans =
2×15 timetable
Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9 DoY
___________ __ ___ ___ ___ ___ ___ __ __ __ __ __ __ __ __ ___
31-Dec-2012 0 0 0 0 0 0 0 0 0 4 0 0 1 0 366
31-Dec-2016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 366
>> tt5 = tt4(tt4.DoY~=366,:);
There were a bunch of steps there. Several of them could be combined, but simplicity.
  6 Comments
Adam Danz
Adam Danz on 29 Apr 2020
You can read in the data using readtable() and then convert the table to a timetable using table2timetable().

Sign in to comment.

More Answers (0)

Categories

Find more on Timetables in Help Center and File Exchange

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!