Break array or timetable into smaller versions and find max value

3 views (last 30 days)
I have a 10000x2 array which can be converted into a timetable. What I want to do is break the timetable into each year and find the maximum value for each year in column 1 and repeat in column 2.
data = randi(100, 10000, 2); %create example data
%Turn it into a timetable
data_timetable = array2timetable(data, 'StartTime', datetime(1990, 10, 1), 'TimeStep', caldays(1));
%Now find max value for each year in each column
I thought of doing something like this
ystart = 1990;
max_vals = [];
for i = 1 :28 %1 to 28 because it goes to 2018
for j = 1:length(data)
if year(data_timetable.Time(j)) == ystart && data(j,1) > max_vals
max_vals = data(j,1);
elseif year(data_timetable.Time(j)) == (ystart+i) && data(j,1) > max_vals
max_vals = [max_vals, data(j,1)];
end
end
end
but I know the logic behind this code is wrong and it also shows the following error
Any suggestions?
EDIT:
data = randi(100, 10000, 2);
data_timetable = array2timetable (data, 'StartTime', datetime(1990, 10,1), 'TimeStep',caldays(1));
y = unique(year(data_timetable.Time));
c = cell(length(y),1);
for i = 1:length(c)
c{i} = data_timetable(year(data_timetable.Time) == y(i),:);
end
I managed to separate the data into a cell containing multiple timetables, but how do I get the maximum values in each timetable and the date when they occur?

Accepted Answer

Voss
Voss on 3 Jan 2023
Edited: Voss on 3 Jan 2023
Here's one way. This finds the first maximum value in each year (for each column of data), and stores the value and the date it occurred.
data = randi(100, 10000, 2);
data_timetable = array2timetable (data, 'StartTime', datetime(1990, 10,1), 'TimeStep',caldays(1));
yy = year(data_timetable.Time); % store the years to avoid having to call year() inside the loop
y = unique(yy);
Ny = numel(y);
c = cell(Ny,1);
Ncols = size(data_timetable,2);
max_vals = zeros(Ny,Ncols); % initialize max_vals
max_dates = NaT(Ny,Ncols); % initialize the dates when max_vals occur
for i = 1:Ny
c{i} = data_timetable(yy == y(i),:);
[max_vals(i,:),idx] = max(c{i}{:,:},[],1); % max() of contents of c{i}, along 1st dimension, outputting index as well as max value
max_dates(i,:) = c{i}.Time(idx); % date of maximum value is c{i}.Time(idx)
end
max_vals
max_vals = 29×2
100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
max_dates
max_dates = 29×2 datetime array
21-Nov-1990 15-Oct-1990 20-Jan-1991 08-Jan-1991 01-Jan-1992 29-Jan-1992 21-Mar-1993 20-May-1993 18-Jan-1994 31-Mar-1994 20-Mar-1995 22-Mar-1995 22-Sep-1996 03-Jul-1996 18-May-1997 11-Jan-1997 01-Feb-1998 24-Jan-1998 10-Apr-1999 01-Apr-1999 03-May-2000 14-Apr-2000 11-Jan-2001 22-Aug-2001 22-May-2002 01-Jun-2002 12-Jun-2003 01-Feb-2003 17-Jan-2004 21-Jan-2004 28-Jan-2005 03-Nov-2005 21-Jan-2006 13-Mar-2006 08-Feb-2007 10-Jan-2007 03-Aug-2008 30-Aug-2008 20-Mar-2009 03-Mar-2009 16-May-2010 14-Apr-2010 09-May-2011 25-Jan-2011 14-Feb-2012 18-Apr-2012 25-Mar-2013 19-Feb-2013 09-Apr-2014 19-Jan-2014 16-Jan-2015 30-Jan-2015 02-Mar-2016 23-Mar-2016 04-Jan-2017 23-May-2017 05-Feb-2018 12-Jan-2018
% plot the data and maxima:
plot(data_timetable.Time,data_timetable{:,:})
hold on
plot(max_dates(:,1),max_vals(:,1),'mo',max_dates(:,2),max_vals(:,2),'ro')
% zoom in on a portion, so we can see the max values:
xlim(data_timetable.Time([401 1000]))
ylim([90 105])
  9 Comments
Voss
Voss on 12 Jan 2023
You're welcome!
"Do you have to initialise max_vals with NaNs instead of zeros because there is no way to check if c{i} is empty inside the loop otherwise?"
No, you can check if c{i} is empty regardless of how max_vals is initialized.
"Because if you use zeros and an if statement to check for zeros and remove them, this would possibly cause problems within the data as some values might be 0."
That's it exactly!

Sign in to comment.

More Answers (1)

Eric Sofen
Eric Sofen on 4 Jan 2023
Normally, groupsummary or varfun should be the starting point for grouped calculations like this, but your particuar problem is a bit tricky because you also want the datetime when the max occurred. So you need rowfun, similar to the last example in the "Grouped Calculations" example.
data = randi(100, 10000, 2); %create example data
%Turn it into a timetable
data_timetable = array2timetable(data, 'StartTime', datetime(1990, 10, 1), 'TimeStep', caldays(1));
data_timetable.Year = data_timetable.Time.Year; % Need Year as an explicit grouping variable for rowfun
% We want Time as a first-class variable, so convert to table.
data_table = timetable2table(data_timetable);
t1 = rowfun(@findMax, data_table, "GroupingVariable","Year", "InputVariables",["Time","data1"], "OutputVariableNames",["Max1","Max1Time"]);
t2 = rowfun(@findMax, data_table, "GroupingVariable","Year", "InputVariables",["Time","data2"], "OutputVariableNames",["Max2","Max2Time"])
t2 = 29×4 table
Year GroupCount Max2 Max2Time ____ __________ ____ ___________ 1990 92 98 22-Nov-1990 1991 365 100 05-Jul-1991 1992 366 100 05-Oct-1992 1993 365 100 03-Aug-1993 1994 365 100 05-Jan-1994 1995 365 100 20-Apr-1995 1996 366 100 25-Jan-1996 1997 365 100 24-Apr-1997 1998 365 100 11-Apr-1998 1999 365 100 28-Apr-1999 2000 366 100 25-Mar-2000 2001 365 100 22-Aug-2001 2002 365 100 15-May-2002 2003 365 100 02-Sep-2003 2004 366 100 18-Feb-2004 2005 365 99 09-May-2005
t1.GroupCount = [];
t2.GroupCount = [];
tfinal = outerjoin(t1,t2,"Keys","Year","MergeKeys",true)
tfinal = 29×5 table
Year Max1 Max1Time Max2 Max2Time ____ ____ ___________ ____ ___________ 1990 100 12-Nov-1990 98 22-Nov-1990 1991 100 25-Jan-1991 100 05-Jul-1991 1992 100 27-Jun-1992 100 05-Oct-1992 1993 100 19-Jan-1993 100 03-Aug-1993 1994 100 06-Mar-1994 100 05-Jan-1994 1995 100 11-Jan-1995 100 20-Apr-1995 1996 100 04-Feb-1996 100 25-Jan-1996 1997 100 11-Jan-1997 100 24-Apr-1997 1998 100 23-Aug-1998 100 11-Apr-1998 1999 100 18-Jan-1999 100 28-Apr-1999 2000 100 11-Jan-2000 100 25-Mar-2000 2001 100 21-Mar-2001 100 22-Aug-2001 2002 100 10-Feb-2002 100 15-May-2002 2003 100 07-Jan-2003 100 02-Sep-2003 2004 100 11-May-2004 100 18-Feb-2004 2005 100 24-May-2005 99 09-May-2005
plot(data_timetable.Time,data_timetable{:,1:2})
hold on
plot(tfinal.Max1Time,tfinal.Max1,'go',tfinal.Max2Time,tfinal.Max2,'ro')
% zoom in on a portion, so we can see the max values:
xlim(data_timetable.Time([401 1000]))
ylim([90 105])
function [maxVal,maxTime] = findMax(times,vals)
% Return time at which maximum element of vals occurred
[maxVal,maxIndex] = max(vals);
if ~isnan(maxVal)
maxTime = times(maxIndex);
else
maxTime = NaT;
end
end
  1 Comment
Andrei
Andrei on 5 Jan 2023
I guess this solution works too, however I am not familiar at all with groupsummary or varfun so this is unknown territory for me. Also, all of this code has to be inserted in a function and the function to output the maximum values and the dates when they occur.

Sign in to comment.

Products


Release

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!