Specifying required number of data points in a timetable to calculate the mean using retime

5 views (last 30 days)
Joe Wheeler on 15 Feb 2022
Answered: Seth Furman on 15 Feb 2022
Hi, i was hoping someone might be able to help me solve a problem. I am looking to find a way to retime a timetable of data into daily average values from hourly readings by ensuring that days that have less than 50% of readings (ie not NaN values) are outputted as having an average of NaN. This is to prevent days with insufficient data contributing to the calculation of an index that I am am looking to obtain. I hope that makes sense, any help would be greatly appreciated. Thanks in advance

Benjamin Thompson on 15 Feb 2022
Posting sample data will help in the Community giving you a good answer. One possibility is using the hours, minutes, or seconds functions to convert the data in your timetable to hours (or some units easy to work with), then use the hist function to map the time data into 24-hour long bins. Then each bin having less than your desired number of readings needs to be looked at and used to change those timetable readings to NaN.
See the documentatoin article titled "Dates and Time - Functions" for a full list of functions available for working with time series data in a timetable.

Seth Furman on 15 Feb 2022
We can use a combination of dateshift and rowfun to get daily averages grouped by day with group counts. Then we can set the data to NaN for any rows with group counts less than 12.
fullDayRowTimes = (datetime(2020,1,1):hours(1):datetime(2020,1,3)-hours(1))';
lessThanHalfDayRowTimes = (datetime(2020,1,3):hours(1):datetime(2020,1,3)+hours(5))';
Temp = 100*rand(54,1);
rng default; tt = timetable([fullDayRowTimes; lessThanHalfDayRowTimes], Temp)
tt = 54×1 timetable
Time Temp ____________________ ______ 01-Jan-2020 00:00:00 21.847 01-Jan-2020 01:00:00 59.935 01-Jan-2020 02:00:00 5.6111 01-Jan-2020 03:00:00 78.476 01-Jan-2020 04:00:00 84.589 01-Jan-2020 05:00:00 17.812 01-Jan-2020 06:00:00 60.716 01-Jan-2020 07:00:00 53.379 01-Jan-2020 08:00:00 44.176 01-Jan-2020 09:00:00 30.988 01-Jan-2020 10:00:00 21.089 01-Jan-2020 11:00:00 64.152 01-Jan-2020 12:00:00 7.9935 01-Jan-2020 13:00:00 20.189 01-Jan-2020 14:00:00 35.09 01-Jan-2020 15:00:00 22.135
tt2 = tt;
tt2.Time = dateshift(tt2.Time, "start", "day")
tt2 = 54×1 timetable
Time Temp ___________ ______ 01-Jan-2020 21.847 01-Jan-2020 59.935 01-Jan-2020 5.6111 01-Jan-2020 78.476 01-Jan-2020 84.589 01-Jan-2020 17.812 01-Jan-2020 60.716 01-Jan-2020 53.379 01-Jan-2020 44.176 01-Jan-2020 30.988 01-Jan-2020 21.089 01-Jan-2020 64.152 01-Jan-2020 7.9935 01-Jan-2020 20.189 01-Jan-2020 35.09 01-Jan-2020 22.135
tt2 = rowfun(@mean, tt2, "GroupingVariables", "Time", "OutputVariableNames", tt2.Properties.VariableNames)
tt2 = 3×2 timetable
Time GroupCount Temp ___________ __________ ______ 01-Jan-2020 24 44.892 02-Jan-2020 24 49.495 03-Jan-2020 6 65.427
tt2.Temp(tt2.GroupCount < 12) = NaN
tt2 = 3×2 timetable
Time GroupCount Temp ___________ __________ ______ 01-Jan-2020 24 44.892 02-Jan-2020 24 49.495 03-Jan-2020 6 NaN