Specifying required number of data points in a timetable to calculate the mean using retime
5 views (last 30 days)
Hi, i was hoping someone might be able to help me solve a problem. I am looking to find a way to retime a timetable of data into daily average values from hourly readings by ensuring that days that have less than 50% of readings (ie not NaN values) are outputted as having an average of NaN. This is to prevent days with insufficient data contributing to the calculation of an index that I am am looking to obtain. I hope that makes sense, any help would be greatly appreciated. Thanks in advance
Benjamin Thompson on 15 Feb 2022
Posting sample data will help in the Community giving you a good answer. One possibility is using the hours, minutes, or seconds functions to convert the data in your timetable to hours (or some units easy to work with), then use the hist function to map the time data into 24-hour long bins. Then each bin having less than your desired number of readings needs to be looked at and used to change those timetable readings to NaN.
See the documentatoin article titled "Dates and Time - Functions" for a full list of functions available for working with time series data in a timetable.
Seth Furman on 15 Feb 2022
We can use a combination of dateshift and rowfun to get daily averages grouped by day with group counts. Then we can set the data to NaN for any rows with group counts less than 12.
fullDayRowTimes = (datetime(2020,1,1):hours(1):datetime(2020,1,3)-hours(1))';
lessThanHalfDayRowTimes = (datetime(2020,1,3):hours(1):datetime(2020,1,3)+hours(5))';
Temp = 100*rand(54,1);
rng default; tt = timetable([fullDayRowTimes; lessThanHalfDayRowTimes], Temp)
tt2 = tt;
tt2.Time = dateshift(tt2.Time, "start", "day")
tt2 = rowfun(@mean, tt2, "GroupingVariables", "Time", "OutputVariableNames", tt2.Properties.VariableNames)
tt2.Temp(tt2.GroupCount < 12) = NaN