Main Content

Clean Timetable with Missing, Duplicate, or Nonuniform Times

This example shows how to create a regular timetable from one that has missing, duplicate, or nonuniform times. A timetable is a type of table that associates a time-stamp, or row time, with each row of data. In a regular timetable, the row times are sorted and unique, and differ by the same regular time step.

Also, some toolboxes have functions that work on regularly spaced time series data in the form of numeric arrays. So the example also shows how to export the data from a timetable for use with other functions.

There are a number of issues with row times that can make timetables irregular. The row times can be missing. They can be out of order. They can be duplicates, creating multiple rows with the same time that might have the same or different data. And even when they are present, sorted, and unique, they can differ by time steps of different sizes.

Timetables provide a number of different ways to resolve missing, duplicate, or nonuniform times, and to resample or aggregate data to create regular row times.

  • To find missing row times, use ismissing.

  • To remove missing times and data, use rmmissing.

  • To sort a timetable by its row times, use sortrows.

  • To make a timetable with unique and sorted row times, use unique and retime.

  • To remove duplicate times, specify a vector of unique times and use retime.

  • To make a regular timetable, specify a regular time vector and use retime.

Load Timetable

Load a sample timetable from the MAT-file badTimes that contains weather measurements taken over several hours on June 9, 2016. The timetable TT includes temperature, rainfall, and wind speed measurements taken at irregular times during that day.

load badTimes
TT
TT=12×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    NaT                       56       0         0   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 04:12:00    58.8     NaN       NaN   

Find and Remove Rows with Missing Row Times

One way to begin is by finding and removing rows that have a NaT, or missing value, as the row time. To find missing values in the vector of row times, use ismissing. The ismissing function returns a logical vector that contains 1 wherever TT.Time has a missing value.

natRowTimes = ismissing(TT.Time)
natRowTimes = 12x1 logical array

   0
   0
   0
   0
   1
   0
   0
   0
   0
   0
      ⋮

To keep only those rows that do not have missing values as row times, index into TT using ~natRowTimes as row indices. Assign those rows to a new timetable, goodRowTimesTT.

goodRowTimesTT = TT(~natRowTimes,:)
goodRowTimesTT=11×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 04:12:00    58.8     NaN       NaN   

This method removes only the rows that have missing row times. The timetable variables still might have missing data values. For example, the last row of goodRowTimesTT has NaN values for the Rain and Windspeed variables.

Remove Rows with Missing Times and Missing Data

As an alternative, you can remove both missing row times and missing data values at the same time by using the rmmissing function. rmmissing removes any timetable rows that have missing row times, missing data values, or both.

Display the missing row time and missing data values of TT.

TT
TT=12×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    NaT                       56       0         0   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 04:12:00    58.8     NaN       NaN   

Remove all rows that have missing row times or data values. Assign the remaining rows to the timetable goodValuesTT.

goodValuesTT = rmmissing(TT)
goodValuesTT=10×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   

Sort Timetable and Determine if It Is Regular

After dealing with missing values, you can go on to sort your timetable and then determine if the sorted timetable is regular.

To determine if goodValuesTT is already sorted, use the issorted function.

tf = issorted(goodValuesTT)
tf = logical
   0

Since it is not, sort the timetable on its row times by using the sortrows function.

sortedTT = sortrows(goodValuesTT)
sortedTT=10×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   

Determine whether sortedTT is regular. A regular timetable has the same time interval between consecutive row times. Even a sorted timetable can have time steps that are not uniform.

tf = isregular(sortedTT)
tf = logical
   0

Since it is not, display the differences between row times.

diff(sortedTT.Time)
ans = 9x1 duration
   00:57:53
   01:58:19
   00:49:47
   00:00:00
   00:00:00
   00:00:00
   01:04:47
   00:00:00
   00:00:00

SInce the row times are sorted, this result shows that some row times are unique and some are duplicates.

Remove Duplicate Rows

Timetables can have duplicate rows. Timetable rows are duplicates if they have the same row times and the same data values. In this example, the last two rows of sortedTT are duplicate rows. (There are other rows in sortedTT that have duplicate row times but differing data values.)

To remove the duplicate rows from sortedTT, use unique. The unique function returns the unique rows and sorts them by their row times.

uniqueRowsTT = unique(sortedTT)
uniqueRowsTT=9×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   

Find Rows with Duplicate Times and Different Data

Timetables can have rows with duplicate row times but different data values. In this example, uniqueRowsTT has several rows with the same row times but different values.

Find the rows that have duplicate row times. First, sort the row times and find consecutive times that have no difference between them. Times with no difference between them are the duplicates. Index back into the vector of row times and return a unique set of times that identify the duplicate row times in uniqueRowsTT.

dupTimes = sort(uniqueRowsTT.Time);
tf = (diff(dupTimes) == 0);
dupTimes = dupTimes(tf);
dupTimes = unique(dupTimes)
dupTimes = 2x1 datetime
   09-Jun-2016 08:49:10
   09-Jun-2016 09:53:57

To display the rows with duplicate row times, index into uniqueRowsTT using dupTimes. When you index on times, the output timetable contains all rows with matching row times.

uniqueRowsTT(dupTimes,:)
ans=6×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   

Select First and Last Rows with Duplicate Times

When a timetable has rows with duplicate times, you might want to select particular rows and discard the other rows having duplicate times. For example, you can select either the first or the last of the rows with duplicate row times by using the unique and retime functions.

First, create a vector of unique row times from TT by using unique.

uniqueTimes = unique(uniqueRowsTT.Time)
uniqueTimes = 5x1 datetime
   09-Jun-2016 05:03:11
   09-Jun-2016 06:01:04
   09-Jun-2016 07:59:23
   09-Jun-2016 08:49:10
   09-Jun-2016 09:53:57

Select the first row from each set of rows that have duplicate times. To copy data from the first rows, specify the 'firstvalue' method.

firstUniqueRowsTT = retime(uniqueRowsTT,uniqueTimes,'firstvalue')
firstUniqueRowsTT=5×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 09:53:57      59    0.03       3.4   

Select the last rows from each set of rows that have duplicate times. To copy data from the last rows, specify the 'lastvalue' method.

lastUniqueRowsTT = retime(uniqueRowsTT,uniqueTimes,'lastvalue')
lastUniqueRowsTT=5×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 09:53:57      67    0.03       3.4   

As a result, the last two rows of firstUniqueRowsTT and lastUniqueRowsTT have different values in the Temp variable.

Aggregate Data from All Rows with Duplicate Times

Another way to deal with data in the rows having duplicate times is to aggregate or combine the data values in some way. For example, you can calculate the means of several measurements of the same quantity taken at the same time.

Calculate the mean temperature, rainfall, and wind speed for rows with duplicate row times using the retime function.

meanTT = retime(uniqueRowsTT,uniqueTimes,'mean')
meanTT=5×3 timetable
            Time            Temp     Rain    WindSpeed
    ____________________    _____    ____    _________

    09-Jun-2016 05:03:11     66.2    0.05         3   
    09-Jun-2016 06:01:04       73    0.01       2.3   
    09-Jun-2016 07:59:23       59    0.08       0.9   
    09-Jun-2016 08:49:10    71.75    0.01       2.7   
    09-Jun-2016 09:53:57       63    0.03       3.4   

As a result, the last two rows of meanTT have mean temperatures in the Temp variable for the rows with duplicate row times.

Make Timetable Regular

Finally, you can resample data from an irregular timetable to make it regular by using the retime function. For example, you can interpolate the data from meanTT onto a regular hourly time vector. To use linear interpolation, specify 'linear'. Each row time in hourlyTT begins on the hour, and there is a one-hour interval between consecutive row times.

hourlyTT = retime(meanTT,'hourly','linear')
hourlyTT=6×3 timetable
            Time             Temp       Rain      WindSpeed
    ____________________    ______    ________    _________

    09-Jun-2016 05:00:00    65.826      0.0522     3.0385  
    09-Jun-2016 06:00:00    72.875    0.010737     2.3129  
    09-Jun-2016 07:00:00    66.027    0.044867     1.6027  
    09-Jun-2016 08:00:00    59.158    0.079133     0.9223  
    09-Jun-2016 09:00:00    70.287    0.013344     2.8171  
    09-Jun-2016 10:00:00    62.183    0.031868     3.4654  

Instead of using a predefined time step such as 'hourly', you can specify a time step of your own. To specify a time step of 30 minutes, use the 'regular' input argument and the 'TimeStep' name-value argument. You can specify a time step of any size as a duration or calendarDuration value.

regularTT = retime(meanTT,'regular','linear','TimeStep',minutes(30))
regularTT=11×3 timetable
            Time             Temp       Rain      WindSpeed
    ____________________    ______    ________    _________

    09-Jun-2016 05:00:00    65.826      0.0522     3.0385  
    09-Jun-2016 05:30:00     69.35    0.031468     2.6757  
    09-Jun-2016 06:00:00    72.875    0.010737     2.3129  
    09-Jun-2016 06:30:00    69.576    0.027118     1.9576  
    09-Jun-2016 07:00:00    66.027    0.044867     1.6027  
    09-Jun-2016 07:30:00    62.477    0.062616     1.2477  
    09-Jun-2016 08:00:00    59.158    0.079133     0.9223  
    09-Jun-2016 08:30:00    66.841     0.03695      2.007  
    09-Jun-2016 09:00:00    70.287    0.013344     2.8171  
    09-Jun-2016 09:30:00    66.235    0.022606     3.1412  
    09-Jun-2016 10:00:00    62.183    0.031868     3.4654  

Extract Regular Timetable Data into Array

You can export the timetable data for use with functions to analyze data that is regularly spaced in time. For example, the Econometrics Toolbox™ and the Signal Processing Toolbox™ have functions you can use for further analysis on regularly spaced data.

Extract the timetable data as an array. You can use the Variables property to return the data as an array, as long as the table variables have data types that allow them to be concatenated.

A = regularTT.Variables
A = 11×3

   65.8260    0.0522    3.0385
   69.3504    0.0315    2.6757
   72.8747    0.0107    2.3129
   69.5764    0.0271    1.9576
   66.0266    0.0449    1.6027
   62.4768    0.0626    1.2477
   59.1579    0.0791    0.9223
   66.8412    0.0370    2.0070
   70.2868    0.0133    2.8171
   66.2348    0.0226    3.1412
      ⋮

regularTT.Variables is equivalent to using curly brace syntax, regularTT{:,:}, to access the data in the timetable variables.

A2 = regularTT{:,:}
A2 = 11×3

   65.8260    0.0522    3.0385
   69.3504    0.0315    2.6757
   72.8747    0.0107    2.3129
   69.5764    0.0271    1.9576
   66.0266    0.0449    1.6027
   62.4768    0.0626    1.2477
   59.1579    0.0791    0.9223
   66.8412    0.0370    2.0070
   70.2868    0.0133    2.8171
   66.2348    0.0226    3.1412
      ⋮

See Also

| | | | | | | | |

Related Topics