Data Cleaner
Preprocess and organize columnoriented data
Open the Data Cleaner App
MATLAB Toolstrip: On the Apps tab, under MATLAB, click the Data Cleaner icon .
MATLAB command window: Enter
dataCleaner
.
Examples
Parameters
Clean Missing DataSelect indicators
— Values to treat as missing
Use only standard indicators
 Specify nonstandard indicators
Select one of these values to specify the missingvalue indicators:
Indicators  Indicator Parameters  Description 

Use only standard indicators  Not applicable  Use only standard indicators to detect missing values. Standard missing values depend on the data type:

Specify nonstandard indicators  Indicators  Inside single quotes, list nonstandard indicator values to treat as missing, separated by commas. For example, '–99, "N/A" ' 
Cleaning method
— Method for handling missing data
Fill missing
 Remove missing
Select one of these method values and, if necessary, additional method parameters to specify how to handle missing data:
Method  Method Parameters  Description 

Fill missing  Max gap to fill  Fill missing values. Gaps in data larger than this specified value are not
filled (positive scalar). See the Fill method
parameter. 
Units  Fill missing values. Specify the gap size unit type.  
Remove missing  Not applicable  Remove data rows with missing entries. 
Fill method
— Method for replacing missing data
Constant value
 Previous value
 Next value
 ...
Select one of these method values and, if necessary, additional method parameters to specify how to fill missing data:
Method  Method Parameters  Description 

Constant value  Constant value  Use a constant scalar value. 
Previous value  Not applicable  Use the previous nonmissing value. 
Next value  Not applicable  Use the next nonmissing value. 
Nearest value  Not applicable  Use the nearest nonmissing value. 
Linear interpolation  Not applicable  Use the linear interpolation of neighboring, nonmissing values. 
Spline interpolation  Not applicable  Use the piecewise cubic spline interpolation. 
Shapepreserving cubic interpolation
(PCHIP)  Not applicable  Use the shapepreserving piecewise cubic spline interpolation. 
Modified Akima cubic interpolation  Not applicable  Use the modified Akima cubic Hermite interpolation. 
Moving median  Moving window type  Center or asymmetrically align the moving window about the current element. 
Window length  Specify the length of the moving window (positive scalar).  
Right half window length (if moving window type is
Asymmetric )  Specify the number of window units after the current element to define the window alignment (positive scalar).  
Units  Specify the moving window unit type.  
Moving mean  Moving window type  Center or asymmetrically align the moving window about the current element. 
Window length  Specify the length of the moving window (positive scalar).  
Right half window length (if moving window type is
Asymmetric )  Specify the number of window units after the current element to define the window alignment (positive scalar).  
Units  Specify the moving window unit type. 
Cleaning method
— Method for handling outlier data
Fill outliers
 Remove outliers
Select one of these method values to specify how to handle outlier data:
Method  Description 

Fill outliers  Fill outlier values. See the Fill method
parameter. 
Remove outliers  Remove data rows with outlier values. 
Fill method
— Method for replacing outlier data
Constant value
 Center value
 Clip to threshold value
 ...
Select one of these method values to specify the fill method for replacing outlier data:
Method  Description 

Constant value  Use the specified constant scalar value. 
Center value  Use the center value determined by the find
method. 
Clip to threshold value  Use the lower threshold value for elements less than the lower threshold
determined by the find method. Use the upper threshold value
for elements greater than the upper threshold determined by the
find method. 
Previous value  Use the previous nonoutlier value. 
Next value  Use the next nonoutlier value. 
Nearest value  Use the nearest nonoutlier value. 
Linear interpolation  Use the linear interpolation of neighboring, nonoutlier values. 
Spline interpolation  Use the piecewise cubic spline interpolation. 
Shapepreserving cubic interpolation
(PCHIP)  Use the shapepreserving piecewise cubic spline interpolation. 
Modified Akima cubic interpolation  Use the modified Akima cubic Hermite interpolation. 
Detection method
— Method for identifying outlier data
Median
 Mean
 Quartiles
 ...
Select one of these method values and additional method parameters to specify the detection method for identifying outlier data:
Method  Method Parameters  Description 

Median  Threshold factor  Outliers are defined as elements more than the specified threshold of
scaled median absolute deviations (MAD) from the median. For input data
A , the scaled MAD is defined as
c*median(abs(Amedian(A))) , where
c=1/(sqrt(2)*erfcinv(3/2)) . 
Mean  Threshold factor  Outliers are defined as elements more than the specified threshold of
standard deviations from the mean. This method is faster but less robust than
Median . 
Quartiles  Threshold factor  Outliers are defined as elements more than the specified threshold of interquartile ranges above the upper quartile (75 percent) or below the lower quartile (25 percent). This method is useful when the input data is not normally distributed. 
Grubbs  Threshold factor  Outliers are detected using Grubbs’ test, which removes one outlier per iteration based on hypothesis testing. This method assumes that the input data is normally distributed. 
Generalized extreme studentized deviate
(GESD)  Threshold factor  Outliers are detected using the generalized extreme studentized deviate
test for outliers. This iterative method is similar to
Grubbs but can perform better when multiple
outliers are masking each other. 
Moving median  Threshold factor  Outliers are defined as elements more than the specified threshold of local scaled MAD from the local median over a specified window. 
Moving window type  Center or asymmetrically align the moving window about the current element.  
Window length  Specify the length of the moving window (positive scalar).  
Right half window length (if moving window type is
Asymmetric )  Specify the number of window units after the current element to define the window alignment (positive scalar).  
Units  Specify the moving window unit type.  
Moving mean  Threshold factor  Outliers are defined as elements more than the specified threshold of local standard deviations from the local mean over a specified window. 
Moving window type  Center or asymmetrically align the moving window about the current element.  
Window length  Specify the length of the moving window (positive scalar).  
Right half window length (if moving window type is
Asymmetric )  Specify the number of window units after the current element to define the window alignment (positive scalar).  
Units  Specify the moving window unit type.  
Percentiles  Lower threshold  Outliers are defined as elements outside of the percentile range specified by an upper and lower threshold. 
Upper threshold  Outliers are defined as elements outside of the percentile range specified by an upper and lower threshold. 
Normalization method
— Method for normalizing data
Zscore
 Norm
 Range
 ...
Select one of these method values and, if necessary, additional method parameters to specify the method for normalizing data:
Method  Method Parameters  Description 

Zscore  Zscore type  Center and scale to have mean 0 and standard deviation 1 by
specifying Center
and scale to have median of 0 and median
absolute deviation 1 by specifying 
Norm  PNorm  Scale data by pnorm (positive scalar or Inf
for infinity norm). 
Range  Left limit  Rescale
range of data with left and right range limits to an interval of the form
[a b] , where a < b . 
Right limit  Rescale
range of data with left and right range limits to an interval of the form
[a b] , where a < b .  
Median IQR  Not applicable  Center and scale data to have median 0 and interquartile range 1. 
Center  Center Type  Center to have mean 0 by subtracting the mean from the input data
with Center to have median 0 by
subtracting the median from the input data with
Shift center by the specified
numeric value with 
Scale  Scale type  Scale data by standard deviation with Scale data by median
absolute deviation with Scale data by the first element of the
data with Scale data by
interquartile range with Scale data by dividing by the specified
numeric factor (positive scalar) with 
Center and scale  Center Type  Center to have mean 0 by subtracting the mean from the input data
with Center to have median 0 by
subtracting the median from the input data with
Shift center by the specified
numeric value with 
Scale type  Scale data by standard deviation with Scale data by median
absolute deviation with Scale data by the first element of the
data with Scale data by
interquartile range with Scale data by dividing by the specified
numeric factor (positive scalar) with 
Smoothing method
— Method for smoothing noisy data
Moving mean
 Moving median
 Gaussian filter
 ...
Select one of these method values to specify the smoothing method for noisy data:
Method  Description 

Moving mean  Use the moving average. This method is useful for reducing periodic trends in data. 
Moving median  Use the moving median. This method is useful for reducing periodic trends in data when outliers are present. 
Gaussian filter  Use the Gaussianweighted moving average. 
Local linear regression (Lowess)  Use linear regression. This method can be computationally expensive, but it results in fewer discontinuities. 
Local quadratic regression (Loess)  Use quadratic regression. This method is slightly more computationally expensive than local linear regression. 
Robust Lowess  Use robust linear regression. This method is a more computationally expensive version of local linear regression, but it is more robust to outliers. 
Robust Loess  Use robust quadratic regression. This method is a more computationally expensive version of local quadratic regression, but it is more robust to outliers. 
SavitzkyGolay polynomial filter  Use the SavitzkyGolay polynomial filter, which smooths according to a specified polynomial degree and is fitted over each window. This method can be more effective than other methods when the data varies rapidly. 
Smoothing parameter
— Options for data smoothing
Smoothing factor
 Moving window
Select one of these parameter values and additional parameter options to specify the options for data smoothing:
Parameter  Parameter Options  Description 

Smoothing factor  Smoothing factor  Specify the amount of smoothing (positive scalar). 
Moving window  Moving window type  Center or asymmetrically align the moving window about the current element. 
Window length  Specify the length of the moving window (positive scalar).  
Right half window length (if moving window type is
Asymmetric )  Specify the number of window units after the current element to define the window alignment (positive scalar).  
Units  Specify the moving window unit type. 
Selection method
— Method for specifying row times
Time step
 Sample rate
Select one of these method values and additional method parameters to specify the selection method for retimed row times:
Method  Method Parameters  Description 

Time step  Time step  Specify the length of time between consecutive regularly spaced row times in the output table (positive scalar). 
Time step units  Specify the time step units.  
Sample rate  Sample rate  Specify the number of samples in the output table per unit of time (positive scalar). 
Sample rate units  Specify the sample rate units. 
Method
— Method for retiming
Fill with missing
 Fill with constant
 Fill with previous value
 ...
Select one of these method values to specify the retiming method:
Method  Description 

Fill with missing  Use the missing data indicators (for example, NaN for
numeric variables). 
Fill with constant  Use the specified constant value. The default value is 0. 
Fill with previous value  Copy data from the nearest preceding neighbor in the input timetable, proceeding from the end of the vector of row times. If there are duplicate row times, then use the last of the duplicates. 
Fill with next value  Copy data from the nearest following neighbor in the input timetable, proceeding from the beginning of the vector of row times. If there are duplicate row times, then use the first of the duplicates. 
Fill with nearest value  Copy data from the nearest neighbor in the input timetable. 
Linear interpolation  Use linear interpolation. 
Spline interpolation  Use piecewise cubic spline interpolation. 
Shapepreserving cubic interpolation
(PCHIP)  Use shapepreserving piecewise cubic interpolation. 
Modified Akima cubic interpolation  Use modified Akima cubic Hermite interpolation. 
Sum  Use the sum of the values in each time bin. 
Mean  Use the mean of the values in each time bin. 
Product  Use the product of the values in each time bin. 
Minimum  Use the minimum of the values in each time bin. 
Maximum  Use the maximum of the values in each time bin. 
Number of values  Use the number of values in each time bin. 
First value in bin  Use the first value in each time bin. 
Last value in bin  Use the last value in each time bin. 
Custom  Use the function specified by the function handle. 
Variables to stack
— Variables to combine
table variables
Select one or more table variables to combine.
Names of new table variables
— Variable containing the names of new table variables
table variable
Select a table variable containing the names of the new table variables.
Values in new table variables
— Variable to unstack into multiple variables
table variable
Select a table variable to unstack into multiple table variables.
Group by
— Variables that define groups of rows
table variables
Select one or more table variables to define groups of rows.
Aggregator for new table variable values
— Function to aggregate data values into a single value
Sum
 Mean
 Median
 ...
Select one of these values to specify the function to aggregate data values into a single value:
Function  Description 

Sum  Use sum of each group of values. 
Mean  Use the mean of each group of values. 
Median  Use the median of each group of values. 
Mode  Use the mode of each group of values. 
Maximum  Use the maximum of each group of values. 
Minimum  Use the minimum of each group of values. 
First  Use the first value of each group of values. 
Unique  Use the number of unique values in each group of values. 
Count  Use the number of values in each group of values. 
Custom  Use the function specified by the function handle. 
Tips
To interactively sort by a data variable, access the sorting options by clicking the arrow in the variable header in the Data tab. The sorting appears as a step in the Cleaning Steps panel.
To interactively rename a variable from the data, doubleclick the variable name in the Variables panel. The renaming appears as a step in the Cleaning Steps panel.
To interactively remove a variable from the data, rightclick the variable name in the Variables panel and select Delete. The removal appears as a step in the Cleaning Steps panel.
To alter previously performed cleaning steps, perform one of these actions:
View or edit cleaning parameters by clicking a step in the Cleaning Steps panel.
Change the order in which cleaning steps are performed by dragging a step to a new location in the Cleaning Steps panel.
Disable cleaning steps by clearing a cleaning step or rightclicking a step and selecting Disable Steps Below in the Cleaning Steps panel.
To view only the input data or cleaned data, select or clear elements in the plot legends in the Visualizations tab.