Data Preprocessing

Clean, normalize, aggregate, and analyze data

Data preprocessing is the process of transforming raw data into a format that is easier to analyze. This process can include cleaning steps, such as handling missing values or smoothing noisy data. By cleaning, organizing, and summarizing the data, you can identify patterns, make predictions, and inform decision-making.

Apps

Data Cleaner: Preprocess and organize column-oriented data (Since R2022a)

Live Editor Tasks

Clean Missing Data: Find, fill, or remove missing data in the Live Editor
Clean Outlier Data: Find, fill, or remove outliers in the Live Editor
Smooth Data: Smooth noisy data in the Live Editor
Find Local Extrema: Find local maxima and minima in the Live Editor
Find Change Points: Find abrupt changes in data in the Live Editor
Stack Table Variables: Combine values from multiple table variables into one table variable in the Live Editor (Since R2020a)
Unstack Table Variables: Distribute values from one table variable to multiple table variables in the Live Editor (Since R2020a)
Retime Timetable: Resample or aggregate timetable data in the Live Editor (Since R2020a)
Normalize Data: Center and scale data in the Live Editor (Since R2021b)
Find and Remove Trends: Find and remove polynomial or periodic trends from data in the Live Editor
Pivot Table: Summarize tabular data in pivoted table in the Live Editor (Since R2023b)
Compute by Group: Summarize, transform, or filter by group in the Live Editor (Since R2021b)

Functions

Missing Values

fillmissing: Fill missing entries
fillmissing2: Fill missing entries in 2-D data (Since R2023a)
standardizeMissing: Insert standard missing values
rmmissing: Remove missing entries
anymissing: Determine if any array element is missing (Since R2022a)
ismissing: Find missing values
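The idea behind linear filling can be sketched in Python. This is a rough analogue of fillmissing with the 'linear' method and of rmmissing on a vector, not MATLAB code: the helpers `fill_linear` and `remove_missing` are hypothetical, missing entries are assumed to be NaN, and leading or trailing NaN values are simply left in place, which may differ from how fillmissing treats end values.

```python
import math

def fill_linear(values):
    """Fill interior NaN gaps by linear interpolation between the nearest
    non-missing neighbors. Leading and trailing NaN are left unchanged."""
    out = list(values)
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    # Interpolate across each gap between consecutive known positions.
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out

def remove_missing(values):
    """Drop NaN entries from a vector."""
    return [v for v in values if not math.isnan(v)]

nan = math.nan
data = [1.0, nan, 3.0, nan, nan, 6.0, nan]
print(fill_linear(data))     # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, nan]
print(remove_missing(data))  # [1.0, 3.0, 6.0]
```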

Outliers

filloutliers: Detect and replace outliers in data
rmoutliers: Detect and remove outliers in data
clip: Clip data to range (Since R2024a)
isoutlier: Find outliers in data
isbetween: Determine which elements are within specified range
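As a sketch of median-based outlier detection, the Python below flags values more than three scaled median absolute deviations (MAD) from the median, the criterion isoutlier uses by default, and clips values to a range in the spirit of clip. The helpers `is_outlier_mad` and `clip_to` are hypothetical Python stand-ins, not the MATLAB functions. The scale factor of about 1.4826 makes the MAD comparable to a standard deviation for normally distributed data.

```python
import statistics

def is_outlier_mad(values, threshold=3.0):
    """Flag values more than `threshold` scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    # 1 / (75th percentile of the standard normal), about 1.4826
    scale = 1 / statistics.NormalDist().inv_cdf(0.75)
    return [abs(v - med) > threshold * scale * mad for v in values]

def clip_to(values, lower, upper):
    """Limit each value to the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

data = [10.0, 11.0, 9.0, 10.5, 50.0, 9.5]
print(is_outlier_mad(data))      # [False, False, False, False, True, False]
print(clip_to(data, 9.5, 11.0))  # [10.0, 11.0, 9.5, 10.5, 11.0, 9.5]
```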

Noise Reduction

smoothdata: Smooth noisy data
smoothdata2: Smooth noisy data in two dimensions (Since R2023b)
movmean: Moving mean
movmedian: Moving median
movsum: Moving sum
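A centered moving window is the core of movmean and movmedian. The Python sketch below assumes an odd window length k and shrinks the window near the ends of the vector; `moving_stat` is a hypothetical helper, not a MATLAB API. Swapping the mean for a median shows why a moving median is less sensitive to an isolated spike.

```python
import statistics

def moving_stat(values, k, stat):
    """Apply `stat` over a centered window of odd length k.
    Near the ends the window shrinks to the available elements."""
    if k < 1 or k % 2 == 0:
        raise ValueError("this sketch only supports odd k >= 1")
    half = k // 2
    n = len(values)
    return [stat(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]

data = [1.0, 2.0, 9.0, 4.0, 5.0]  # 9.0 is a spike
print(moving_stat(data, 3, statistics.fmean))   # [1.5, 4.0, 5.0, 6.0, 4.5]
print(moving_stat(data, 3, statistics.median))  # [1.5, 2.0, 4.0, 5.0, 4.5]
```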

Local Extrema and Change Points

islocalmin: Find local minima
islocalmin2: Find local minima in 2-D data (Since R2024a)
islocalmax: Find local maxima
islocalmax2: Find local maxima in 2-D data (Since R2024a)
ischange: Find abrupt changes in data
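A simplified Python sketch of local extrema detection: an interior point counts as a local maximum when it is strictly greater than both neighbors. Unlike islocalmax and islocalmin, which provide options such as FlatSelection and prominence thresholds, the hypothetical helpers below never flag endpoints or flat plateaus.

```python
def local_max(values):
    """Flag interior points strictly greater than both neighbors.
    Endpoints and flat plateaus are never flagged in this sketch."""
    n = len(values)
    return [0 < i < n - 1 and values[i - 1] < values[i] > values[i + 1]
            for i in range(n)]

def local_min(values):
    """Local minima are local maxima of the negated data."""
    return local_max([-v for v in values])

data = [1, 3, 2, 2, 5, 4, 6]
print(local_max(data))  # [False, True, False, False, True, False, False]
print(local_min(data))  # [False, False, False, False, False, True, False]
```

The flat pair of 2s is a minimum in a broader sense but is not reported here, which is exactly the kind of case the FlatSelection option exists for.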

Sampling

isuniform: Determine if vector is uniformly spaced (Since R2022b)
isregular: Determine if input times are regular with respect to time or calendar unit
retime: Resample or aggregate data in timetable, and resolve duplicate or irregular times
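Checking for uniform spacing amounts to comparing consecutive differences within a tolerance. The Python sketch below returns a flag and the step, loosely mirroring the two outputs of isuniform. The relative tolerance of 1e-9, the treatment of vectors shorter than two elements, and the helper name `is_uniform` are assumptions of this sketch, not MATLAB's exact rules.

```python
import math

def is_uniform(v, rel_tol=1e-9):
    """Return (True, step) if consecutive differences of v agree within
    rel_tol, else (False, nan). Vectors shorter than 2 return (False, nan)."""
    if len(v) < 2:
        return False, math.nan
    diffs = [b - a for a, b in zip(v, v[1:])]
    step = diffs[0]
    if all(math.isclose(d, step, rel_tol=rel_tol) for d in diffs):
        return True, step
    return False, math.nan

print(is_uniform([0.0, 0.1, 0.2, 0.3]))  # (True, 0.1)
print(is_uniform([0, 1, 3]))             # (False, nan)
```

The tolerance matters: 0.3 - 0.2 is not exactly 0.1 in floating point, so an exact equality test would wrongly report the first vector as nonuniform.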

Reshape Tables

rows2vars: Reorient table or timetable so that rows become variables
stack: Stack data from input table or timetable into one variable in output table or timetable
unstack: Unstack data from one variable into multiple variables
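Stacking converts a wide table into a long one, and unstacking reverses it. The Python sketch below models a table as a list of dicts; the helpers `stack_vars` and `unstack_vars`, the output variable names `Var` and `Value`, and the station data are hypothetical choices for this illustration only.

```python
def stack_vars(rows, id_var, value_vars, index_name="Var", value_name="Value"):
    """Wide to long: one output row per (input row, stacked variable)."""
    return [{id_var: row[id_var], index_name: var, value_name: row[var]}
            for row in rows for var in value_vars]

def unstack_vars(rows, id_var, index_name="Var", value_name="Value"):
    """Long to wide: inverse of stack_vars, assuming unique (id, var) pairs."""
    wide = {}
    for row in rows:
        out = wide.setdefault(row[id_var], {id_var: row[id_var]})
        out[row[index_name]] = row[value_name]
    return list(wide.values())

wide = [{"Station": "A", "Jan": 1.5, "Feb": 2.0},
        {"Station": "B", "Jan": 3.0, "Feb": 4.5}]
long = stack_vars(wide, "Station", ["Jan", "Feb"])
print(len(long))  # 4
print(long[0])    # {'Station': 'A', 'Var': 'Jan', 'Value': 1.5}
print(unstack_vars(long, "Station") == wide)  # True
```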

Sort and Compare Elements

sort: Sort array elements
sortrows: Sort rows of matrix or table
issorted: Determine if array is sorted
issortedrows: Determine if matrix or table rows are sorted
unique: Unique values
uniquetol: Unique values within tolerance
ismember: Find set members of data
ismembertol: Find set members of data within tolerance
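Tolerance-based uniqueness can be sketched by sorting and then skipping any value within a tolerance of the most recently kept value. This Python sketch uses an absolute tolerance and keeps the smallest member of each run; uniquetol instead defaults to a tolerance scaled by the magnitude of the data, so treat the hypothetical `unique_tol` as an illustration of the idea only.

```python
def unique_tol(values, tol):
    """Sorted unique values, treating any value within `tol` (absolute)
    of the most recently kept value as a duplicate."""
    kept = []
    for v in sorted(values):
        if not kept or v - kept[-1] > tol:
            kept.append(v)
    return kept

data = [1.0, 1.05, 2.0, 1.0001, 2.2, 1.9999]
print(sorted(set(data)))       # exact uniqueness keeps all six values
print(unique_tol(data, 0.01))  # [1.0, 1.05, 1.9999, 2.2]
```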

Resize

paddata: Pad data by adding elements (Since R2023b)
trimdata: Trim data by removing elements (Since R2023b)
resize: Resize data by adding or removing elements (Since R2023b)
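Padding and trimming to a target length can be sketched as below. These hypothetical Python helpers work only at the end of a vector and pad with an assumed fill value of 0.0; paddata, trimdata, and resize have their own options and defaults, so this shows the idea rather than their exact behavior.

```python
def pad_end(values, m, fill=0.0):
    """Append `fill` until the result has length m (no-op if already >= m)."""
    return list(values) + [fill] * max(0, m - len(values))

def trim_end(values, m):
    """Keep only the first m elements (no-op if already <= m)."""
    return list(values[:m])

def resize_end(values, m, fill=0.0):
    """Pad or trim at the end so the result has exactly length m."""
    return trim_end(values, m) if len(values) > m else pad_end(values, m, fill)

print(pad_end([1, 2], 4))            # [1, 2, 0.0, 0.0]
print(trim_end([1, 2, 3, 4], 2))     # [1, 2]
print(resize_end([1, 2, 3], 5, -1))  # [1, 2, 3, -1, -1]
```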

Normalize

normalize: Normalize data
rescale: Scale range of array elements
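Two common scalings are the z-score (subtract the mean, divide by the standard deviation) and min-max rescaling to an interval. normalize uses the z-score by default and rescale maps to [0, 1] by default. The Python sketch below reproduces those formulas with hypothetical helpers, uses the sample (N-1) standard deviation, and assumes the input has at least two distinct values to avoid dividing by zero.

```python
import statistics

def zscore(values):
    """Center to mean 0 and scale to unit sample standard deviation (N-1)."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return [(v - mu) / sigma for v in values]

def rescale_range(values, lower=0.0, upper=1.0):
    """Linearly map min(values) to `lower` and max(values) to `upper`."""
    lo, hi = min(values), max(values)
    return [lower + (v - lo) * (upper - lower) / (hi - lo) for v in values]

data = [2.0, 4.0, 6.0]
print(zscore(data))                # [-1.0, 0.0, 1.0]
print(rescale_range(data))         # [0.0, 0.5, 1.0]
print(rescale_range(data, -1, 1))  # [-1.0, 0.0, 1.0]
```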

Find and Remove Trends

detrend: Remove polynomial trend
trenddecomp: Find trends in data (Since R2021b)
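Removing a linear trend means fitting a straight line by least squares and subtracting it, which is what detrend does by default. The Python sketch below assumes evenly spaced samples at x = 0, 1, ..., n-1 and at least two points; `detrend_linear` is a hypothetical helper, not the MATLAB function.

```python
def detrend_linear(y):
    """Subtract the least-squares straight line fitted against x = 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [yi - (intercept + slope * x) for x, yi in enumerate(y)]

print(detrend_linear([3.0, 5.0, 7.0, 9.0]))  # a pure line: [0.0, 0.0, 0.0, 0.0]
print(detrend_linear([1.0, 4.0, 5.0, 6.0]))  # about [-0.6, 0.8, 0.2, -0.4]
```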

Bin

discretize: Group data into bins or categories
histcounts: Histogram bin counts
histcounts2: Bivariate histogram bin counts
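By default, discretize and histcounts place a value in bin k when it lies in [e_k, e_(k+1)), with the last bin also including its right edge. The Python sketch below applies that rule with the standard-library bisect module; values outside the edges map to None here (discretize returns NaN instead), and the helpers `bin_index` and `bin_counts` are hypothetical.

```python
import bisect

def bin_index(values, edges):
    """1-based bin per value: bins are [e_k, e_k+1), except the last bin,
    which also includes the right edge. Out-of-range values map to None."""
    last = len(edges) - 1
    out = []
    for v in values:
        if v < edges[0] or v > edges[-1]:
            out.append(None)
        elif v == edges[-1]:
            out.append(last)
        else:
            out.append(bisect.bisect_right(edges, v))
    return out

def bin_counts(values, edges):
    """Histogram counts per bin using the same edge rule as bin_index."""
    counts = [0] * (len(edges) - 1)
    for b in bin_index(values, edges):
        if b is not None:
            counts[b - 1] += 1
    return counts

edges = [0, 10, 20, 30]
data = [5, 10, 29.9, 30, -1, 31, 0]
print(bin_index(data, edges))   # [1, 2, 3, 3, None, None, 1]
print(bin_counts(data, edges))  # [2, 1, 2]
```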

Pivot

pivot: Summarize tabular data in pivoted table (Since R2023a)
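A pivot table aggregates one variable over every combination of two grouping variables. The Python sketch below sums a value column into nested dicts; the helper `pivot_sum`, the choice of sum as the aggregation, and the toy sales rows are assumptions for illustration, not the defaults of the pivot function.

```python
def pivot_sum(rows, row_key, col_key, value_key):
    """Sum `value_key` over each (row_key, col_key) combination.
    Returns {row_label: {col_label: total}}; absent combinations are omitted."""
    table = {}
    for r in rows:
        cells = table.setdefault(r[row_key], {})
        cells[r[col_key]] = cells.get(r[col_key], 0) + r[value_key]
    return table

sales = [  # toy rows for illustration
    {"Region": "East", "Quarter": "Q1", "Units": 10},
    {"Region": "East", "Quarter": "Q2", "Units": 5},
    {"Region": "West", "Quarter": "Q1", "Units": 7},
    {"Region": "East", "Quarter": "Q1", "Units": 3},
]
print(pivot_sum(sales, "Region", "Quarter", "Units"))
# {'East': {'Q1': 13, 'Q2': 5}, 'West': {'Q1': 7}}
```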

Summarize

summary: Data summary
groupsummary: Group summary computations
groupcounts: Number of group elements
groupfilter: Filter by group
grouptransform: Transform by group
findgroups: Find groups and return group numbers
splitapply: Split data into groups and apply function
accumarray: Accumulate vector elements
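The split-apply pattern behind findgroups and splitapply (and, for sums, accumarray) can be sketched in Python: number the groups in sorted order of their keys, collect the values of each group, then apply a function to each group. The helpers `find_groups` and `split_apply` are hypothetical stand-ins, not the MATLAB API.

```python
def find_groups(keys):
    """Number each key by its position among the sorted unique keys (1-based).
    Returns (group_numbers, group_labels)."""
    labels = sorted(set(keys))
    number = {k: i + 1 for i, k in enumerate(labels)}
    return [number[k] for k in keys], labels

def split_apply(func, values, groups):
    """Apply `func` to the list of values in each group, in group order."""
    buckets = [[] for _ in range(max(groups))]
    for v, g in zip(values, groups):
        buckets[g - 1].append(v)
    return [func(b) for b in buckets]

keys = ["b", "a", "b", "c", "a"]
values = [1, 2, 3, 4, 5]
groups, labels = find_groups(keys)
print(groups, labels)                    # [2, 1, 2, 3, 1] ['a', 'b', 'c']
print(split_apply(sum, values, groups))  # [7, 4, 4]
print(split_apply(max, values, groups))  # [5, 3, 4]
```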

Topics

Clean Data

Remove Trends

Summarize
