rmmissing

Remove missing entries

Description

R = rmmissing(A) removes missing entries from an array or table. If A is a vector, then rmmissing removes any entry that contains missing data. If A is a matrix or table, then rmmissing removes any row that contains missing data.
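The following minimal sketch uses a small made-up matrix to show the matrix case. rmmissing keeps only the rows that contain no NaN:

```matlab
% Illustrative 3-by-3 matrix; only row 2 contains a missing value (NaN)
M = [1 2   3;
     4 NaN 6;
     7 8   9];

% For a matrix, rmmissing removes every row that contains missing data
R = rmmissing(M);   % R is [1 2 3; 7 8 9]
```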

Missing values are defined according to the data type of A:

  • NaN: double, single, duration, and calendarDuration

  • NaT: datetime

  • <missing>: string

  • <undefined>: categorical

  • {''}: cell array of character vectors

If A is a table, then the data type of each variable defines the missing value for that variable.
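The sketch below builds one small vector per data type, with each vector holding that type's standard missing value, and then removes that value. It assumes R2017a or later so that the string `missing` value is available. The variable names and values are hypothetical.

```matlab
% Hypothetical vectors, one per data type, each with one standard missing value
d = datetime(2024,1,[1 2 3]);
d(2) = NaT;                        % datetime: NaT marks the missing entry
s = ["a" missing "c"];             % string: <missing>
c = categorical({'x' '' 'z'});     % categorical: '' becomes <undefined>
cs = {'p' '' 'r'};                 % cell array of character vectors: ''

% rmmissing drops the missing entry from each vector
Rd  = rmmissing(d);                % 1-by-2 datetime (January 1 and 3)
Rs  = rmmissing(s);                % ["a" "c"]
Rc  = rmmissing(c);                % categorical values x and z
Rcs = rmmissing(cs);               % {'p' 'r'}
```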

You can use rmmissing functionality interactively by adding the Clean Missing Data task to a live script.

R = rmmissing(A,dim) specifies the dimension of A to operate along. By default, rmmissing operates along the first dimension whose size does not equal 1.
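This minimal sketch uses an illustrative 2-by-3 matrix to show how the dim argument switches between removing rows and removing columns. The matrix has more than one row, so the default dimension is 1:

```matlab
% Illustrative 2-by-3 matrix with one NaN in row 1, column 2
M = [1 NaN 3;
     4 5   6];

R1 = rmmissing(M,1);   % operate along dim 1: drop row 1, leaving [4 5 6]
R2 = rmmissing(M,2);   % operate along dim 2: drop column 2, leaving [1 3; 4 6]
R0 = rmmissing(M);     % default dim is 1 for this matrix, so R0 equals R1
```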

R = rmmissing(___,Name,Value) specifies additional parameters for removing missing entries using one or more name-value arguments. For example, you can use rmmissing(A,'MinNumMissing',n) to remove rows of A that contain at least n missing values.

[R,TF] = rmmissing(___) also returns a logical vector TF that indicates which rows or columns of A were removed.

Examples

Create a vector with missing entries, and remove each missing entry.

A = [1 3 NaN 6 NaN];
R = rmmissing(A)
R = 1×3

     1     3     6

Remove incomplete rows from a table with multiple data types.

First, create a table whose variables have categorical, double, and cell array of character vectors data types.

A = table(categorical({''; 'F'; 'M'}),[45; 32; NaN],{''; 'CA'; 'MA'},[6051; 7234; NaN],...
    'VariableNames',{'Gender' 'Age' 'State' 'ID'})
A=3×4 table
      Gender       Age      State        ID 
    ___________    ___    __________    ____

    <undefined>     45    {0x0 char}    6051
    F               32    {'CA'    }    7234
    M              NaN    {'MA'    }     NaN

Remove any row of the table that contains missing data.

R = rmmissing(A)
R=1×4 table
    Gender    Age    State      ID 
    ______    ___    ______    ____

      F       32     {'CA'}    7234

Remove only the rows that have missing values in the Age or ID table variables.

R = rmmissing(A,'DataVariables',{'Age','ID'})
R=2×4 table
      Gender       Age      State        ID 
    ___________    ___    __________    ____

    <undefined>    45     {0x0 char}    6051
    F              32     {'CA'    }    7234

Alternatively, use the isnumeric function to identify the numeric variables to operate on.

R = rmmissing(A,'DataVariables',@isnumeric)
R=2×4 table
      Gender       Age      State        ID 
    ___________    ___    __________    ____

    <undefined>    45     {0x0 char}    6051
    F              32     {'CA'    }    7234

Create a matrix with missing data and remove any column (second dimension) containing two or more missing values. Return the new matrix and the logical row vector that indicates which columns of A were removed.

A = [NaN NaN 5 3 NaN 5 7 NaN 9 2;
     8 9 NaN 1 4 5 6 5 NaN 5;
     NaN 4 9 8 7 2 4 1 NaN 3]
A = 3×10

   NaN   NaN     5     3   NaN     5     7   NaN     9     2
     8     9   NaN     1     4     5     6     5   NaN     5
   NaN     4     9     8     7     2     4     1   NaN     3

[R,TF] = rmmissing(A,2,'MinNumMissing',2)
R = 3×8

   NaN     5     3   NaN     5     7   NaN     2
     9   NaN     1     4     5     6     5     5
     4     9     8     7     2     4     1     3

TF = 1×10 logical array

   1   0   0   0   0   0   0   0   1   0

Create a table in which the value -99 indicates a missing entry. Next, create a table of logical variables, loc, that indicates the locations of the missing entries to remove.

A = [1; 4; 9; -99; 3];
B = [9; 0; 6; 2; 1];
C = [-99; 4; 2; 3; 8];
T = table(A,B,C)
T=5×3 table
     A     B     C 
    ___    _    ___

      1    9    -99
      4    0      4
      9    6      2
    -99    2      3
      3    1      8

loc = T==-99
loc=5×3 table
      A        B        C  
    _____    _____    _____

    false    false    true 
    false    false    false
    false    false    false
    true     false    false
    false    false    false

Then, specify the known missing entry locations for rmmissing using the MissingLocations name-value argument. In addition to the data with missing entries removed, return a logical vector indicating which rows were removed.

[R,TF] = rmmissing(T,MissingLocations=loc)
R=3×3 table
    A    B    C
    _    _    _

    4    0    4
    9    6    2
    3    1    8

TF = 5×1 logical array

   1
   0
   0
   1
   0

Input Arguments

A: Input data

Input data, specified as a vector, matrix, cell array of character vectors, table, or timetable.

  • If A is a timetable, then rmmissing(A) removes any row of A that contains missing data, along with the corresponding row time. If a row time is NaT or NaN, then rmmissing(A) removes that row time and the corresponding row of A.

  • If A is a cell array or a table with cell array variables, then rmmissing detects missing elements only when the cell array contains character vectors.
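Here is a minimal sketch of the timetable case. The variables Time and Temp are hypothetical. The row with a NaN temperature is removed together with its row time:

```matlab
% Hypothetical timetable: four readings at 1-4 seconds, one NaN temperature
Time = seconds([1; 2; 3; 4]);
Temp = [20.1; NaN; 21.4; 22.0];
TT = timetable(Time, Temp);   % Time supplies the row times

% Removes the second row and its row time (2 s)
R = rmmissing(TT);
```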

dim: Dimension to operate along

Dimension to operate along, specified as 1 or 2. If you do not specify the dimension, then the default is the first array dimension whose size does not equal 1.

Consider an m-by-n input matrix, A:

  • rmmissing(A,1) removes rows of A that contain missing data.

  • rmmissing(A,2) removes columns of A that contain missing data.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: rmmissing(A,DataVariables=["Temperature" "Altitude"]) removes rows of A that contain missing data in the Temperature or Altitude variables

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: rmmissing(A,"DataVariables",["Temperature" "Altitude"]) removes rows of A that contain missing data in the Temperature or Altitude variables
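The two syntaxes are interchangeable. In this sketch, the table and its values are made up, and the first call assumes R2021a or later. Both calls remove only the row with a missing Temperature. The NaN in Altitude stays in place because that variable is not examined:

```matlab
% Made-up table: Temperature is missing in row 2, Altitude in row 3
T = table([20; NaN; 22], [1; 2; NaN], ...
    'VariableNames', {'Temperature', 'Altitude'});

R1 = rmmissing(T, DataVariables="Temperature");     % Name=Value syntax (R2021a+)
R2 = rmmissing(T, "DataVariables", "Temperature");  % comma-separated syntax
```

The results contain NaN, so comparing them requires isequaln rather than isequal.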

MinNumMissing: Minimum number of missing entries

Minimum number of missing entries required to remove a row or column, specified as a nonnegative scalar, which is 1 by default.

Example: rmmissing(A,'MinNumMissing',6)
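This sketch compares the default threshold with MinNumMissing set to 2 on an illustrative matrix, so you can see how the threshold changes the result. Row 1 has two NaN values and row 2 has one:

```matlab
% Illustrative matrix: row 1 has 2 NaNs, row 2 has 1 NaN, row 3 has none
M = [NaN NaN 3;
     NaN 5   6;
     7   8   9];

R1 = rmmissing(M);                     % default 1: rows 1 and 2 removed
R2 = rmmissing(M, 'MinNumMissing', 2); % only row 1 reaches the threshold
```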

MissingLocations: Known missing entry indicator (since R2024b)

Known missing entry indicator, specified as a logical vector or matrix, or a table or timetable with logical variables. Elements with a value of 1 (true) indicate the locations of missing entries in A. Elements with a value of 0 (false) indicate nonmissing entries.

When you specify MissingLocations, rmmissing does not use the standard missing values for the data type of A. Instead, it treats as missing only the entries that the known missing entry indicator marks as true.

If MissingLocations is a vector or matrix, it must be the same size as A. If MissingLocations is a table or timetable, it must contain logical variables with the same sizes and names as the input table variables to operate on.

Data Types: logical | table | timetable
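The following sketch applies MissingLocations to a numeric vector and assumes R2024b or later for that argument. For comparison, it also shows one approach that works in earlier releases: convert the sentinel value -99 to NaN with standardizeMissing before calling rmmissing. The last call shows that rmmissing keeps a NaN when MissingLocations does not mark it:

```matlab
A = [1 4 -99 3];                                  % -99 marks a missing entry

% R2024b or later: mark the sentinel locations explicitly
R1 = rmmissing(A, MissingLocations=(A == -99));   % [1 4 3]

% Earlier releases: replace -99 with NaN, then remove standard missing values
R2 = rmmissing(standardizeMissing(A, -99));       % [1 4 3]

% With MissingLocations, NaN is not treated as missing unless marked
B = [1 NaN -99];
R3 = rmmissing(B, MissingLocations=(B == -99));   % [1 NaN]
```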

DataVariables: Table variables to operate on

Table variables to operate on, specified as one of the options in this table. The DataVariables value indicates which variables of the input table to examine for missing values.

Other variables in the table not specified by DataVariables pass through to the output without being examined for missing values.

Variable names

  Values to specify:
  • A string scalar or character vector
  • A string array or cell array of character vectors
  • A pattern object

  Examples:
  • "A" or 'A': A variable named A
  • ["A" "B"] or {'A','B'}: Two variables named A and B
  • "Var"+digitsPattern(1): Variables named "Var" followed by a single digit

Variable index

  Values to specify:
  • An index number that refers to the location of a variable in the table
  • A vector of numbers
  • A logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 (false) values.

  Examples:
  • 3: The third variable from the table
  • [2 3]: The second and third variables from the table
  • [false false true]: The third variable

Function handle

  Values to specify:
  • A function handle that takes a table variable as input and returns a logical scalar

  Examples:
  • @isnumeric: All the variables containing numeric values

Variable type

  Values to specify:
  • A vartype subscript that selects variables of a specified type

  Examples:
  • vartype("numeric"): All the variables containing numeric values

Example: rmmissing(T,'DataVariables',["Var1" "Var2" "Var4"])
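The following sketch applies several of these indexing schemes to one made-up table that contains numeric and string variables. The comments note which rows each call removes:

```matlab
% Made-up table: Var1 missing in row 2, Var2 missing in row 3, Var3 missing in row 1
T = table([1; NaN; 3], ["a"; "b"; missing], [NaN; 2; 3], ...
    'VariableNames', {'Var1', 'Var2', 'Var3'});

R1 = rmmissing(T, 'DataVariables', 1);                   % by index: removes row 2
R2 = rmmissing(T, 'DataVariables', [false true]);        % logical, trailing false omitted: removes row 3
R3 = rmmissing(T, 'DataVariables', vartype("numeric"));  % Var1 and Var3: removes rows 1 and 2
```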

Output Arguments

R: Data with missing entries removed

Data with missing entries removed, returned as a vector, matrix, table, or timetable. The size of R depends on the number of removed rows or columns.

TF: Removed entry indicator

Removed entry indicator, returned as a logical vector. The value 1 (true) corresponds to rows or columns of A that were removed. The value 0 (false) corresponds to rows or columns that were kept. The orientation and size of TF depend on A and the dimension of operation.

Data Types: logical
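TF lines up with the rows (or columns) of A, so you can use it to recover both the kept data and the removed data. Here is a minimal sketch with an illustrative matrix, operating along the default first dimension:

```matlab
% Illustrative matrix: rows 1 and 3 contain NaN
A = [1   NaN 3;
     4   5   6;
     NaN 8   9];

[R, TF] = rmmissing(A);   % TF is a 3-by-1 logical column: [true; false; true]
removed = A(TF, :);       % the rows that rmmissing dropped
kept    = A(~TF, :);      % the rows that remain, identical to R
```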

Tips

  • For input data that is a structure array or a cell array of non-character vectors, rmmissing does not remove any entries. To remove missing entries from a structure array, apply rmmissing to each field in the structure by using the structfun function. To remove missing entries in a cell array of non-character vectors, apply rmmissing to each cell in the cell array by using the cellfun function.
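The following minimal sketch shows both workarounds. The field names and values are hypothetical. structfun operates on a scalar structure, so for a structure array you would need to apply this approach to each element:

```matlab
% Hypothetical scalar structure with missing values in each field
S.temp = [20 NaN 22];
S.name = ["a" missing "c"];
SR = structfun(@rmmissing, S, 'UniformOutput', false);  % struct with the same fields

% Hypothetical cell array whose cells hold numeric vectors
C = {[1 NaN 3], [NaN 5]};
CR = cellfun(@rmmissing, C, 'UniformOutput', false);    % {[1 3], 5}
```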

Alternative Functionality

Live Editor Task

You can use rmmissing functionality interactively by adding the Clean Missing Data task to a live script.

Version History

Introduced in R2016b
