*Grouping variables* are utility variables
used to group, or categorize, observations. Grouping variables are
useful for summarizing or visualizing data by group. A grouping variable
can be any of these data types:

- Numeric vector
- Logical vector
- Character array
- String array
- Cell array of character vectors
- Categorical vector

A grouping variable must have the same number of observations (rows) as the table, dataset array, or numeric array you are grouping. Observations that have the same grouping variable value belong to the same group.

For example, the following grouping variables all define the same groups. Each grouping variable divides five observations into two groups. The first group contains the first and fourth observations. The other three observations are in the second group.

Data Type | Grouping Variable |
---|---|
Numeric vector | `[1 2 2 1 2]` |
Logical vector | `[0 1 1 0 1]` |
String array | `["Male","Female","Female","Male","Female"]` |
Cell array of character vectors | `{'Male','Female','Female','Male','Female'}` |
Categorical vector | `Male Female Female Male Female` |

Use grouping variables with labels to give each group a meaningful name. A categorical vector is an efficient and flexible choice of grouping variable.
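As a minimal sketch of the example table above, the following code passes each grouping variable to `grp2idx` and checks which observations share a group with the first observation. The variable names (`gNumeric`, `sameAsFirst`, and so on) are illustrative only. Group membership is compared rather than the raw index values, because the numbering of the groups can differ between data types.

```matlab
% Five observations described by grouping variables of different types.
gNumeric = [1 2 2 1 2];
gLogical = logical([0 1 1 0 1]);
gString  = ["Male","Female","Female","Male","Female"];
gCell    = {'Male','Female','Female','Male','Female'};
gCat     = categorical(gCell);

candidates  = {gNumeric, gLogical, gString, gCell, gCat};
sameAsFirst = false(5, numel(candidates));

for k = 1:numel(candidates)
    % grp2idx converts a grouping variable into integer group indices.
    idx = grp2idx(candidates{k});
    idx = idx(:);
    % Flag the observations that fall in the same group as observation 1.
    sameAsFirst(:, k) = (idx == idx(1));
end

% Each column corresponds to one data type; identical columns mean
% the grouping variables partition the observations the same way.
disp(sameAsFirst)
```

If the columns of `sameAsFirst` are identical, the five grouping variables split the observations into the same two groups.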

Typically, there are as many groups as unique values in the grouping variable. However,
categorical vectors can have levels that are not represented in the data. The groups
and the order of the groups depend on the data type of the grouping variable.
Suppose `G` is a grouping variable.

- If `G` is a numeric or logical vector, then the groups correspond to the distinct values in `G`, in the sorted order of the unique values.
- If `G` is a character array, string array, or cell array of character vectors, then the groups correspond to the distinct elements in `G`, in the order of their first appearance.
- If `G` is a categorical vector, then the groups correspond to the unique category levels in `G`, in the order returned by `categories`.
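The ordering rules above can be sketched with `grp2idx`, whose second output lists the group names in group order, and with `categories` and `countcats` for the categorical case. The data values and variable names here are made up for illustration.

```matlab
% Numeric vector: groups follow the sorted unique values.
[~, numNames] = grp2idx([3 1 3 2]);

% String array: groups follow the order of first appearance.
[~, strNames] = grp2idx(["b","a","b","c"]);

% Categorical vector: groups follow the category order, which can
% include a level that never occurs in the data ("mid" here).
C = categorical(["lo";"hi";"lo"], ["lo","mid","hi"]);
catNames  = categories(C);
catCounts = countcats(C);

disp(numNames')
disp(strNames')
disp(catNames')
disp(catCounts')
```

The categorical case shows why the number of groups can exceed the number of unique values in the data: a level with a count of zero is still one of the categories.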

Some functions, such as `grpstats`, accept multiple grouping variables specified as a cell array of grouping variables, for example, `{G1,G2,G3}`. In this case, the groups are defined by the unique combinations of values in the grouping variables. The order is decided first by the order of the first grouping variable, then by the order of the second grouping variable, and so on.
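A small sketch of grouping by two variables at once, using made-up data: `g1` is numeric and `g2` is a cell array of character vectors, so the four combinations of their values define four groups. The `'gname'` option asks `grpstats` to return the group names alongside the means.

```matlab
% Six observations and two grouping variables (illustrative values).
x  = [10; 20; 30; 40; 12; 44];
g1 = [1; 1; 2; 2; 1; 2];
g2 = {'a'; 'b'; 'a'; 'b'; 'a'; 'b'};

% Group by the unique combinations of values in g1 and g2.
[groupMeans, groupNames] = grpstats(x, {g1, g2}, {'mean', 'gname'});

% groupNames has one column per grouping variable and one row per group.
disp([groupNames, num2cell(groupMeans)])
```

Each row of the displayed cell array pairs one combination of `g1` and `g2` values with the mean of `x` for that combination.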

This table lists common tasks you might want to perform using grouping variables.

Grouping Task | Function Accepting Grouping Variable |
---|---|
Draw side-by-side boxplots for data in different groups. | `boxplot` |
Draw a scatter plot with markers colored by group. | `gscatter` |
Draw a scatter plot matrix with markers colored by group. | `gplotmatrix` |
Compute summary statistics by group. | `grpstats` |
Test for differences between group means. | `anovan` |
Create an index vector from a grouping variable. | `grp2idx` |
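As a rough illustration of two of the plotting tasks in the table, the following sketch draws grouped box plots and a grouped scatter plot from synthetic data. The data, the group labels `"A"` and `"B"`, and the variable names are assumptions made for this example.

```matlab
rng(0);  % make the synthetic data reproducible

% Twenty observations in two groups of ten.
g = categorical(repelem(["A"; "B"], 10));
x = [randn(10,1); randn(10,1) + 2];
y = x + randn(20,1);

% Side-by-side box plots of x, one box per group.
figure;
boxplot(x, g);

% Scatter plot of y against x, with markers colored by group.
figure;
h = gscatter(x, y, g);
```

Here `h` holds the graphics handles that `gscatter` returns, which can be used to adjust marker properties for individual groups afterward.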

Grouping variables can have missing values, provided that you use a valid missing value indicator for the data type.

Grouping Variable Data Type | Missing Value Indicator |
---|---|
Numeric vector | `NaN` |
Logical vector | (Cannot be missing) |
Character array | Row of spaces |
String array | `<missing>` or `""` |
Cell array of character vectors | `''` |
Categorical vector | `<undefined>` |
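The following sketch shows how several of these indicators behave with `grp2idx`, which returns `NaN` as the group index for an observation whose grouping value is missing. The example values are illustrative, and the behavior shown assumes a recent release that supports string arrays.

```matlab
% Numeric vector: NaN marks a missing group.
idxNum = grp2idx([1 NaN 2 1]);

% Cell array of character vectors: '' marks a missing group.
idxCell = grp2idx({'a', '', 'b', 'a'});

% String array: <missing> and "" both mark a missing group.
idxStr = grp2idx(["a", missing, "b", ""]);

% Categorical vector: '' becomes <undefined> on conversion.
idxCat = grp2idx(categorical({'a', '', 'b', 'a'}));

% One column per data type; NaN rows are observations with no group.
disp([idxNum(:), idxCell(:), idxStr(:), idxCat(:)])
```

Observations with a `NaN` index belong to no group, so functions that summarize or plot by group would typically leave them out.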