# groupsummary

Group summary computations

## Syntax

``G = groupsummary(T,groupvars)``
``G = groupsummary(T,groupvars,method)``
``G = groupsummary(T,groupvars,method,datavars)``
``G = groupsummary(T,groupvars,groupbins)``
``G = groupsummary(T,groupvars,groupbins,method)``
``G = groupsummary(T,groupvars,groupbins,method,datavars)``
``G = groupsummary(___,Name,Value)``
``B = groupsummary(A,groupvars,method)``
``B = groupsummary(A,groupvars,groupbins,method)``
``B = groupsummary(___,Name,Value)``
``[B,BG] = groupsummary(A,___)``
``[B,BG,BC] = groupsummary(A,___)``

## Description

example

````G = groupsummary(T,groupvars)` returns a table containing the computed groups and the number of elements in each group for data in a table or timetable `T`. A group contains the unique combinations of grouping variables in `groupvars`. For example, ```G = groupsummary(T,'Gender')``` returns the number of `Male` elements and the number of `Female` elements in the variable `Gender`.```

example

````G = groupsummary(T,groupvars,method)` also returns the computations specified in `method`. For example, `G = groupsummary(T,'Gender','median')` returns the median of all nongrouping variables in `T` for both genders, in addition to the number of elements in each group.```

example

````G = groupsummary(T,groupvars,method,datavars)` specifies the variables to apply the computations to.```
````G = groupsummary(T,groupvars,groupbins)` specifies how to bin the data in `groupvars`. For example, `G = groupsummary(T,'SaleDate','year')` gives the group counts for all sales in `T` within each year according to the grouping variable `SaleDate`.```
````G = groupsummary(T,groupvars,groupbins,method)` bins the data according to `groupbins` for the computations specified in `method`.```

example

````G = groupsummary(T,groupvars,groupbins,method,datavars)` bins the data according to `groupbins` and specifies the variables to apply the computations to.```

example

````G = groupsummary(___,Name,Value)` specifies additional grouping properties using one or more name-value pairs for any of the previous syntaxes. For example, ```G = groupsummary(T,'Category1','IncludeMissingGroups',false)``` excludes the group made from missing `categorical` data indicated by `<undefined>`.```
````B = groupsummary(A,groupvars,method)` returns the computations according to the unique combinations of grouping vectors in `groupvars` when `A` is a vector or matrix. `groupvars` can be a column vector, matrix, or cell array of column vectors.```

example

````B = groupsummary(A,groupvars,groupbins,method)` bins the data according to `groupbins`.```

example

````B = groupsummary(___,Name,Value)` specifies additional grouping properties using one or more name-value pairs for either of the previous array syntaxes.```
````[B,BG] = groupsummary(A,___)` also returns the groups for each grouping vector.```

example

````[B,BG,BC] = groupsummary(A,___)` also returns the group counts for each group.```

## Examples

collapse all

Compute summary statistics on table variables.

Create a table `T` that contains information about five individuals.

```Gender = ["male";"female";"female";"male";"male"]; Age = [38;43;38;40;49]; Height = [71;69;64;67;64]; Weight = [176;163;131;133;119]; T = table(Gender,Age,Height,Weight)```
```T=5×4 table Gender Age Height Weight ________ ___ ______ ______ "male" 38 71 176 "female" 43 69 163 "female" 38 64 131 "male" 40 67 133 "male" 49 64 119 ```

Compute the counts of males and females by specifying `Gender` as the grouping variable.

`G = groupsummary(T,"Gender")`
```G=2×2 table Gender GroupCount ________ __________ "female" 2 "male" 3 ```

Compute the mean age, height, and weight of females and males separately.

`G = groupsummary(T,"Gender","mean")`
```G=2×5 table Gender GroupCount mean_Age mean_Height mean_Weight ________ __________ ________ ___________ ___________ "female" 2 40.5 66.5 147 "male" 3 42.333 67.333 142.67 ```

Still grouping by gender, compute the median height only.

`G = groupsummary(T,"Gender","median","Height")`
```G=2×3 table Gender GroupCount median_Height ________ __________ _____________ "female" 2 66.5 "male" 3 67 ```

Group table data using two grouping variables.

Create a table `T` that contains information about five individuals.

```Gender = ["male";"female";"male";"female";"male"]; Smoker = logical([1;0;1;0;1]); Weight = [176;163;131;133;119]; T = table(Gender,Smoker,Weight)```
```T=5×3 table Gender Smoker Weight ________ ______ ______ "male" true 176 "female" false 163 "male" true 131 "female" false 133 "male" true 119 ```

Compute the mean weight, grouped by gender and smoking status. By default, two combinations of gender and smoking status are not represented in the output because they are empty groups.

`G = groupsummary(T,{'Gender','Smoker'},'mean','Weight')`
```G=2×4 table Gender Smoker GroupCount mean_Weight ________ ______ __________ ___________ "female" false 2 148 "male" true 3 142 ```

Set the `'IncludeEmptyGroups'` parameter value to `true` in order to see all group combinations, including the empty ones.

`G = groupsummary(T,{'Gender','Smoker'},'mean','Weight','IncludeEmptyGroups',true)`
```G=4×4 table Gender Smoker GroupCount mean_Weight ________ ______ __________ ___________ "female" false 2 148 "female" true 0 NaN "male" false 0 NaN "male" true 3 142 ```

Group data according to specified bins.

Create a timetable containing sales information for days within a single month.

```TimeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10;... 2017 3 14; 2017 3 31; 2017 3 25;... 2017 3 29; 2017 3 21; 2017 3 18]); Profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]'; TotalItemsSold = [14 13 8 5 10 16 8 6 7 11]'; TT = timetable(TimeStamps,Profit,TotalItemsSold)```
```TT=10×2 timetable TimeStamps Profit TotalItemsSold ___________ ______ ______________ 04-Mar-2017 2032 14 02-Mar-2017 3071 13 15-Mar-2017 1185 8 10-Mar-2017 2587 5 14-Mar-2017 1998 10 31-Mar-2017 2899 16 25-Mar-2017 3112 8 29-Mar-2017 909 6 21-Mar-2017 2619 7 18-Mar-2017 3085 11 ```

Compute the mean profit grouped by the total items sold, binning the groups into intervals of item numbers.

```format shorte G = groupsummary(TT,'TotalItemsSold',[0 4 8 12 16],'mean','Profit')```
```G=3×3 table disc_TotalItemsSold GroupCount mean_Profit ___________________ __________ ___________ [4, 8) 3.0000e+00 2.0383e+03 [8, 12) 4.0000e+00 2.3450e+03 [12, 16] 3.0000e+00 2.6673e+03 ```

Compute the mean profit grouped by day of the week.

`G = groupsummary(TT,'TimeStamps','dayname','mean','Profit')`
```G=5×3 table dayname_TimeStamps GroupCount mean_Profit __________________ __________ ___________ Tuesday 2.0000e+00 2.3085e+03 Wednesday 2.0000e+00 1.0470e+03 Thursday 1.0000e+00 3.0710e+03 Friday 2.0000e+00 2.7430e+03 Saturday 3.0000e+00 2.7430e+03 ```

Create a vector of dates and a vector of corresponding profit values.

```timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';```

Compute the mean profit by day of the week. Display the means, the group names, and the number of members in each group.

```format shorte [meanDailyProfit,dayOfWeek,dailyCounts] = groupsummary(profit,timeStamps,'dayname','mean')```
```meanDailyProfit = 5×1 2.3085 1.0470 3.0710 2.7430 2.7430 ```
```dayOfWeek = 5x1 categorical Tuesday Wednesday Thursday Friday Saturday ```
```dailyCounts = 5×1 2 2 1 2 3 ```

Compute the mean weights for four groups based on their gender and smoker status.

Store patient information as three vectors of different types.

```Gender = ["male";"female";"male";"female";"male"]; Smoker = logical([1;0;1;0;1]); Weight = [176;163;131;133;119];```

Grouping by gender and smoker status, compute the mean weights. `B` contains the mean for each group (`NaN` for empty groups). `BG` is a cell array containing two vectors that describe the groups as you look at their elements rowwise. For instance, the first row of `BG{1}` says that the patients in the first group are female, and the first row of `BG{2}` says that they are nonsmokers. Finally, `BC` contains the number of members in each group for the corresponding groups in `BG`.

```[B,BG,BC] = groupsummary(Weight,{Gender,Smoker},'mean','IncludeEmptyGroups',true); B```
```B = 4×1 148 NaN NaN 142 ```
`BG{1}`
```ans = 4x1 string "female" "female" "male" "male" ```
`BG{2}`
```ans = 4x1 logical array 0 1 0 1 ```
`BC`
```BC = 4×1 2 0 0 3 ```

Load data containing patient information and create a table describing each patient's gender, systolic and diastolic blood pressure, height, and weight.

```load patients T = table(Gender,Systolic,Diastolic,Height,Weight)```
```T=100×5 table Gender Systolic Diastolic Height Weight __________ ________ _________ ______ ______ {'Male' } 124 93 71 176 {'Male' } 109 77 69 163 {'Female'} 125 83 64 131 {'Female'} 117 75 67 133 {'Female'} 122 80 64 119 {'Female'} 121 70 68 142 {'Female'} 130 88 64 142 {'Male' } 115 82 68 180 {'Male' } 115 78 68 183 {'Female'} 118 86 66 132 {'Female'} 114 77 68 128 {'Female'} 115 68 66 137 {'Male' } 127 74 71 174 {'Male' } 130 95 72 202 {'Female'} 114 79 65 129 {'Male' } 130 92 71 181 ⋮ ```

Grouping by gender, compute the correlation between patient height and weight and the correlation between systolic and diastolic blood pressure. Use the `xcov` function as the method to compute the correlation. The first two input arguments to `xcov` describe the data to correlate, the third argument describes the lag size, and the fourth argument describes the type of normalization. For each group computation, the `x` and `y` arguments passed into `xcov` are specified pairwise by variable from the two cell elements `["Height","Systolic"]` and `["Weight","Diastolic"]`.

`G = groupsummary(T,"Gender",@(x,y)xcov(x,y,0,'coeff'),{["Height","Systolic"],["Weight","Diastolic"]})`
```G=2×4 table Gender GroupCount fun1_Height_Weight fun1_Systolic_Diastolic __________ __________ __________________ _______________________ {'Female'} 53 0.071278 0.48731 {'Male' } 47 0.047571 0.50254 ```

Alternatively, if your data is in vector or matrix form instead of in a table, you can provide the data to correlate as the first input argument of `groupsummary`.

`[G,GR,GC] = groupsummary({[Height,Systolic],[Weight,Diastolic]},Gender,@(x,y)xcov(x,y,0,'coeff'))`
```G = 2×2 0.0713 0.4873 0.0476 0.5025 ```
```GR = 2x1 cell {'Female'} {'Male' } ```
```GC = 2×1 53 47 ```

## Input Arguments

collapse all

Input data, specified as a table or timetable.

Input array, specified as a vector, matrix, or cell array of vectors or matrices.

When you specify a function handle for `method` that takes more than one input argument, the input data `A` must be a cell array of vectors or matrices. In each call to the function by group, the input arguments are the corresponding columns of each element in the cell array. For example:

• ```groupsummary({x1, y1},groupvars,@(x,y) myFun(x,y))``` calculates `myFun(x1,y1)` for each group.

• ```groupsummary({[x1 x2], [y1 y2]},groupvars,@(x,y) myFun(x,y))``` first calculates `myFun(x1,y1)` for each group, and then calculates `myFun(x2,y2)` for each group.

Grouping variables or vectors, specified as one of these options:

• For array input, `groupvars` can be either a column vector with the same number of rows as `A` or a group of column vectors arranged in a matrix or cell array.

• For table or timetable inputs, `groupvars` indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.

OptionDescriptionExamples
Scalar variable name

A character vector or scalar string specifying a single table variable name.

`'Var1'`

`"Var1"`

Vector of variable names

A cell array of character vectors or string array where each element is a table variable name.

```{'Var1' 'Var2'}```

```["Var1" "Var2"]```

Scalar or vector of variable indices

A scalar or vector of table variable indices.

`1`

```[1 3 5]```

Logical scalar or vector

A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it.

```[true false true]```

Function handle

A function handle that takes a table variable as input and returns a logical scalar.

`@isnumeric`

`vartype` subscript

A table subscript generated by the `vartype` function.

`vartype('numeric')`

Example: `groupsummary(T,"Var3")`

Computation method, specified as one of the following:

• `'sum'` — sum

• `'mean'` — mean

• `'median'` — median

• `'mode'` — mode

• `'var'` — variance

• `'std'` — standard deviation

• `'min'` — minimum

• `'max'` — maximum

• `'range'` — maximum minus minimum

• `'nummissing'` — number of missing elements

• `'nnz'` — number of nonzero and non-`NaN` elements

• `'all'` — all computations previously listed

You also can specify `method` as a function handle that returns one entity per group whose first dimension has length 1. For table input data, the function operates on each table variable separately.

When the input data is a table `T` and you specify a function handle for `method` that takes more than one input argument, you must specify `datavars`. The `datavars` argument must be a cell array whose elements indicate the table variables to use for each input into the method. In each call to the function by group, the input arguments are the corresponding table variables of the cell array elements. For example:

• ```groupsummary(T,groupvars,@(x,y) myFun(x,y),{"x1","y1"})``` calculates `myFun(T.x1,T.y1)` for each group.

• ```groupsummary(T,groupvars,@(x,y) myFun(x,y),{["x1" "x2"],["y1" "y2"]})``` first calculates `myfun(T.x1,T.y1)` for each group, and then calculates `myfun(T.x2,T.y2)` for each group.

When the input data is in vector or matrix form and you specify a function handle for `method` that takes more than one input argument, the input data `A` must be a cell array of vectors or matrices. In each call to the function, the input arguments are the corresponding columns of each element in the cell array. For example:

• ```groupsummary({x1,y1},groupvars,@(x,y) myFun(x,y))``` calculates `myFun(x1,y1)` for each group.

• ```groupsummary({[x1 x2],[y1 y2]},groupvars,@(x,y) myFun(x,y))``` first calculates `myFun(x1,y1)` for each group, and then calculates `myFun(x2,y2)` for each group.

To specify multiple computations at a time, list the options in a cell array, such as `{'mean','median'}` or `{myFun1,myFun2}`.

`NaN` values in the input data are automatically omitted when using the method names described here, with the exception of `'nummissing'`. To include `NaN` values, consider using a function handle for the method, such as `@sum` instead of `'sum'`.

Data Types: `char` | `string` | `cell` | `function_handle`

Table variables to operate on, specified as one of the options in this table. `datavars` indicates which variables of the input table or timetable to apply the methods to. Other variables not specified by `datavars` are not operated on and do not pass through to the output. When `datavars` is not specified, `groupsummary` operates on each nongrouping variable.

OptionDescriptionExamples
Variable name

A character vector or scalar string specifying a single table variable name

`'Var1'`

`"Var1"`

Vector of variable names

A cell array of character vectors or string array where each element is a table variable name

`{'Var1' 'Var2'}`

`["Var1" "Var2"]`

Scalar or vector of variable indices

A scalar or vector of table variable indices

`1`

`[1 3 5]`

Logical vector

A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it

`[true false true]`

Function handle

A function handle that takes a table variable as input and returns a logical scalar

`@isnumeric`

`vartype` subscript

A table subscript generated by the `vartype` function

`vartype('numeric')`

When the input data is a table `T` and you specify a function handle for `method` that takes more than one input argument, you must specify `datavars`. The `datavars` argument must be a cell array whose elements are any of the options in the table. The cell array elements indicate the table variables to use for each input into the method. In each call to the function by group, the input arguments are the corresponding table variables of the cell array elements. For example:

• ```groupsummary(T,groupvars,@(x,y) myFun(x,y),{"x1", "y1"})``` calculates `myFun(T.x1,T.y1)` for each group.

• ```groupsummary(T,groupvars,@(x,y) myFun(x,y),{["x1" "x2"],["y1" "y2"]})``` first calculates `myfun(T.x1,T.y1)` for each group, and then calculates `myfun(T.x2,T.y2)` for each group.

Example: ```groupsummary(T,groupvars,method,["Var1" "Var2" "Var4"])```

Binning scheme, specified as one of the following options:

• `'none'`, indicating no binning

• A list of bin edges, specified as a numeric vector, or a `datetime` vector for `datetime` grouping variables or vectors

• A number of bins, specified as an integer scalar

• A time duration, specified as a scalar of type `duration` or `calendarDuration` indicating bin widths (for `datetime` or `duration` grouping variables or vectors only)

• A time bin for `datetime` and `duration` grouping variables or vectors only, specified as one of the following character vectors:

ValueDescriptionData Type
`'second'`

Each bin is 1 second.

`datetime` and `duration`
`'minute'`

Each bin is 1 minute.

`datetime` and `duration`
`'hour'`

Each bin is 1 hour.

`datetime` and `duration`
`'day'`

Each bin is 1 calendar day. This value accounts for Daylight Saving Time shifts.

`datetime` and `duration`
`'week'`Each bin is 1 calendar week.`datetime` only
`'month'`Each bin is 1 calendar month.`datetime` only
`'quarter'`Each bin is 1 calendar quarter.`datetime` only
`'year'`

Each bin is 1 calendar year. This value accounts for leap days.

`datetime` and `duration`
`'decade'`Each bin is 1 decade (10 calendar years).`datetime` only
`'century'`Each bin is 1 century (100 calendar years).`datetime` only
`'secondofminute'`

Bins are seconds from 0 to 59.

`datetime` only
`'minuteofhour'`

Bins are minutes from 0 to 59.

`datetime` only
`'hourofday'`

Bins are hours from 0 to 23.

`datetime` only
`'dayofweek'`

Bins are days from 1 to 7. The first day of the week is Sunday.

`datetime` only
`'dayname'`Bins are full day names such as `'Sunday'`.`datetime` only
`'dayofmonth'`Bins are days from 1 to 31.`datetime` only
`'dayofyear'`Bins are days from 1 to 366.`datetime` only
`'weekofmonth'`Bins are weeks from 1 to 6.`datetime` only
`'weekofyear'`Bins are weeks from 1 to 54.`datetime` only
`'monthname'`Bins are full month names such as `'January'`.`datetime` only
`'monthofyear'`

Bins are months from 1 to 12.

`datetime` only
`'quarterofyear'`Bins are quarters from 1 to 4.`datetime` only

• A cell array listing binning rules for each grouping variable or vector

When multiple grouping variables are specified, you can provide a single binning rule that is applied to all grouping variables, or a cell array containing a binning method for each grouping variable such as `{'none',[0 2 4 Inf]}`.

### Name-Value Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: ```G = groupsummary(T,groupvars,groupbins,'IncludedEdge','right')```

Included bin edge, specified as either `'left'` or `'right'`, indicating which end of the bin interval is inclusive.

This name-value pair can only be specified when `groupbins` is specified, and the value is applied to all binning schemes for all grouping variables or vectors.

Missing groups indicator, specified as `true` or `false`. When the parameter value is `true`, `groupsummary` displays groups made up of missing values, such as `NaN`. When the parameter value is `false`, `groupsummary` does not display the missing groups.

Data Types: `logical`

Empty groups indicator, specified as `true` or `false`. When the parameter value is `false`, `groupsummary` does not display groups with zero elements. When the parameter value is `true`, `groupsummary` displays the empty groups.

Data Types: `logical`

## Output Arguments

collapse all

Output table, returned as a table containing the specified computations for each group.

Output array, returned as a vector or matrix containing the group computations. When you specify multiple methods, `groupsummary` horizontally concatenates the computations in the order that they were listed.

Groups for array input data, returned as a column vector or cell array of column vectors each corresponding to a grouping vector.

When you provide more than one grouping vector, `BG` is a cell array containing column vectors of equal length. The group information can be found by looking at the elements rowwise across all vectors in `BG`. Each group maps to the corresponding row of the output array `B`.

Group counts for array input data, returned as a column vector containing the number of elements in each group. The length of `BC` is the same as the length of the group column vectors returned in `BG`.

## Tips

• When making many calls to `groupsummary`, consider converting grouping variables to type `categorical` or `logical` when possible for improved performance. For example, if you have a grouping variable of type `char` (such as `Gender` with elements `'Male'` and `'Female'`), you can convert it to a categorical value using the command `categorical(Gender)`.