# crosstab

Cross-tabulation

## Syntax

`tbl = crosstab(x1,x2)`
`tbl = crosstab(x1,...,xn)`
`[tbl,chi2,p] = crosstab(___)`
`[tbl,chi2,p,labels] = crosstab(___)`

## Description

`tbl = crosstab(x1,x2)` returns a cross-tabulation, `tbl`, of two vectors of the same length, `x1` and `x2`.

`tbl = crosstab(x1,...,xn)` returns a multi-dimensional cross-tabulation, `tbl`, of data for multiple input vectors, `x1`, `x2`, ..., `xn`.

`[tbl,chi2,p] = crosstab(___)` also returns the chi-square statistic, `chi2`, and its p-value, `p`, for a test that `tbl` is independent in each dimension. You can use any of the previous syntaxes.

`[tbl,chi2,p,labels] = crosstab(___)` also returns a cell array, `labels`, which contains one column of labels for each input argument, `x1` ... `xn`.

## Examples

Create two sample data vectors, containing three and four distinct values, respectively.

```
x = [1 1 2 3 1];
y = [1 2 5 3 1];
```

Cross-tabulate `x` and `y`.

`table = crosstab(x,y)`

```
table = 3×4

     2     1     0     0
     0     0     0     1
     0     0     1     0
```

The rows in `table` correspond to the three distinct values in `x`, and the columns correspond to the four distinct values in `y`.

Generate two independent vectors, `x1` and `x2`, each containing 50 discrete uniform random numbers in the range `1:3`.

```
rng default; % for reproducibility
x1 = unidrnd(3,50,1);
x2 = unidrnd(3,50,1);
```

Cross-tabulate `x1` and `x2`.

`[table,chi2,p] = crosstab(x1,x2)`

```
table = 3×3

     1     6     7
     5     5     2
    11     7     6

chi2 = 7.5449

p = 0.1097
```

The returned `p` value of `0.1097` indicates that, at the 5% significance level, `crosstab` fails to reject the null hypothesis that `table` is independent in each dimension.
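As a cross-check, the statistic and p-value above can be reproduced by hand from the counts in `table`. The following is a minimal sketch, assuming the standard Pearson chi-square formula with no continuity correction and (3-1)(3-1) = 4 degrees of freedom. The variable names `counts`, `expected`, `chi2Manual`, and `pManual` are illustrative and are not part of `crosstab`.

```matlab
% Counts copied from the crosstab output above
counts = [1 6 7; 5 5 2; 11 7 6];

n = sum(counts(:));          % total number of observations
rowTotals = sum(counts,2);   % 3-by-1 marginal counts for x1
colTotals = sum(counts,1);   % 1-by-3 marginal counts for x2

% Expected counts under independence: outer product of the marginals over n
expected = rowTotals*colTotals/n;

% Pearson chi-square statistic (assumed form, no continuity correction)
chi2Manual = sum(sum((counts - expected).^2 ./ expected));

% Upper-tail probability of the chi-square distribution
df = (size(counts,1) - 1)*(size(counts,2) - 1);
pManual = chi2cdf(chi2Manual,df,'upper');
```

Rounded to four decimal places, `chi2Manual` and `pManual` agree with the values `7.5449` and `0.1097` returned by `crosstab` in this example.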

Load the sample data, which contains measurements of large model cars during the years 1970-1982.

`load carbig`

Cross-tabulate the data of four-cylinder cars (`cyl4`) based on model year (`when`) and country of origin (`org`).

`[table,chi2,p,labels] = crosstab(cyl4,when,org);`

Use `labels` to determine the index location in `table` for the number of four-cylinder cars made in the USA during the late period of the data.

`labels`

```
labels=3×3 cell array
    {'Other'   }    {'Early'}    {'USA'   }
    {'Four'    }    {'Mid'  }    {'Europe'}
    {0x0 double}    {'Late' }    {'Japan' }
```

The first column of `labels` corresponds to the data in `cyl4`, and indicates that row `2` of `table` contains data on cars with four cylinders. The second column of `labels` corresponds to the data in `when`, and indicates that column `3` of `table` contains data on cars made during the late period. The third column of `labels` corresponds to the data in `org`, and indicates that location `1` of the third dimension of `table` contains data on cars made in the USA.

Therefore, `table(2,3,1)` contains the number of four-cylinder cars made in the USA during the late period.

`table(2,3,1)`
```ans = 38 ```

The data contains 38 four-cylinder cars made in the USA during the late period.

Create a contingency table from data, and visualize the table in a heatmap chart.

`load hospital`

The `hospital` dataset array contains data on 100 hospital patients, including last name, gender, age, weight, smoking status, and systolic and diastolic blood pressure measurements.

Convert the dataset array to a MATLAB® table.

`Tbl = dataset2table(hospital);`

Determine whether smoking status is independent of gender by creating a 2-by-2 contingency table of smokers and nonsmokers, grouped by gender.

`[conttbl,chi2,p,labels] = crosstab(Tbl.Sex,Tbl.Smoker)`

```
conttbl = 2×2

    40    13
    26    21

chi2 = 4.5083

p = 0.0337

labels = 2x2 cell
    {'Female'}    {'0'}
    {'Male'  }    {'1'}
```

The rows of the resulting contingency table `conttbl` correspond to patient gender, with row 1 containing data for females and row 2 containing data for males. The columns correspond to patient smoking status, with column 1 containing data for nonsmokers and column 2 containing data for smokers. The returned result `chi2 = 4.5083` is the value of the chi-squared test statistic for a Pearson's chi-squared test of independence. The $\mathit{p}$-value for the test `p = 0.0337` suggests, at a 5% level of significance, rejection of the null hypothesis that gender and smoking status are independent.

Visualize the contingency table in a heatmap. Plot smoking status on the $\mathit{x}$-axis and gender on the $\mathit{y}$-axis.

`heatmap(Tbl,'Smoker','Sex')`

## Input Arguments

`x1`: Input vector, specified as a vector of grouping variables. All input vectors, including `x1`, `x2`, ..., `xn`, must be the same length.

Data Types: `single` | `double` | `char` | `string` | `logical` | `categorical`

`x2`: Input vector, specified as a vector of grouping variables. All input vectors, including `x1`, `x2`, ..., `xn`, must be the same length.

Data Types: `single` | `double` | `char` | `string` | `logical` | `categorical`

`x1,...,xn`: Input vectors, specified as vectors of grouping variables. If you use this syntax to specify more than two input vectors, then `crosstab` generates a multi-dimensional cross-tabulation table. All input vectors, including `x1`, `x2`, ..., `xn`, must be the same length.

Data Types: `single` | `double` | `char` | `string` | `logical` | `categorical`

## Output Arguments

`tbl`: Cross-tabulation table, returned as a matrix or multi-dimensional array of integer values.

If you specify two input vectors, `x1` and `x2`, then `tbl` is an m-by-n matrix, where m is the number of distinct values in `x1` and n is the number of distinct values in `x2`.

If you specify three or more input vectors, then `tbl(i,j,k,...)` is a count of indices where `grp2idx(x1)` is `i`, `grp2idx(x2)` is `j`, `grp2idx(x3)` is `k`, and so on.

`chi2`: Chi-square statistic, returned as a nonnegative scalar value. The null hypothesis is that the proportion in any entry of `tbl` is the product of the proportions in each dimension.
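For the two-way case, this null hypothesis corresponds to the usual Pearson chi-square statistic. The formula below is a sketch under that assumption. The symbols $O_{ij}$ (the observed count `tbl(i,j)`), $E_{ij}$ (the expected count under independence), and $N$ (the total count) are notation introduced here, not names used by `crosstab`.

$$
E_{ij} = \frac{\left(\sum_{j'=1}^{n} O_{ij'}\right)\left(\sum_{i'=1}^{m} O_{i'j}\right)}{N},
\qquad
\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}}
$$

Under the null hypothesis, $\chi^2$ for an m-by-n table is approximately chi-square distributed with $(m-1)(n-1)$ degrees of freedom. For example, the 2-by-2 table `[40 13; 26 21]` has expected counts 34.98, 18.02, 31.02, and 15.98, which gives $\chi^2 \approx 4.5083$.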

`p`: p-value for the chi-square test statistic, returned as a scalar value in the range `[0,1]`. `crosstab` tests that `tbl` is independent in each dimension.

`labels`: Data labels, returned as a cell array. The entries in the first column are labels for the rows of `tbl`, the entries in the second column are labels for the columns, and so on, for a multi-dimensional `tbl`.

## Algorithms

• `crosstab` uses `grp2idx` to assign a positive integer to each distinct value. `tbl(i,j)` is a count of indices where `grp2idx(x1)` is `i` and `grp2idx(x2)` is `j`. The numerical order of `grp2idx(x1)` and `grp2idx(x2)` determines the order of the rows and columns of `tbl`, respectively.

For three or more input vectors, the returned value of `tbl(i,j,k,...)` is a count of indices where `grp2idx(x1)` is `i`, `grp2idx(x2)` is `j`, `grp2idx(x3)` is `k`, and so on.
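The counting step described above can be illustrated with `grp2idx` and `accumarray`. This is a minimal sketch of the idea for two vectors, not the internal implementation of `crosstab`. It assumes the inputs contain no missing values, and the names `g1`, `g2`, `names1`, `names2`, and `counts` are illustrative.

```matlab
% Data from the first example on this page
x = [1 1 2 3 1];
y = [1 2 5 3 1];

% Map each distinct value to a positive integer group index
[g1,names1] = grp2idx(x);
[g2,names2] = grp2idx(y);

% Count the observations that fall in each (g1,g2) pair
counts = accumarray([g1 g2],1,[numel(names1) numel(names2)]);
```

Here `counts` is a 3-by-4 matrix equal to the output of `crosstab(x,y)` shown in that example, and `names1` and `names2` hold the group names as text.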

• `crosstab` computes the p-value of the chi-square test statistic using a formula that is asymptotically valid for a large sample size. The approximation is less accurate for small samples or samples with uneven marginal distributions. If your sample includes only two variables and each has two levels, you can use `fishertest` instead. This function performs Fisher’s exact test, which does not depend on large-sample distribution assumptions.
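For a 2-by-2 table, the exact test mentioned above can be run directly on the counts. The sketch below reuses the contingency table from the hospital example and calls `fishertest` with its default settings. The output names `h` and `pExact` are illustrative.

```matlab
% 2-by-2 contingency table from the hospital example
% (rows: Female, Male; columns: nonsmoker, smoker)
conttbl = [40 13; 26 21];

% h is the test decision at the default significance level;
% pExact is the p-value of Fisher's exact test
[h,pExact] = fishertest(conttbl);
```

Comparing `pExact` with the `p` value returned by `crosstab` gives a sense of how much the large-sample approximation matters for a given table.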

## Version History

Introduced before R2006a