# Create Categorical Arrays

This example shows how to create a categorical array. `categorical` is a data type for storing data with values from a finite set of discrete categories. These categories can have a natural order, but it is not required. A categorical array provides efficient storage and convenient manipulation of data, while also maintaining meaningful names for the values. You can use categorical arrays in a table to define groups of rows.

By default, categorical arrays contain categories that have no mathematical ordering. For example, the discrete set of pet categories `["dog","cat","bird"]` has no meaningful mathematical ordering, so MATLAB® uses the alphabetical ordering `["bird","cat","dog"]`. *Ordinal* categorical arrays contain categories that have a meaningful mathematical ordering. For example, the discrete set of size categories `["small","medium","large"]` has the mathematical ordering `small < medium < large`.

When you create categorical arrays from string arrays (or cell arrays of character vectors), leading and trailing spaces are removed. For example, if you specify the text `[" cat","dog"]` as categories, the resulting categories are `["cat","dog"]`.
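The trimming behavior described above can be checked with a short sketch. The variable names `pets` and `petCats` and the input values are hypothetical, chosen only for illustration.

```matlab
% Hypothetical input with stray leading and trailing spaces.
pets = [" cat","dog ","  bird  "];

% Convert to a categorical array; surrounding spaces are removed.
petCats = categorical(pets);

% List the resulting category names (alphabetical, without the spaces).
categories(petCats)
```

Because the spaces are stripped, `" cat"` and `"cat"` would map to the same category rather than to two distinct ones.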

### Create Categorical Array from String Array

You can use the `categorical` function to create a categorical array from a numeric array, logical array, string array, cell array of character vectors, or an existing categorical array.

Create a 1-by-11 string array containing state names from New England.

state = ["MA","ME","CT","VT","ME","NH","VT","MA","NH","CT","RI"]

`state = `*1x11 string*
Columns 1 through 9
"MA" "ME" "CT" "VT" "ME" "NH" "VT" "MA" "NH"
Columns 10 through 11
"CT" "RI"

Convert the string array, `state`, to a categorical array that has no mathematical order.

state = categorical(state)

`state = `*1x11 categorical*
Columns 1 through 9
MA ME CT VT ME NH VT MA NH
Columns 10 through 11
CT RI

List the discrete categories in the variable `state`. There are only six unique states listed in `state`, which means there are six categories. The categories are listed in alphabetical order.

categories(state)

`ans = `*6x1 cell*
{'CT'}
{'MA'}
{'ME'}
{'NH'}
{'RI'}
{'VT'}

### Add New and Missing Elements

Add elements to the original string array. One of the elements is the missing string, displayed as `<missing>`. Just as `NaN` can indicate missing values in a numeric array, `<missing>` indicates missing values in a string array.

state = ["MA","ME","CT","VT","ME","NH","VT","MA","NH","CT","RI"];
state = [string(missing) state];
state(13) = "ME"

`state = `*1x13 string*
Columns 1 through 9
<missing> "MA" "ME" "CT" "VT" "ME" "NH" "VT" "MA"
Columns 10 through 13
"NH" "CT" "RI" "ME"

Convert the string array to a `categorical` array. The missing string becomes an undefined category, displayed as `<undefined>`. It indicates an element of the categorical array that does not belong to any category.

state = categorical(state)

`state = `*1x13 categorical*
Columns 1 through 8
<undefined> MA ME CT VT ME NH VT
Columns 9 through 13
MA NH CT RI ME
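As a minimal sketch of how undefined elements can be located afterward, the `isundefined` function returns a logical array marking them. The shortened input array and the variable names `shortState` and `tf` are assumptions made for illustration.

```matlab
% Shortened, hypothetical version of the state array with one missing string.
shortState = categorical([string(missing) "MA" "ME"]);

% Logical array that is true where an element belongs to no category.
tf = isundefined(shortState)

% Count the undefined elements.
numUndefined = nnz(tf)
```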

### Create Ordinal Categorical Array from String Array

Create a 1-by-8 string array containing the sizes of eight objects.

AllSizes = ["medium","large","small","small","medium","large","medium","small"];

The string array, `AllSizes`, has three distinct values: `"large"`, `"medium"`, and `"small"`. When using a string array, there is no convenient way to indicate that `small < medium < large`.

Convert the string array, `AllSizes`, to an ordinal categorical array. Use `valueset` to specify the values `small`, `medium`, and `large`, which define the categories. For an ordinal categorical array, the first category specified is the smallest and the last category is the largest.

valueset = ["small","medium","large"];
sizeOrd = categorical(AllSizes,valueset,'Ordinal',true)

`sizeOrd = `*1x8 categorical*
Columns 1 through 6
medium large small small medium large
Columns 7 through 8
medium small

The order of the values in the categorical array, `sizeOrd`, remains unchanged.

List the discrete categories in the categorical variable, `sizeOrd`.

categories(sizeOrd)

`ans = `*3x1 cell*
{'small' }
{'medium'}
{'large' }

The categories are listed in the specified order to match the mathematical ordering `small < medium < large`.
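One practical consequence of the ordering is that relational operators and functions such as `max` work on ordinal categorical arrays. The following sketch rebuilds `sizeOrd` so that it runs on its own; the names `isAboveSmall` and `largest` are hypothetical.

```matlab
% Rebuild the ordinal array from the example so this block is self-contained.
AllSizes = ["medium","large","small","small","medium","large","medium","small"];
valueset = ["small","medium","large"];
sizeOrd = categorical(AllSizes,valueset,'Ordinal',true);

% Confirm that the array is ordinal.
isordinal(sizeOrd)

% Compare each element against a category using the defined order.
isAboveSmall = sizeOrd > "small"

% Find the largest category present in the array.
largest = max(sizeOrd)
```

The same comparison on a non-ordinal categorical array would produce an error, because its categories have no defined order.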

### Create Ordinal Categorical Array by Binning Numeric Data

Create a vector of 100 random numbers between zero and 50.

x = rand(100,1)*50;

Use the `discretize` function to create a categorical array by binning the values of `x`. Put all values between zero and 15 in the first bin, all the values between 15 and 35 in the second bin, and all the values between 35 and 50 in the third bin. Each bin includes the left endpoint, but does not include the right endpoint, except for the last bin, which includes both endpoints.

catnames = ["small","medium","large"];
binnedData = discretize(x,[0 15 35 50],'categorical',catnames);

`binnedData` is a 100-by-1 ordinal categorical array with three categories, such that `small < medium < large`.
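To see how values that fall exactly on a bin edge are assigned, the sketch below bins a few hand-picked values instead of random data. The test values and the names `edgeValues` and `edgeBins` are assumptions for illustration. By default, `discretize` places a value equal to an interior edge in the bin to its right, and the last bin also includes its right edge.

```matlab
% Hand-picked values, including ones that sit exactly on the bin edges.
edgeValues = [0 15 34.9 35 50];
catnames = ["small","medium","large"];

% Bin with the same edges as the example above.
edgeBins = discretize(edgeValues,[0 15 35 50],'categorical',catnames)

% The result is ordinal, so the bins can be compared.
isordinal(edgeBins)
```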

Use the `summary` function to print the number of elements in each category.

summary(binnedData)

small 30
medium 35
large 35

You can make various kinds of charts of the binned data. For example, make a pie chart of `binnedData`.

pie(binnedData)

## See Also

`categorical` | `categories` | `summary` | `discretize`

## Related Examples

- Convert Text in Table Variables to Categorical
- Access Data Using Categorical Arrays
- Compare Categorical Array Elements