Predict labels using *k*-nearest neighbor classification model

`label = predict(mdl,X)`

```
[label,score,cost] = predict(mdl,X)
```

`label = predict(mdl,X)` returns a vector of predicted class labels for the predictor data in the table or matrix `X`, based on the trained *k*-nearest neighbor classification model `mdl`. See Predicted Class Label.

`[label,score,cost] = predict(mdl,X)` also returns:

- A matrix of classification scores (`score`) indicating the likelihood that a label comes from a particular class. For *k*-nearest neighbor, scores are posterior probabilities. See Posterior Probability.
- A matrix of expected classification costs (`cost`). For each observation in `X`, the predicted class label corresponds to the minimum expected classification cost among all classes. See Expected Cost.

Create a *k*-nearest neighbor classifier for Fisher's iris data, where *k* = 5. Evaluate some model predictions on new data.

Load the Fisher iris data set.

```
load fisheriris
X = meas;
Y = species;
```

Create a classifier for five nearest neighbors. Standardize the noncategorical predictor data.

```
mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1);
```

Predict the classifications for flowers with minimum, mean, and maximum characteristics.

```
Xnew = [min(X);mean(X);max(X)];
[label,score,cost] = predict(mdl,Xnew)
```

```
label = 3x1 cell array
    {'versicolor'}
    {'versicolor'}
    {'virginica' }

score = 3×3

    0.4000    0.6000         0
         0    1.0000         0
         0         0    1.0000

cost = 3×3

    0.6000    0.4000    1.0000
    1.0000         0    1.0000
    1.0000    1.0000         0
```

The second and third rows of the score and cost matrices have binary values, which means all five nearest neighbors of the mean and maximum flower measurements have identical classifications.

`mdl`: *k*-nearest neighbor classifier model (`ClassificationKNN` object)

*k*-nearest neighbor classifier model, specified as a `ClassificationKNN` object.

`X`: Predictor data to be classified (numeric matrix | table)

Predictor data to be classified, specified as a numeric matrix or table.

Each row of `X` corresponds to one observation, and each column corresponds to one variable.

For a numeric matrix:

- The variables that make up the columns of `X` must have the same order as the predictor variables used to train `mdl`.
- If you train `mdl` using a table (for example, `Tbl`), then `X` can be a numeric matrix if `Tbl` contains all numeric predictor variables. *k*-nearest neighbor classification requires homogeneous predictors. Therefore, to treat all numeric predictors in `Tbl` as categorical during training, set `'CategoricalPredictors','all'` when you train using `fitcknn`. If `Tbl` contains heterogeneous predictors (for example, numeric and categorical data types) and `X` is a numeric matrix, then `predict` throws an error.

For a table:

- `predict` does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
- If you train `mdl` using a table (for example, `Tbl`), then all predictor variables in `X` must have the same variable names and data types as those used to train `mdl` (stored in `mdl.PredictorNames`). However, the column order of `X` does not need to correspond to the column order of `Tbl`. Both `Tbl` and `X` can contain additional variables (response variables, observation weights, and so on), but `predict` ignores them.
- If you train `mdl` using a numeric matrix, then the predictor names in `mdl.PredictorNames` and the corresponding predictor variable names in `X` must be the same. To specify predictor names during training, see the `PredictorNames` name-value pair argument of `fitcknn`. All predictor variables in `X` must be numeric vectors. `X` can contain additional variables (response variables, observation weights, and so on), but `predict` ignores them.

If you set `'Standardize',true` in `fitcknn` to train `mdl`, then the software standardizes the columns of `X` using the corresponding means in `mdl.Mu` and standard deviations in `mdl.Sigma`.
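The standardization step can be reproduced by hand to see what happens to `X` before the neighbor search. The sketch below is plain MATLAB (R2016b or later, for implicit expansion) and does not call `predict`; `Mu` and `Sigma` are hypothetical stand-ins for `mdl.Mu` and `mdl.Sigma`, not statistics from a trained model.

```matlab
% Hypothetical per-predictor statistics standing in for mdl.Mu and mdl.Sigma
Mu    = [5.0 3.0 4.0 1.0];
Sigma = [0.8 0.4 1.8 0.8];

% Two new observations, one per row, columns in the training predictor order
Xnew = [4.3 2.0 1.0 0.1;
        7.9 4.4 6.9 2.5];

% Center each column by its mean and scale it by its standard deviation;
% implicit expansion applies the 1-by-4 Mu and Sigma to every row of Xnew
Zstd = (Xnew - Mu) ./ Sigma;
disp(Zstd)
```

Each entry of `Zstd` measures how many (hypothetical) standard deviations the observation lies from the training mean for that predictor.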

**Data Types:** `double` | `single` | `table`

`label`: Predicted class labels (categorical array | character array | logical vector | vector of numeric values | cell array of character vectors)

Predicted class labels for the observations (rows) in `X`, returned as a categorical array, character array, logical vector, vector of numeric values, or cell array of character vectors. `label` has length equal to the number of rows in `X`. The label is the class with minimal expected cost. See Predicted Class Label.

`score`: Predicted class scores or posterior probabilities (numeric matrix)

Predicted class scores or posterior probabilities, returned as a numeric matrix of size *n*-by-*K*. *n* is the number of observations (rows) in `X`, and *K* is the number of classes (in `mdl.ClassNames`). `score(i,j)` is the posterior probability that observation `i` in `X` is of class `j` in `mdl.ClassNames`. See Posterior Probability.

**Data Types:** `single` | `double`

`cost`: Expected classification costs (numeric matrix)

Expected classification costs, returned as a numeric matrix of size *n*-by-*K*. *n* is the number of observations (rows) in `X`, and *K* is the number of classes (in `mdl.ClassNames`). `cost(i,j)` is the expected cost of classifying row `i` of `X` as class `j` in `mdl.ClassNames`. See Expected Cost.

**Data Types:** `single` | `double`

`predict` classifies by minimizing the expected classification cost:

$$\widehat{y}=\underset{y=1,\ldots,K}{\arg\min}\sum_{j=1}^{K}\widehat{P}\left(j|x\right)C\left(y|j\right),$$

where

- $$\widehat{y}$$ is the predicted classification.
- *K* is the number of classes.
- $$\widehat{P}\left(j|x\right)$$ is the posterior probability of class *j* for observation *x*.
- $$C\left(y|j\right)$$ is the cost of classifying an observation as *y* when its true class is *j*.
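The decision rule can be checked with a small numeric sketch in plain MATLAB (no toolbox calls). Here `P` holds hypothetical posterior probabilities (rows are observations, columns are classes), and `C` is a hypothetical cost matrix in the `Cost(i,j)` layout used by `fitcknn`, so `C(j,y)` plays the role of *C*(*y*|*j*). With these costs, the second observation is assigned to class 2 even though class 1 has the larger posterior probability.

```matlab
% Hypothetical posterior probabilities: one row per observation, one column per class
P = [0.4 0.6 0;
     0.7 0.3 0];

% Hypothetical costs: row = true class, column = predicted class.
% Misclassifying a true class-2 observation costs 5; other errors cost 1.
C = [0 1 1;
     5 0 5;
     1 1 0];

% Expected cost of predicting class y: sum over j of P(j|x)*C(j,y), i.e. (P*C)(:,y)
expectedCost = P * C;

% Predicted class index = column with the minimum expected cost
[~, yhat] = min(expectedCost, [], 2);

% For comparison: the class with the largest posterior probability
[~, maxPost] = max(P, [], 2);
disp([yhat maxPost])
```

With the default 0/1 costs the two columns would agree; an asymmetric cost matrix is what makes the minimum-expected-cost rule differ from picking the largest posterior.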

Consider a vector (single query point) `xnew` and a model `mdl`.

- *k* is the number of nearest neighbors used in prediction, `mdl.NumNeighbors`.
- `nbd(mdl,xnew)` specifies the *k* nearest neighbors to `xnew` in `mdl.X`.
- `Y(nbd)` specifies the classifications of the points in `nbd(mdl,xnew)`, namely `mdl.Y(nbd)`.
- `W(nbd)` specifies the weights of the points in `nbd(mdl,xnew)`.
- `prior` specifies the priors of the classes in `mdl.Y`.

If the model contains a vector of prior probabilities, then the observation weights `W` are normalized by class to sum to the priors. This process might involve a calculation for the point `xnew`, because weights can depend on the distance from `xnew` to the points in `mdl.X`.
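A minimal sketch of the per-class normalization, in plain MATLAB. It assumes fixed observation weights (ignoring any dependence on the distance to `xnew`), and `Y`, `W`, and `prior` are hypothetical values rather than properties read from `mdl`; it illustrates the stated rule, not the toolbox's internal code.

```matlab
% Hypothetical training data: class index and raw weight of each observation
Y     = [1 1 2 2 2 3]';
W     = ones(6,1);
prior = [0.5 0.25 0.25];   % hypothetical prior probability of each class

Wn = zeros(size(W));
for c = 1:numel(prior)
    inClass = (Y == c);
    % Rescale the weights of class c so that they sum to prior(c)
    Wn(inClass) = W(inClass) / sum(W(inClass)) * prior(c);
end
disp(Wn')
```

After normalization, each class carries total weight equal to its prior, so a class with more training observations does not automatically dominate the vote.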

The posterior probability *p*(*j*|`xnew`) is

$$p\left(j|x\text{new}\right)=\frac{\sum_{i\in\text{nbd}}W(i)\,1_{Y(X(i))=j}}{\sum_{i\in\text{nbd}}W(i)}.$$

Here, $$1_{Y(X(i))=j}$$ is `1` when `mdl.Y(i) = j`, and `0` otherwise.
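The formula is a weighted vote among the *k* neighbors. The plain-MATLAB sketch below evaluates it for one query point; `Ynbd` and `Wnbd` are hypothetical class indices and weights for the *k* = 5 nearest neighbors, and the neighbor search itself is not shown. With equal weights and these labels, the result is `[0.4 0.6 0]`, the same pattern as the first row of `score` in the iris example.

```matlab
K    = 3;                   % number of classes
Ynbd = [2 2 1 2 1]';        % hypothetical class index of each of the 5 neighbors
Wnbd = ones(5,1);           % hypothetical neighbor weights (all equal)

post = zeros(1,K);
for j = 1:K
    % Total weight of neighbors in class j divided by the total neighbor weight
    post(j) = sum(Wnbd(Ynbd == j)) / sum(Wnbd);
end
disp(post)
```

If the weights depend on distance (for example, through the `DistanceWeight` name-value argument of `fitcknn`), only `Wnbd` changes; the vote itself is computed the same way.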

Two costs are associated with KNN classification: the true misclassification cost per class and the expected misclassification cost per observation.

You can set the true misclassification cost per class by using the `'Cost'` name-value pair argument when you run `fitcknn`. The value `Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j) = 1` if `i ~= j`, and `Cost(i,j) = 0` if `i == j`. In other words, the cost is `0` for correct classification and `1` for incorrect classification.

The third output of `predict` is the expected misclassification cost per observation.

Suppose you have `Nobs` observations that you want to classify with a trained classifier `mdl`, and you have `K` classes. You place the observations into a matrix `Xnew` with one observation per row. The command

```
[label,score,cost] = predict(mdl,Xnew)
```

returns a matrix `cost` of size `Nobs`-by-`K`, among other outputs. Each row of the `cost` matrix contains the expected (average) cost of classifying the observation into each of the `K` classes. `cost(n,j)` is

$$\sum_{i=1}^{K}\widehat{P}\left(i|Xnew(n)\right)C\left(j|i\right),$$

where

- *K* is the number of classes.
- $$\widehat{P}\left(i|Xnew(n)\right)$$ is the posterior probability of class *i* for observation *Xnew*(*n*).
- $$C\left(j|i\right)$$ is the true misclassification cost of classifying an observation as *j* when its true class is *i*.
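Because the sum runs over the true class *i*, and `Cost(i,j)` holds *C*(*j*|*i*), the whole `cost` matrix is a single matrix product of the posterior matrix and the cost matrix. The plain-MATLAB check below takes the `score` values printed in the iris example, combines them with the default cost matrix (0 on the diagonal, 1 elsewhere), and recovers the `cost` values printed there. It only re-derives those numbers; it does not call `predict`.

```matlab
% score output from the iris example (rows: minimum, mean, maximum flowers)
score = [0.4 0.6 0;
         0   1   0;
         0   0   1];

K = size(score, 2);
Cost = ones(K) - eye(K);    % default: 0 for correct, 1 for incorrect classification

% cost(n,j) = sum over i of score(n,i)*Cost(i,j), i.e. P(i|x) times C(j|i)
cost = score * Cost;
disp(cost)
```

With the default costs, each entry reduces to 1 minus the posterior probability of that class, which is why the first row is `0.6 0.4 1.0`.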

Calculate with arrays that have more rows than fit in memory.

This function fully supports tall arrays. For more information, see Tall Arrays (MATLAB).

Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

- Use `saveCompactModel`, `loadCompactModel`, and `codegen` to generate code for the `predict` function. Save a trained model by using `saveCompactModel`. Define an entry-point function that loads the saved model by using `loadCompactModel` and calls the `predict` function. Then use `codegen` to generate code for the entry-point function.
- The following notes apply to the arguments of `predict`. Arguments not listed here are fully supported.

`mdl`

- A `ClassificationKNN` model object is a full object that does not have a corresponding compact object. For this model, `saveCompactModel` saves a compact version that does not include the hyperparameter optimization properties.
- If `mdl` is a model trained using the *k*d-tree search algorithm, and the code generation build type is a MEX function, then `codegen` generates a MEX function using Intel® Threading Building Blocks (TBB) for parallel computation. Otherwise, `codegen` generates code using `parfor`.
  - MEX function for the *k*d-tree search algorithm: `codegen` generates an optimized MEX function using Intel TBB for parallel computation on multicore platforms. You can use the MEX function to accelerate MATLAB® algorithms. For details on Intel TBB, see https://software.intel.com/en-us/intel-tbb. If you generate the MEX function to test the generated code of the `parfor` version, you can disable the usage of Intel TBB. Set the `ExtrinsicCalls` property of the MEX configuration object to `false`. For details, see `coder.MexCodeConfig`.
  - MEX function for the exhaustive search algorithm and standalone C/C++ code for both algorithms: The generated code of `predict` uses `parfor` to create loops that run in parallel on supported shared-memory multicore platforms. If your compiler does not support the Open Multiprocessing (OpenMP) application interface or you disable the OpenMP library, MATLAB Coder™ treats the `parfor`-loops as `for`-loops. To find supported compilers, see Supported Compilers. To disable the OpenMP library, set the `EnableOpenMP` property of the configuration object to `false`. For details, see `coder.CodeConfig`.
- For the usage notes and limitations of the model object, see Code Generation of the `ClassificationKNN` object.

`X`

- Must be a single-precision or double-precision matrix and can be variable-size. However, the number of columns in `X` must be `numel(mdl.PredictorNames)`.
- Rows and columns must correspond to observations and predictors, respectively.

For more information, see Introduction to Code Generation.
