k-nearest neighbor classification
ClassificationKNN is a nearest-neighbor classification model in which you can alter both the distance metric and the number of nearest neighbors. Because a ClassificationKNN classifier stores training data, you can use the model to compute resubstitution predictions. Alternatively, use the model to classify new observations using the predict method.
Create a ClassificationKNN model using fitcknn.
BreakTies - Tie-breaking algorithm
'smallest' (default) | 'nearest' | 'random'
Tie-breaking algorithm used by predict when multiple classes have the same smallest cost, specified as one of the following:
'smallest' - Use the smallest index among tied groups.
'nearest' - Use the class with the nearest neighbor among tied groups.
'random' - Use a random tiebreaker among tied groups.
By default, ties occur when multiple classes have the same number of nearest points among the k nearest neighbors. BreakTies applies when IncludeTies is false.
Change BreakTies using dot notation: mdl.BreakTies = newBreakTies.
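As a minimal sketch of the dot-notation change, the following assumes the fisheriris sample data set is available. The choice of four neighbors is only illustrative: an even k makes tied votes between classes more likely.

```matlab
% Sketch: switch the tie-breaking rule of a trained model (illustrative values).
load fisheriris                                 % sample data: meas, species
mdl = fitcknn(meas,species,'NumNeighbors',4);   % even k, so vote ties can occur
mdl.BreakTies = 'nearest';                      % tied classes: pick the one with the closest neighbor
labels = predict(mdl,meas(1:5,:));              % predict now uses the new rule
```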
Distance - Distance metric
'cityblock' | 'chebychev' | 'correlation' | 'cosine' | 'euclidean' | function handle | ...
Distance metric, specified as a character vector or a function handle. The values allowed depend on the NSMethod property.
NSMethod  Distance Metric Allowed

'exhaustive'  Any distance metric of ExhaustiveSearcher
'kdtree'  'cityblock', 'chebychev', 'euclidean', or 'minkowski'
The following table lists the ExhaustiveSearcher distance metrics.
Value  Description

'cityblock'  City block distance.
'chebychev'  Chebychev distance (maximum coordinate difference).
'correlation'  One minus the sample linear correlation between observations (treated as sequences of values).
'cosine'  One minus the cosine of the included angle between observations (treated as vectors).
'euclidean'  Euclidean distance.
'hamming'  Hamming distance, the percentage of coordinates that differ.
'jaccard'  One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
'mahalanobis'  Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by nancov(X). To specify a different value for C, set the DistParameter property of mdl using dot notation.
'minkowski'  Minkowski distance. The default exponent is 2. To specify a different exponent, set the DistParameter property of mdl using dot notation.
'seuclidean'  Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled, meaning divided by a scale value S. The default value of S is the standard deviation computed from X, S = nanstd(X). To specify another value for S, set the DistParameter property of mdl using dot notation.
'spearman'  One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
@distfun  Distance function handle. The function has the form:
function D2 = distfun(ZI,ZJ)
% calculation of distance
...

For more information, see Distance Metrics.
Change Distance using dot notation: mdl.Distance = newDistance.
If NSMethod is 'kdtree', you can use dot notation to change Distance only for the metrics 'cityblock', 'chebychev', 'euclidean', and 'minkowski'.
Data Types: char | function_handle
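The interaction between NSMethod and Distance can be sketched as follows. This is an illustration under assumptions, not the only way to set a metric: the model is trained with exhaustive search so that 'cosine' is allowed, and the custom metric myDist is a hypothetical hand-written city block distance. The argument shapes noted in the comments (one observation against a block of observations) are an assumption about the distance function interface.

```matlab
load fisheriris
% Request exhaustive search so that any ExhaustiveSearcher metric is allowed.
mdl = fitcknn(meas,species,'NSMethod','exhaustive');
mdl.Distance = 'cosine';              % permitted because NSMethod is 'exhaustive'

% Hypothetical custom metric, passed at training time.
% Assumed shapes: ZI is one observation (1-by-N), ZJ holds M2 observations
% (M2-by-N), and the result is an M2-by-1 vector of distances.
myDist = @(ZI,ZJ) sum(abs(bsxfun(@minus,ZJ,ZI)),2);
mdlCustom = fitcknn(meas,species,'Distance',myDist);
```

Because a function handle is not one of the 'kdtree' metrics, the second model is expected to fall back to exhaustive search.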
DistanceWeight - Distance weighting function
'equal' | 'inverse' | 'squaredinverse' | function handle
Distance weighting function, specified as one of the values in this table.
Value  Description

'equal'  No weighting
'inverse'  Weight is 1/distance
'squaredinverse'  Weight is 1/distance^2
@fcn  fcn is a function that accepts a matrix of nonnegative distances and returns a matrix of the same size containing nonnegative distance weights. For example, 'squaredinverse' is equivalent to @(d)d.^(-2).
Change DistanceWeight using dot notation: mdl.DistanceWeight = newDistanceWeight.
Data Types: char | function_handle
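A short sketch of both ways to set the weighting, assuming the fisheriris sample data. The function softInverse is a hypothetical example, chosen here only because it stays finite when a distance is zero.

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);
mdl.DistanceWeight = 'inverse';       % closer neighbors get larger votes

% Hypothetical custom weighting: nonnegative, same size in and out.
softInverse = @(d) 1./(1 + d);
mdlCustom = fitcknn(meas,species,'NumNeighbors',5, ...
    'DistanceWeight',softInverse);
```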
DistParameter - Parameter for distance metric
Parameter for the distance metric, specified as one of the values described in this table.
Distance Metric  Parameter

'mahalanobis'  Positive definite covariance matrix C
'minkowski'  Minkowski distance exponent, a positive scalar
'seuclidean'  Vector of positive scale values with length equal to the number of columns of X
For any other distance metric, the value of DistParameter must be [].
You can alter DistParameter using dot notation: mdl.DistParameter = newDistParameter. However, if Distance is 'mahalanobis' or 'seuclidean', then you cannot alter DistParameter.
Data Types: single | double
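For the Minkowski case, the exponent can be inspected and changed as in this sketch. It assumes the fisheriris sample data; the new exponent of 3 is an arbitrary illustrative value.

```matlab
load fisheriris
mdl = fitcknn(meas,species,'Distance','minkowski');
defaultExponent = mdl.DistParameter;  % default Minkowski exponent
mdl.DistParameter = 3;                % use exponent 3 for later predictions
```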
IncludeTies - Tie inclusion flag
false (default) | true
Tie inclusion flag indicating whether predict includes all the neighbors whose distance values are equal to the kth smallest distance, specified as false or true. If IncludeTies is true, predict includes all of these neighbors. Otherwise, predict uses exactly k neighbors (see the BreakTies property).
Change IncludeTies using dot notation: mdl.IncludeTies = newIncludeTies.
Data Types: logical
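A minimal sketch of turning tie inclusion on after training, again assuming the fisheriris sample data and an illustrative k of 3:

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',3);
mdl.IncludeTies = true;               % keep every neighbor tied at the 3rd-smallest distance
labels = predict(mdl,meas(1:3,:));    % may now vote with more than 3 neighbors
```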
NSMethod - Nearest neighbor search method
'kdtree' | 'exhaustive'
This property is read-only.
Nearest neighbor search method, specified as either 'kdtree' or 'exhaustive'.
'kdtree' - Creates and uses a Kd-tree to find nearest neighbors.
'exhaustive' - Uses the exhaustive search algorithm. When predicting the class of a new point xnew, the software computes the distance values from all points in X to xnew to find nearest neighbors.
The default value is 'kdtree' when X has 10 or fewer columns, X is not sparse, and the distance metric is a 'kdtree' type. Otherwise, the default value is 'exhaustive'.
NumNeighbors - Number of nearest neighbors
Number of nearest neighbors in X used to classify each point during prediction, specified as a positive integer value.
Change NumNeighbors using dot notation: mdl.NumNeighbors = newNumNeighbors.
Data Types: single | double
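Because NumNeighbors can be changed on an existing model, one way to compare a few values of k is to reuse the same model object, as in this sketch. The candidate values are illustrative, not recommendations, and resubstitution loss is used only because it needs no extra data.

```matlab
load fisheriris
mdl = fitcknn(meas,species);          % train once
kValues = [1 5 15];                   % illustrative candidates
err = zeros(size(kValues));
for i = 1:numel(kValues)
    mdl.NumNeighbors = kValues(i);    % change k on the stored model
    err(i) = resubLoss(mdl);          % resubstitution loss for this k
end
```

Resubstitution loss tends to favor small k, so a cross-validated estimate is usually a better basis for choosing the final value.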
CategoricalPredictors - Categorical predictor indices
[] | vector of positive integers
This property is read-only.
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).
Data Types: double
ClassNames - Names of classes in training data Y
This property is read-only.
Names of the classes in the training data Y with duplicates removed, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as Y. (The software treats string arrays as cell arrays of character vectors.)
Data Types: categorical | char | logical | single | double | cell
Cost - Cost of misclassification
Cost of the misclassification of a point, specified as a square matrix. Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns in Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.
By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification.
Change a Cost matrix using dot notation: mdl.Cost = costMatrix.
Data Types: single | double
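The row and column convention can be sketched with the three iris classes. The cost values below are made up for illustration: they make it five times more expensive to misclassify a true member of the third class than any other mistake.

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);
classOrder = mdl.ClassNames;          % rows and columns of Cost follow this order

% Rows = true class, columns = predicted class (illustrative values).
costMatrix = [0 1 1;
              1 0 1;
              5 5 0];
mdl.Cost = costMatrix;
```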
ExpandedPredictorNames - Expanded predictor names
This property is read-only.
Expanded predictor names, specified as a cell array of character vectors.
If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.
Data Types: cell
ModelParameters - Parameters used in training ClassificationKNN
This property is read-only.
Parameters used in training the ClassificationKNN model, specified as a structure.
Data Types: struct
Mu - Predictor means
This property is read-only.
Predictor means, specified as a numeric vector of length numel(PredictorNames).
If you do not standardize the predictor data when training the model using fitcknn, then Mu is empty ([]).
Data Types: single | double
NumObservations - Number of observations
This property is read-only.
Number of observations used in training the ClassificationKNN model, specified as a positive integer scalar. This number can be less than the number of rows in the training data because rows containing NaN values are not part of the fit.
Data Types: double
PredictorNames - Predictor variable names
This property is read-only.
Predictor variable names, specified as a cell array of character vectors. The variable names are in the same order in which they appear in the training data X.
Data Types: cell
Prior - Prior probabilities for each class
Prior probabilities for each class, specified as a numeric vector. The order of the elements in Prior corresponds to the order of the classes in ClassNames.
Add or change a Prior vector using dot notation: mdl.Prior = priorVector.
Data Types: single | double
ResponseName - Response variable name
This property is read-only.
Response variable name, specified as a character vector.
Data Types: char
RowsUsed - Rows used in fitting
[] | logical vector
This property is read-only.
Rows of the original data X used in fitting the ClassificationKNN model, specified as a logical vector. This property is empty if all rows are used.
Data Types: logical
ScoreTransform - Score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...
Score transformation, specified as either a character vector or a function handle.
This table summarizes the available character vectors.
Value  Description

'doublelogit'  1/(1 + e^{–2x})
'invlogit'  log(x / (1 – x))
'ismax'  Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'  1/(1 + e^{–x})
'none' or 'identity'  x (no transformation)
'sign'  –1 for x < 0, 0 for x = 0, 1 for x > 0
'symmetric'  2x – 1
'symmetricismax'  Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'  2/(1 + e^{–x}) – 1
For a MATLAB® function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Change ScoreTransform using dot notation: mdl.ScoreTransform = newScoreTransform.
Data Types: char | function_handle
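The following sketch shows a named transform and a function handle in turn, assuming the fisheriris sample data. The anonymous function is a hypothetical example that reproduces the 'symmetric' mapping from the table.

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);

mdl.ScoreTransform = 'ismax';         % winning class scores 1, all others 0
[~,score] = predict(mdl,meas(1,:));   % second output holds the transformed scores

% Hypothetical custom transform: matrix in, same-size matrix out.
mdl.ScoreTransform = @(s) 2*s - 1;    % same mapping as 'symmetric'
```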
Sigma - Predictor standard deviations
This property is read-only.
Predictor standard deviations, specified as a numeric vector of length numel(PredictorNames).
If you do not standardize the predictor variables during training, then Sigma is empty ([]).
Data Types: single | double
W - Observation weights
This property is read-only.
Observation weights, specified as a vector of nonnegative values with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y.
Data Types: single | double
X - Unstandardized predictor data
This property is read-only.
Unstandardized predictor data, specified as a numeric matrix. Each column of X represents one predictor (variable), and each row represents one observation.
Data Types: single | double
Y - Class labels
This property is read-only.
Class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each value in Y is the observed class label for the corresponding row in X.
Y has the same data type as the data in Y used for training the model. (The software treats string arrays as cell arrays of character vectors.)
Data Types: single | double | logical | char | cell | categorical
HyperparameterOptimizationResults - Cross-validation optimization of hyperparameters
BayesianOptimization object | table
This property is read-only.
Cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model using fitcknn. The value depends on the setting of the 'HyperparameterOptimizationOptions' name-value pair argument when you create the model:
'bayesopt' (default) - Object of class BayesianOptimization
'gridsearch' or 'randomsearch' - Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
compareHoldout  Compare accuracies of two classification models using new data
crossval  Cross-validated k-nearest neighbor classifier
edge  Edge of k-nearest neighbor classifier
loss  Loss of k-nearest neighbor classifier
margin  Margin of k-nearest neighbor classifier
predict  Predict labels using k-nearest neighbor classification model
resubEdge  Edge of k-nearest neighbor classifier by resubstitution
resubLoss  Loss of k-nearest neighbor classifier by resubstitution
resubMargin  Margin of k-nearest neighbor classifier by resubstitution
resubPredict  Predict resubstitution labels of k-nearest neighbor classifier
Train a k-nearest neighbor classifier for Fisher's iris data, where k, the number of nearest neighbors in the predictors, is 5.
Load Fisher's iris data.
load fisheriris
X = meas;
Y = species;
X is a numeric matrix that contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.
Train a 5-nearest neighbor classifier. Standardize the noncategorical predictor data.
Mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1)
Mdl = 
  ClassificationKNN
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
                 Distance: 'euclidean'
             NumNeighbors: 5

  Properties, Methods
Mdl is a trained ClassificationKNN classifier, and some of its properties appear in the Command Window.
To access the properties of Mdl, use dot notation.
Mdl.ClassNames
ans = 3x1 cell
{'setosa' }
{'versicolor'}
{'virginica' }
Mdl.Prior
ans = 1×3
0.3333 0.3333 0.3333
Mdl.Prior contains the class prior probabilities, which you can specify using the 'Prior' name-value pair argument in fitcknn. The order of the class prior probabilities corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data.
You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.
Mdl.Prior = [0.5 0.2 0.3];
You can pass Mdl to predict to label new measurements or crossval to cross-validate the classifier.
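Both follow-up steps can be sketched as below. The query point Xnew is a hypothetical flower built from the column means of the training data, and the random seed is set only so that the fold assignment is repeatable.

```matlab
load fisheriris
Mdl = fitcknn(meas,species,'NumNeighbors',5,'Standardize',1);

% Hypothetical new observation: one row, four predictor values.
Xnew = mean(meas);
label = predict(Mdl,Xnew);            % predicted species for Xnew

% Cross-validate with the crossval defaults and estimate the loss.
rng(1)                                % reproducible fold assignment
CVMdl = crossval(Mdl);
cvError = kfoldLoss(CVMdl);           % cross-validated classification loss
```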
The compact function reduces the size of most classification models by removing the training data properties and any other properties that are not required to predict the labels of new observations. Because k-nearest neighbor classification models require all of the training data to predict labels, you cannot reduce the size of a ClassificationKNN model.
knnsearch finds the k-nearest neighbors of points. rangesearch finds all the points within a fixed distance. You can use these functions for classification, as shown in Classify Query Data. If you want to perform classification, then using ClassificationKNN models can be more convenient because you can train a classifier in one step (using fitcknn) and classify in other steps (using predict). Alternatively, you can train a k-nearest neighbor classification model using one of the cross-validation options in the call to fitcknn. In this case, fitcknn returns a ClassificationPartitionedModel cross-validated model object.
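The two routes can be compared side by side in a rough sketch. The query point and k are illustrative, and the manual majority vote is a simplified stand-in for what a classifier does: it ignores priors, costs, and distance weights, so it is not guaranteed to agree with predict in every case.

```matlab
load fisheriris
Xnew = [5.9 3.0 5.1 1.8];             % hypothetical query point
k = 5;

% Manual route: find the neighbors, then take a majority vote yourself.
idx = knnsearch(meas,Xnew,'K',k);     % row indices of the k nearest training points
manualLabel = mode(categorical(species(idx)));

% Model route: train once, then call predict.
mdl = fitcknn(meas,species,'NumNeighbors',k);
modelLabel = predict(mdl,Xnew);
```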
Usage notes and limitations:
The predict function supports code generation.
When you train a k-nearest neighbor classification model by using fitcknn, the following restrictions apply.
The class labels input argument value (Y) cannot be a categorical array.
Code generation does not support categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the 'CategoricalPredictors' name-value pair argument. To include categorical predictors in a model, preprocess the categorical predictors by using dummyvar before fitting the model.
The value of the 'ClassNames' name-value pair argument cannot be a categorical array.
The value of the 'Distance' name-value pair argument cannot be a custom distance function.
The value of the 'DistanceWeight' name-value pair argument can be a custom distance weight function, but it cannot be an anonymous function.
The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function.
For more information, see Introduction to Code Generation.
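The dummyvar preprocessing mentioned in the restrictions might look like the following sketch. All data here is made up for illustration: height is a numeric predictor, color is a categorical predictor, and Y holds numeric class labels so that the labels are not a categorical array.

```matlab
% Hypothetical data: one numeric and one categorical predictor.
height = [1.2; 0.8; 1.5; 1.1];
color  = categorical({'red';'blue';'red';'green'});
Y      = [1; 0; 1; 0];                % numeric class labels

D = dummyvar(color);                  % one indicator column per category
X = [height D];                       % all-numeric predictor matrix
mdl = fitcknn(X,Y,'NumNeighbors',1);  % train on the encoded predictors
```

New observations need the same encoding, with the indicator columns in the same category order, before they are passed to predict.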