Fit k-nearest neighbor classifier

mdl = fitcknn(X,y,Name,Value) fits a model with additional options specified by one or more name-value pair arguments. For example, you can specify the tie-breaking algorithm, distance metric, or observation weights.
Construct a k-nearest neighbor classifier for Fisher's iris data, where k, the number of nearest neighbors in the predictors, is 5.
Load Fisher's iris data.
load fisheriris
X = meas;
Y = species;
X is a numeric matrix that contains four petal measurements for 150 irises. Y is a cell array of strings that contains the corresponding iris species.
Train a 5-nearest neighbor classifier. It is good practice to standardize noncategorical predictor data.
Mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1)
Mdl = 

  ClassificationKNN
     PredictorNames: {'x1'  'x2'  'x3'  'x4'}
       ResponseName: 'Y'
         ClassNames: {'setosa'  'versicolor'  'virginica'}
     ScoreTransform: 'none'
    NumObservations: 150
           Distance: 'euclidean'
       NumNeighbors: 5
Mdl is a trained ClassificationKNN classifier, and some of its properties display in the Command Window.
To access the properties of Mdl, use dot notation.
Mdl.ClassNames
ans = 
    'setosa'
    'versicolor'
    'virginica'
Mdl.Prior
ans =
    0.3333    0.3333    0.3333
Mdl.Prior contains the class prior probabilities, which you can set using the name-value pair argument 'Prior' in fitcknn. The order of the class prior probabilities corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data.
You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.
Mdl.Prior = [0.5 0.2 0.3];
You can pass Mdl to, for example, ClassificationKNN.predict to label new measurements, or ClassificationKNN.crossval to cross-validate the classifier.
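For instance, continuing with the trained model from this example, you might label a new observation and cross-validate the classifier (the values in Xnew are hypothetical measurements chosen for illustration):

```matlab
Xnew = [5.0 3.5 1.4 0.2];   % hypothetical iris measurements (one observation)
label = predict(Mdl,Xnew);  % predicted species for the new observation
CVMdl = crossval(Mdl);      % 10-fold cross-validated version of the model
```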
Load Fisher's iris data set.
load fisheriris
X = meas;
Y = species;
X is a numeric matrix that contains four petal measurements for 150 irises. Y is a cell array of strings that contains the corresponding iris species.
Train a 3-nearest neighbor classifier using the Minkowski metric. To use the Minkowski metric, you must use an exhaustive searcher. It is good practice to standardize noncategorical predictor data.
Mdl = fitcknn(X,Y,'NumNeighbors',3,...
    'NSMethod','exhaustive','Distance','minkowski',...
    'Standardize',1);
Mdl is a ClassificationKNN classifier.
You can examine the properties of Mdl by double-clicking Mdl in the Workspace window. This opens the Variable Editor.
Train a k-nearest neighbor classifier using the chi-square distance.
Load Fisher's iris data set.
load fisheriris
X = meas;    % Predictors
Y = species; % Response
The chi-square distance between J-dimensional points x and z is

$$\chi(x,z) = \sqrt{\sum_{j=1}^{J} w_j \left(x_j - z_j\right)^2},$$

where $w_j$ is a weight associated with dimension j.
Specify the chi-square distance function. The distance function must:
Take one row of X, e.g., x, and the matrix Z.
Compare x to each row of Z.
Return a vector D whose length equals the number of rows of Z. Each element of D is the distance between the observation corresponding to x and the observations corresponding to each row of Z.
chiSqrDist = @(x,Z,wt)sqrt((bsxfun(@minus,x,Z).^2)*wt);
This example uses arbitrary weights for illustration.
Train a 3-nearest neighbor classifier. It is good practice to standardize noncategorical predictor data.
k = 3;
w = [0.3; 0.3; 0.2; 0.2];
KNNMdl = fitcknn(X,Y,'Distance',@(x,Z)chiSqrDist(x,Z,w),...
    'NumNeighbors',k,'Standardize',1);
KNNMdl
is a ClassificationKNN
classifier.
Cross-validate the KNN classifier using the default 10-fold cross-validation. Examine the classification error.
rng(1); % For reproducibility
CVKNNMdl = crossval(KNNMdl);
classError = kfoldLoss(CVKNNMdl)
classError = 0.0600
CVKNNMdl is a ClassificationPartitionedModel classifier. The 10-fold classification error is 6%.
Compare the classifier with one that uses a different weighting scheme.
w2 = [0.2; 0.2; 0.3; 0.3];
CVKNNMdl2 = fitcknn(X,Y,'Distance',@(x,Z)chiSqrDist(x,Z,w2),...
    'NumNeighbors',k,'KFold',10,'Standardize',1);
classError2 = kfoldLoss(CVKNNMdl2)
classError2 = 0.0400
The second weighting scheme yields a classifier that has better out-of-sample performance.
X — Predictor values
numeric matrix

Predictor values, specified as a numeric matrix. Each column of X represents one variable, and each row represents one observation.
Data Types: single | double
y — Classification values
numeric vector | categorical vector | logical vector | character array | cell array of strings

Classification values, specified as a numeric vector, categorical vector, logical vector, character array, or cell array of strings, with the same number of rows as X. Each row of y represents the classification of the corresponding row of X.
Data Types: single | double | cell | logical | char
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'NumNeighbors',3,'NSMethod','exhaustive','Distance','minkowski' specifies a classifier for three nearest neighbors using the exhaustive nearest neighbor search method and the Minkowski metric.

'BreakTies' — Tie-breaking algorithm
'smallest' (default) | 'nearest' | 'random'

Tie-breaking algorithm used by the predict method if multiple classes have the same smallest cost, specified as the comma-separated pair consisting of 'BreakTies' and one of the following:
'smallest' — Use the smallest index among tied groups.
'nearest' — Use the class with the nearest neighbor among tied groups.
'random' — Use a random tiebreaker among tied groups.
By default, ties occur when multiple classes have the same number of nearest points among the K nearest neighbors.
Example: 'BreakTies','nearest'
'BucketSize' — Maximum data points in node
50 (default) | positive integer value

Maximum number of data points in the leaf node of the kd-tree, specified as the comma-separated pair consisting of 'BucketSize' and a positive integer value. This argument is meaningful only when NSMethod is 'kdtree'.
Example: 'BucketSize',40
Data Types: single | double
'CategoricalPredictors' — Categorical predictor flag
[] (default) | 'all'

Categorical predictor flag, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the following:
'all' — All predictors are categorical.
[] — No predictors are categorical.
When you set CategoricalPredictors to 'all', the default Distance is 'hamming'.
Example: 'CategoricalPredictors','all'
'ClassNames' — Class names
numeric vector | categorical vector | logical vector | character array | cell array of strings

Class names, specified as the comma-separated pair consisting of 'ClassNames' and an array representing the class names. Use the same data type as the values that exist in y.
Use ClassNames to order the classes or to select a subset of classes for training. The default is the class names in y.
Data Types: single | double | char | logical | cell
'Cost' — Cost of misclassification
square matrix | structure

Cost of misclassification of a point, specified as the comma-separated pair consisting of 'Cost' and one of the following:
Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). To specify the class order for the corresponding rows and columns of Cost, additionally specify the ClassNames name-value pair argument.
Structure S having two fields: S.ClassNames containing the group names as a variable of the same type as y, and S.ClassificationCosts containing the cost matrix.
The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.
Data Types: single | double | struct
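As a sketch, a three-class cost matrix with an asymmetric penalty might look like the following (the cost values are arbitrary, for illustration only):

```matlab
% Rows correspond to the true class, columns to the predicted class,
% in the order given by 'ClassNames'.
C = [0  1  1
     1  0  1
     5  1  0];   % misclassifying class 3 as class 1 costs 5 instead of 1
Mdl = fitcknn(X,Y,'Cost',C,'ClassNames',{'setosa' 'versicolor' 'virginica'});
```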
'Cov' — Covariance matrix
nancov(X) (default) | positive definite matrix of scalar values

Covariance matrix, specified as the comma-separated pair consisting of 'Cov' and a positive definite matrix of scalar values representing the covariance matrix when computing the Mahalanobis distance. This argument is only valid when 'Distance' is 'mahalanobis'.
You cannot simultaneously specify 'Standardize' and either of 'Scale' or 'Cov'.
Data Types: single | double
'CrossVal' — Cross-validation flag
'off' (default) | 'on'

Cross-validation flag, specified as the comma-separated pair consisting of 'CrossVal' and either 'on' or 'off'.
If 'on', fitcknn creates a cross-validated model with 10 folds. Use the 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' parameters to override this cross-validation setting. You can only use one parameter at a time to create a cross-validated model.
Alternatively, cross-validate mdl later using the crossval method.
Example: 'Crossval','on'
'CVPartition' — Cross-validated model partition
cvpartition object

Cross-validated model partition, specified as the comma-separated pair consisting of 'CVPartition' and an object created using cvpartition. You can only use one of these four options at a time to create a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.
'Distance' — Distance metric
valid distance metric string | function handle

Distance metric, specified as the comma-separated pair consisting of 'Distance' and a valid distance metric string or function handle. The allowable strings depend on the NSMethod parameter, which you set in fitcknn, and which exists as a field in ModelParameters. If you specify CategoricalPredictors as 'all', then the default distance metric is 'hamming'. Otherwise, the default distance metric is 'euclidean'.

NSMethod     | Distance Metric Names
exhaustive   | Any distance metric of ExhaustiveSearcher
kdtree       | 'cityblock', 'chebychev', 'euclidean', or 'minkowski'

For definitions, see Distance Metrics.
This table includes valid distance metrics of ExhaustiveSearcher.
Value           | Description
'cityblock'     | City block distance.
'chebychev'     | Chebychev distance (maximum coordinate difference).
'correlation'   | One minus the sample linear correlation between observations (treated as sequences of values).
'cosine'        | One minus the cosine of the included angle between observations (treated as vectors).
'euclidean'     | Euclidean distance.
'hamming'       | Hamming distance, percentage of coordinates that differ.
'jaccard'       | One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
'mahalanobis'   | Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by nancov(X). To specify a different value for C, use the 'Cov' name-value pair argument.
'minkowski'     | Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'Exponent' name-value pair argument.
'seuclidean'    | Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled, meaning divided by a scale value S. The default value of S is the standard deviation computed from X, S = nanstd(X). To specify another value for S, use the 'Scale' name-value pair argument.
'spearman'      | One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
@distfun        | Distance function handle. distfun has the form
                  function D2 = DISTFUN(ZI,ZJ)
                  % calculation of distance
                  ...
Example: 'Distance','minkowski'
Data Types: function_handle
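For example, a handle of this form implementing city block distance might look like the following sketch (in practice, the built-in 'cityblock' string is preferable; cityblockDist is an illustrative name, not a built-in function):

```matlab
function D2 = cityblockDist(ZI,ZJ)
% ZI is a 1-by-n row of X; ZJ is an m2-by-n matrix of observations.
% Returns an m2-by-1 vector of distances from ZI to each row of ZJ.
D2 = sum(abs(bsxfun(@minus,ZI,ZJ)),2);
end
```

You would then pass it as Mdl = fitcknn(X,Y,'Distance',@cityblockDist).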
'DistanceWeight' — Distance weighting function
'equal' (default) | 'inverse' | 'squaredinverse' | function handle

Distance weighting function, specified as the comma-separated pair consisting of 'DistanceWeight' and either a function handle or one of the following strings specifying the distance weighting function.

DistanceWeight    | Meaning
'equal'           | No weighting
'inverse'         | Weight is 1/distance
'squaredinverse'  | Weight is 1/distance^2
@fcn              | fcn is a function that accepts a matrix of nonnegative distances, and returns a matrix the same size containing nonnegative distance weights. For example, 'squaredinverse' is equivalent to @(d)d.^(-2).
Example: 'DistanceWeight','inverse'
Data Types: function_handle
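A custom handle must transform a matrix of distances elementwise; for example, an exponentially decaying weight (an illustrative choice, not a built-in option):

```matlab
expWeight = @(d) exp(-d);   % larger distances receive smaller weights
Mdl = fitcknn(X,Y,'NumNeighbors',5,'DistanceWeight',expWeight);
```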
'Exponent' — Minkowski distance exponent
2 (default) | positive scalar value

Minkowski distance exponent, specified as the comma-separated pair consisting of 'Exponent' and a positive scalar value. This argument is only valid when 'Distance' is 'minkowski'.
Example: 'Exponent',3
Data Types: single | double
'Holdout' — Fraction of data for holdout validation
0 (default) | scalar value in the range [0,1]

Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range [0,1]. Holdout validation tests the specified fraction of the data, and uses the remaining data for training.
If you use Holdout, you cannot use any of the 'CVPartition', 'KFold', or 'Leaveout' name-value pair arguments.
Example: 'Holdout',0.1
Data Types: single | double
'IncludeTies' — Tie inclusion flag
false (default) | true

Tie inclusion flag, specified as the comma-separated pair consisting of 'IncludeTies' and a logical value indicating whether predict includes all the neighbors whose distance values are equal to the Kth smallest distance. If IncludeTies is true, predict includes all these neighbors. Otherwise, predict uses exactly K neighbors.
Example: 'IncludeTies',true
Data Types: logical
'KFold' — Number of folds
10 (default) | positive integer value

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value.
If you use 'KFold', you cannot use any of the 'CVPartition', 'Holdout', or 'Leaveout' name-value pair arguments.
Example: 'KFold',8
Data Types: single | double
'Leaveout' — Leave-one-out cross-validation flag
'off' (default) | 'on'

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and either 'on' or 'off'. Specify 'on' to use leave-one-out cross-validation.
If you use 'Leaveout', you cannot use any of the 'CVPartition', 'Holdout', or 'KFold' name-value pair arguments.
Example: 'Leaveout','on'
'NSMethod' — Nearest neighbor search method
'kdtree' | 'exhaustive'

Nearest neighbor search method, specified as the comma-separated pair consisting of 'NSMethod' and 'kdtree' or 'exhaustive'.
'kdtree' — Create and use a kd-tree to find nearest neighbors. 'kdtree' is valid when the distance metric is one of the following:
'euclidean'
'cityblock'
'minkowski'
'chebychev'
'exhaustive' — Use the exhaustive search algorithm. The distance values from all points in X to each query point are computed to find nearest neighbors.
The default is 'kdtree' when X has 10 or fewer columns, X is not sparse, and the distance metric is a 'kdtree' type; otherwise, 'exhaustive'.
Example: 'NSMethod','exhaustive'
'NumNeighbors' — Number of nearest neighbors to find
1 (default) | positive integer value

Number of nearest neighbors in X to find for classifying each point when predicting, specified as the comma-separated pair consisting of 'NumNeighbors' and a positive integer value.
Example: 'NumNeighbors',3
Data Types: single | double
'PredictorNames' — Predictor variable names
{'x1','x2',...} (default) | cell array of strings

Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of strings containing the names for the predictor variables, in the order in which they appear in X.
Data Types: cell
'Prior' — Prior probabilities
'empirical' (default) | 'uniform' | vector of scalar values | structure

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following.
A string:
'empirical' determines class probabilities from class frequencies in y. If you pass observation weights, they are used to compute the class probabilities.
'uniform' sets all class probabilities equal.
A vector (one scalar value for each class). To specify the class order for the corresponding elements of Prior, additionally specify the ClassNames name-value pair argument.
A structure S with two fields:
S.ClassNames containing the class names as a variable of the same type as y
S.ClassProbs containing a vector of corresponding probabilities
If you set values for both Weights and Prior, the weights are renormalized to add up to the value of the prior probability in the respective class.
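As a sketch, the structure form might be built like this (the probability values are illustrative):

```matlab
S.ClassNames = {'setosa' 'versicolor' 'virginica'};
S.ClassProbs = [0.5 0.2 0.3];   % must correspond elementwise to ClassNames
Mdl = fitcknn(X,Y,'Prior',S);
```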
Example: 'Prior','uniform'
Data Types: single | double | struct
'ResponseName' — Response variable name
'Y' (default) | string

Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a string containing the name of the response variable y.
Example: 'ResponseName','Response'
Data Types: char
'Scale' — Distance scale
nanstd(X) (default) | vector of nonnegative scalar values

Distance scale, specified as the comma-separated pair consisting of 'Scale' and a vector containing nonnegative scalar values with length equal to the number of columns in X. Each coordinate difference between X and a query point is scaled by the corresponding element of Scale. This argument is only valid when 'Distance' is 'seuclidean'.
You cannot simultaneously specify 'Standardize' and either of 'Scale' or 'Cov'.
Data Types: single | double
'ScoreTransform' — Score transform function
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'sign' | 'symmetric' | 'symmetriclogit' | 'symmetricismax' | function handle

Score transform function, specified as the comma-separated pair consisting of 'ScoreTransform' and a string or function handle.
If the value is a string, then it must correspond to a built-in function. This table summarizes the available built-in functions.

String            | Formula
'doublelogit'     | 1/(1 + e^(–2x))
'invlogit'        | log(x / (1 – x))
'ismax'           | Set the score for the class with the largest score to 1, and scores for all other classes to 0.
'logit'           | 1/(1 + e^(–x))
'none'            | x (no transformation)
'sign'            | –1 for x < 0, 0 for x = 0, 1 for x > 0
'symmetric'       | 2x – 1
'symmetriclogit'  | 2/(1 + e^(–x)) – 1
'symmetricismax'  | Set the score for the class with the largest score to 1, and scores for all other classes to –1.

For a MATLAB function, or a function that you define, enter its function handle.
Mdl.ScoreTransform = @function;
function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
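For instance, you could supply a row-wise normalization as a handle (an illustrative choice, not one of the built-in strings):

```matlab
rowNormalize = @(s) bsxfun(@rdivide,s,sum(s,2));  % make each row of scores sum to 1
Mdl.ScoreTransform = rowNormalize;
```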
Example: 'ScoreTransform','sign'
Data Types: char | function_handle
'Standardize' — Flag to standardize predictors
false (default) | true

Flag to standardize the predictors, specified as the comma-separated pair consisting of 'Standardize' and true (1) or false (0).
If you set 'Standardize',true, then the software centers and scales each column of the predictor data (X) by the column mean and standard deviation, respectively.
The software does not standardize categorical predictors, and throws an error if all predictors are categorical.
You cannot simultaneously specify 'Standardize',1 and either of 'Scale' or 'Cov'.
It is good practice to standardize the predictor data.
Example: 'Standardize',true
Data Types: logical
'Weights' — Observation weights
ones(size(X,1),1) (default) | vector of scalar values

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values. The length of Weights is the number of rows in X.
The software normalizes the weights in each class to add up to the value of the prior probability of the class.
Data Types: single | double
mdl — Classifier model
classifier model object

k-nearest neighbor classifier model, returned as a classifier model object.
Note that using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a model of class ClassificationPartitionedModel. You cannot use a partitioned tree for prediction, so this kind of tree does not have a predict method.
Otherwise, mdl is of class ClassificationKNN, and you can use the predict method to make predictions.
Although fitcknn can train a multiclass KNN classifier, you can reduce a multiclass learning problem to a series of KNN binary learners using fitcecoc.
ClassificationKNN predicts the classification of a point Xnew using a procedure equivalent to this:
Find the NumNeighbors points in the training set X that are nearest to Xnew.
Find the NumNeighbors response values Y to those nearest points.
Assign the classification label Ynew that has the largest posterior probability among the values in Y.
For details, see Posterior Probability in the predict documentation.
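The steps above can be sketched directly with pdist2, assuming equal observation weights and the default Euclidean metric (a rough illustration of majority voting; predict additionally accounts for distance weights, misclassification costs, and priors):

```matlab
D = pdist2(X,Xnew);                    % distances from each training row to Xnew
[~,idx] = sort(D);                     % training points ordered by distance
nearest = Y(idx(1:Mdl.NumNeighbors));  % labels of the nearest neighbors
Ynew = mode(categorical(nearest));     % most frequent class among them
```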
NaNs or <undefined>s indicate missing observations. The following describes the behavior of fitcknn when the data set or weights contain missing observations.
If any value of y or any weight is missing, then fitcknn removes those values from y, the weights, and the corresponding rows of X from the data. The software renormalizes the weights to sum to 1.
If you specify to standardize predictors ('Standardize',1) or the standardized Euclidean distance ('Distance','seuclidean') without a scale, then fitcknn removes missing observations from individual predictors before computing the mean and standard deviation. In other words, the software implements nanmean and nanstd on each predictor.
If you specify the Mahalanobis distance ('Distance','mahalanobis') without its covariance matrix, then fitcknn removes rows of X that contain at least one missing value. In other words, the software implements nancov on the predictor matrix X.
Suppose that you set 'Standardize',1. If you also specify Prior or Weights, then the software takes the observation weights into account. Specifically, the weighted mean of predictor j is

$${\overline{x}}_{j}=\sum_{B_j} w_k x_{jk}$$

and the weighted standard deviation is

$$s_j=\sqrt{\sum_{B_j} w_k \left(x_{jk}-{\overline{x}}_{j}\right)^2},$$

where B_j is the set of indices k for which x_jk and w_k are not missing.
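As a numeric sketch of these formulas, with weights already normalized to sum to 1:

```matlab
x = [1; 2; 4];                    % one predictor column
w = [0.5; 0.25; 0.25];            % normalized observation weights
xbar = sum(w.*x);                 % weighted mean = 2
s = sqrt(sum(w.*(x - xbar).^2));  % weighted standard deviation = sqrt(1.5)
```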
If you also set 'Distance','mahalanobis' or 'Distance','seuclidean', then you cannot specify Scale or Cov. Instead, the software:
1. Computes the means and standard deviations of each predictor.
2. Standardizes the data using the results of step 1.
3. Computes the distance parameter values using their respective defaults.
If you specify Scale and either of Prior or Weights, then the software scales observed distances by the weighted standard deviations.
If you specify Cov and either of Prior or Weights, then the software applies the weighted covariance matrix to the distances. In other words,

$$Cov=\frac{\sum_{B} w_j}{\left(\sum_{B} w_j\right)^{2}-\sum_{B} w_j^{2}}\,\sum_{B} w_j \left(x_j-\overline{x}\right)^{\prime}\left(x_j-\overline{x}\right),$$

where B is the set of indices j for which the observation x_j does not have any missing values and w_j is not missing.
ClassificationKNN | ClassificationPartitionedModel | fitcecoc | fitensemble | predict | templateKNN