templateKNN
k-nearest neighbor classifier template
Description
t = templateKNN() returns a k-nearest neighbor (KNN) learner template suitable for training ensembles or error-correcting output code (ECOC) multiclass models.
If you specify a default template, then the software uses default values for all input arguments during training.
Specify t as a learner in fitcensemble or fitcecoc.
t = templateKNN(Name=Value) creates a template with additional options specified by one or more name-value arguments.
For example, you can specify the nearest neighbor search method, the number of nearest neighbors to find, or the distance metric.
If you display t in the Command Window, then all options appear empty ([]), except those that you specify using name-value arguments. During training, the software uses default values for empty options.
Examples
Create a nondefault k-nearest neighbor template for use in fitcensemble.
Load Fisher's iris data set.
load fisheriris
Create a template for a 5-nearest neighbor search, and specify to standardize the predictors.
t = templateKNN(NumNeighbors=5,Standardize=true)
t =
  Fit template for classification KNN.
    NumNeighbors: 5
    NSMethod: ''
    Distance: ''
    BucketSize: []
    IncludeTies: []
    DistanceWeight: []
    BreakTies: ''
    Exponent: []
    Cov: []
    Scale: []
    StandardizeData: 1
    CacheSize: 1000
    Version: 1
    Method: 'KNN'
    Type: 'classification'
All properties of the template object are empty except for NumNeighbors, StandardizeData, CacheSize, Version, Method, and Type. When you specify t as a learner, the software fills in the empty properties with their respective default values.
Specify t as a weak learner for a classification ensemble.
Mdl = fitcensemble(meas,species, ...
    Method="Subspace",Learners=t);
Display the in-sample (resubstitution) misclassification error.
L = resubLoss(Mdl)
L = 0.0600
Create a nondefault k-nearest neighbor template for use in fitcecoc.
Load Fisher's iris data set.
load fisheriris
Create a template for a 5-nearest neighbor search, and specify to standardize the predictors.
t = templateKNN(NumNeighbors=5,Standardize=true)
t =
  Fit template for classification KNN.
    NumNeighbors: 5
    NSMethod: ''
    Distance: ''
    BucketSize: []
    IncludeTies: []
    DistanceWeight: []
    BreakTies: ''
    Exponent: []
    Cov: []
    Scale: []
    StandardizeData: 1
    CacheSize: 1000
    Version: 1
    Method: 'KNN'
    Type: 'classification'
All properties of the template object are empty except for NumNeighbors, StandardizeData, CacheSize, Version, Method, and Type. When you specify t as a learner, the software fills in the empty properties with their respective default values.
Specify t as a binary learner for an ECOC multiclass model.
Mdl = fitcecoc(meas,species,Learners=t);
By default, the software trains Mdl using the one-versus-one coding design.
Display the in-sample (resubstitution) misclassification error.
L = resubLoss(Mdl,LossFun="classiferror")
L = 0.0467
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: templateKNN(NumNeighbors=4,Distance="minkowski") specifies a 4-nearest neighbor classifier template using the Minkowski distance measure.
BreakTies
Tie-breaking algorithm used by the predict method if multiple classes have the same smallest cost, specified as one of the following:
- "smallest" (default): Use the smallest index among tied groups.
- "nearest": Use the class with the nearest neighbor among tied groups.
- "random": Use a random tiebreaker among tied groups.
By default, ties occur when multiple classes have the same number of nearest points among the k nearest neighbors.
Example: BreakTies="nearest"
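The three rules can be illustrated on a single vote in base MATLAB. This is a simplified sketch, not the toolbox's internal code: it assumes equal observation weights and the default cost matrix, so the class with the most votes among the k neighbors has the smallest cost. The variable names are hypothetical.

```matlab
% Simplified sketch of the BreakTies rules (assumes equal weights, default costs).
% Class indices of the k = 5 nearest neighbors, sorted by increasing distance.
neighborClasses = [3 2 2 3 1];
numClasses = 3;

% Count votes per class; classes 2 and 3 tie with two votes each.
counts = accumarray(neighborClasses(:), 1, [numClasses 1]);
tied = find(counts == max(counts));           % column vector [2; 3]

% "smallest": the tied class with the smallest index.
winnerSmallest = min(tied);                   % 2

% "nearest": the tied class that owns the closest neighbor.
firstTied = find(ismember(neighborClasses, tied), 1);
winnerNearest = neighborClasses(firstTied);   % 3

% "random": a uniformly random choice among the tied classes.
winnerRandom = tied(randi(numel(tied)));

% Requesting a rule for the learners built from a template.
t = templateKNN(NumNeighbors=4, BreakTies="nearest");
```

An even number of neighbors makes vote ties more likely, which is when the choice of rule matters.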
BucketSize
Maximum number of data points in the leaf node of the Kd-tree, specified as a positive integer value. The default is 50. This argument is meaningful only when NSMethod is "kdtree".
Example: BucketSize=40
Data Types: single | double
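Because BucketSize only changes how the Kd-tree is partitioned, it affects search speed and memory rather than which neighbor distances are found. One way to check this, sketched here with knnsearch on the iris data (the query rows and k are arbitrary choices), is to compare a Kd-tree search using BucketSize=40 against an exhaustive search. Distances are compared instead of indices because the iris data contains duplicate observations, so tied neighbors can come back in a different order.

```matlab
load fisheriris
Q = meas(1:5,:);   % a few query points taken from the training data

% Kd-tree search with a nondefault leaf size versus exhaustive search.
[~, dKd] = knnsearch(meas, Q, K=5, NSMethod="kdtree", BucketSize=40);
[~, dEx] = knnsearch(meas, Q, K=5, NSMethod="exhaustive");

% The neighbor distances should agree; only the search cost differs.
maxDiff = max(abs(dKd(:) - dEx(:)));

% The same option passed through a learner template.
t = templateKNN(NSMethod="kdtree", BucketSize=40);
```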
Cov
Covariance matrix, specified as a positive definite matrix of scalar values, used as the covariance matrix when computing the Mahalanobis distance. This argument is only valid when Distance is "mahalanobis".
You cannot simultaneously specify Standardize and either Scale or Cov.
Data Types: single | double
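The Mahalanobis distance between row vectors x and y is sqrt((x-y)*inv(C)*(x-y)'). The sketch below checks that formula against pdist2 on two iris observations and then passes the same matrix to the template. Here C is simply the sample covariance, which matches the default described under Distance; in practice you would pass Cov only when you want a different matrix.

```matlab
load fisheriris
C = cov(meas);                 % 4-by-4 sample covariance (positive definite for this data)
x = meas(1,:);                 % an observation from the first class
y = meas(51,:);                % an observation from the second class

% Mahalanobis distance computed directly; (x - y) / C equals (x - y) * inv(C).
dManual = sqrt((x - y) / C * (x - y)');

% The same distance from pdist2, which takes C as its distance parameter.
dPdist2 = pdist2(x, y, "mahalanobis", C);

t = templateKNN(Distance="mahalanobis", Cov=C);
```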
Distance
Distance metric, specified as a valid distance metric name or function handle. The allowable distance metric names depend on your choice of a neighbor-searcher method (see NSMethod).
NSMethod Value | Distance Metric Names |
---|---|
"exhaustive" | Any distance metric of ExhaustiveSearcher |
"kdtree" | "cityblock", "chebychev", "euclidean", or "minkowski" |
This table includes valid distance metrics of ExhaustiveSearcher.
Distance Metric Names | Description |
---|---|
"cityblock" | City block distance. |
"chebychev" | Chebychev distance (maximum coordinate difference). |
"correlation" | One minus the sample linear correlation between observations (treated as sequences of values). |
"cosine" | One minus the cosine of the included angle between observations (treated as vectors). |
"euclidean" | Euclidean distance. |
"hamming" | Hamming distance, percentage of coordinates that differ. |
"jaccard" | One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ. |
"mahalanobis" | Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by cov(X,"omitrows"). To specify a different value for C, use the Cov name-value argument. |
"minkowski" | Minkowski distance. The default exponent is 2. To specify a different exponent, use the Exponent name-value argument. |
"seuclidean" | Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled, meaning divided by a scale value S. The default value of S is the standard deviation computed from X, S = std(X,"omitnan"). To specify another value for S, use the Scale name-value argument. |
"spearman" | One minus the sample Spearman's rank correlation between observations (treated as sequences of values). |
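@distfun | Distance function handle of the form function D2 = distfun(ZI,ZJ), where ZI is a 1-by-n vector containing a single observation, ZJ is an m2-by-n matrix containing multiple observations, and D2 is an m2-by-1 vector of distances whose kth element is the distance between ZI and ZJ(k,:). |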
When you call fitcensemble or fitcecoc, if you specify Learners as a templateKNN object and CategoricalPredictors as "all", then the default distance metric is "hamming". Otherwise, the default distance metric is "euclidean".
Change Distance using dot notation: mdl.Distance = newDistance, where mdl is a trained ClassificationKNN model. If NSMethod is "kdtree", you can use dot notation to change Distance only for the metrics "cityblock", "chebychev", "euclidean", and "minkowski".
For definitions, see Distance Metrics.
Example: Distance="minkowski"
Data Types: char | string | function_handle
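As a sketch of the function-handle form, the following defines a hypothetical weighted city block metric (the weights w are arbitrary, chosen only for illustration) and passes it to templateKNN. The handle receives one observation ZI (1-by-n) and a matrix ZJ (m2-by-n) and must return an m2-by-1 column of distances; implicit expansion handles the row-versus-matrix subtraction. Because a custom metric is not one of the Kd-tree metrics, the default search method for it is "exhaustive".

```matlab
load fisheriris
w = [1 1 2 2];   % hypothetical per-predictor weights, for illustration only

% Weighted city block distance between one observation ZI (1-by-n)
% and each row of ZJ (m2-by-n); returns an m2-by-1 column vector.
distfun = @(ZI, ZJ) sum(w .* abs(ZJ - ZI), 2);

% pdist2 accepts a handle of the same form, which is a quick way to test it.
D = pdist2(meas(1:3,:), meas(4:10,:), distfun);   % 3-by-7 distance matrix

t = templateKNN(Distance=distfun, NumNeighbors=5);
Mdl = fitcecoc(meas, species, Learners=t);
L = resubLoss(Mdl);
```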
DistanceWeight
Distance weighting function, specified as a function handle or one of the values in this table.
Value | Description |
---|---|
"equal" | No weighting (default) |
"inverse" | Weight is 1/distance |
"squaredinverse" | Weight is 1/distance^2 |
@fcn | fcn is a function that accepts a matrix of nonnegative distances and returns a matrix of the same size containing nonnegative distance weights. For example, "squaredinverse" is equivalent to @(d)d.^(-2). |
Example: DistanceWeight="inverse"
Data Types: char | string | function_handle
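To see how the weighting options change a vote, the sketch below weights four neighbors at hypothetical distances d and sums the weights per class. It is a simplified illustration in base MATLAB, not the toolbox's internal scoring, which also accounts for observation weights and prior probabilities. With "equal" weights the two classes tie 2 to 2; inverse weighting favors the class whose neighbors are closer.

```matlab
d = [0.5 1 2 4];                 % hypothetical distances to the k = 4 nearest neighbors
classes = [1 2 2 1];             % class index of each neighbor

wEqual   = ones(size(d));        % "equal"
wInverse = 1 ./ d;               % "inverse":        [2 1 0.5 0.25]
fcn      = @(dist) dist.^(-2);   % same weights as "squaredinverse"
wSquared = fcn(d);               % [4 1 0.25 0.0625]

% Per-class sums of the weights (weighted votes).
votesEqual   = accumarray(classes(:), wEqual(:));    % [2; 2], a tie
votesInverse = accumarray(classes(:), wInverse(:));  % [2.25; 1.5]
votesSquared = accumarray(classes(:), wSquared(:));  % [4.0625; 1.25]

% Passing the custom weighting function to a template.
t = templateKNN(NumNeighbors=4, DistanceWeight=fcn);
```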
Exponent
Minkowski distance exponent, specified as a positive scalar value. The default is 2. This argument is only valid when Distance is "minkowski".
Example: Exponent=3
Data Types: single | double
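The Minkowski distance with exponent p is (sum over coordinates j of |x_j - y_j|^p)^(1/p), so p = 1 gives the city block distance and p = 2 the Euclidean distance. A small worked check with pdist2, using two arbitrary points:

```matlab
x = [0 0];
y = [3 4];

p = 3;
dManual = sum(abs(x - y).^p)^(1/p);      % (3^3 + 4^3)^(1/3) = 91^(1/3), about 4.498
dP3 = pdist2(x, y, "minkowski", p);

dP1 = pdist2(x, y, "minkowski", 1);      % 7, the city block distance
dP2 = pdist2(x, y, "minkowski", 2);      % 5, the Euclidean distance

t = templateKNN(Distance="minkowski", Exponent=3);
```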
IncludeTies
Tie inclusion flag, specified as a logical value indicating whether predict includes all the neighbors whose distance values are equal to the kth smallest distance. If IncludeTies is true, predict includes all these neighbors. Otherwise (the default, false), predict uses exactly k neighbors.
Example: IncludeTies=true
Data Types: logical
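The effect is easiest to see with knnsearch, which has an IncludeTies option of its own. In this made-up data set, three points lie at distance 1 from the query, so asking for k = 2 neighbors either truncates the tie or, with IncludeTies=true, returns all three (as a cell array with one cell per query point).

```matlab
X = [1 0; -1 0; 0 1; 0 2];   % first three rows are all at distance 1 from the origin
q = [0 0];

[idxK, dK] = knnsearch(X, q, K=2);                    % exactly k = 2 neighbors
[idxT, dT] = knnsearch(X, q, K=2, IncludeTies=true);  % every neighbor tied with the 2nd distance

nK = numel(idxK);      % 2
nT = numel(idxT{1});   % 3

% The corresponding template option for a KNN learner.
t = templateKNN(NumNeighbors=2, IncludeTies=true);
```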
NSMethod
Nearest neighbor search method, specified as "kdtree" or "exhaustive".
- "kdtree": Creates and uses a Kd-tree to find nearest neighbors. "kdtree" is valid when the distance metric is "euclidean", "cityblock", "minkowski", or "chebychev".
- "exhaustive": Uses the exhaustive search algorithm. When predicting the class of a new point xnew, the software computes the distance values from all points in X to xnew to find nearest neighbors.
The default is "kdtree" when X has 10 or fewer columns, X is not sparse or a gpuArray, and the distance metric is a "kdtree" type; otherwise, the default is "exhaustive".
Example: NSMethod="exhaustive"
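The default rule above can be written out as a short check. The anonymous functions below are hypothetical helpers that restate the documented rule for illustration; they are not toolbox functions, and the toolbox makes this choice internally.

```matlab
kdtreeMetrics = ["euclidean" "cityblock" "minkowski" "chebychev"];

% True when all documented conditions for the "kdtree" default hold.
useKdtree = @(X, dist) size(X, 2) <= 10 && ~issparse(X) && ...
    ~isa(X, "gpuArray") && (ischar(dist) || isstring(dist)) && ...
    any(strcmpi(dist, kdtreeMetrics));

searchMethods = ["exhaustive" "kdtree"];
defaultNSMethod = @(X, dist) searchMethods(1 + useKdtree(X, dist));

load fisheriris
m1 = defaultNSMethod(meas, "euclidean");          % "kdtree"
m2 = defaultNSMethod(meas, "cosine");             % "exhaustive" (not a Kd-tree metric)
m3 = defaultNSMethod(rand(20, 12), "euclidean");  % "exhaustive" (more than 10 columns)
m4 = defaultNSMethod(sparse(meas), "euclidean");  % "exhaustive" (sparse data)
```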
NumNeighbors
Number of nearest neighbors in X to find for classifying each point when predicting, specified as a positive integer value. The default is 1.
Example: NumNeighbors=3
Data Types: single | double
Scale
Distance scale, specified as a vector containing nonnegative scalar values with length equal to the number of columns in X. Each coordinate difference between X and a query point is scaled by the corresponding element of Scale. This argument is only valid when Distance is "seuclidean".
You cannot simultaneously specify Standardize and either Scale or Cov.
Data Types: single | double
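With Distance="seuclidean", each coordinate difference is divided by the corresponding element of Scale before the Euclidean distance is taken. The sketch checks this against pdist2 using the column standard deviations of the iris predictors, which is also the default scale; a different vector could be substituted to emphasize or de-emphasize particular predictors.

```matlab
load fisheriris
S = std(meas);                 % 1-by-4 vector of column standard deviations
x = meas(1,:);
y = meas(51,:);

% Standardized Euclidean distance computed directly...
dManual = sqrt(sum(((x - y) ./ S).^2));
% ...and by pdist2, which takes the scale vector as its distance parameter.
dPdist2 = pdist2(x, y, "seuclidean", S);

t = templateKNN(Distance="seuclidean", Scale=S);
```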
Standardize
Flag to standardize the predictors, specified as true (1) or false (0). The default is false.
If you set Standardize=true, then the software centers and scales each column of the predictor data (X) by the column mean and standard deviation, respectively.
The software does not standardize categorical predictors, and throws an error if all predictors are categorical.
You cannot simultaneously specify Standardize=true and either Scale or Cov.
It is good practice to standardize the predictor data.
Example: Standardize=true
Data Types: logical
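To judge whether standardization helps on a given data set, one option is to compare cross-validated losses with and without it, using the same folds for both models. The sketch uses 5-fold cross-validation of an ECOC model on the iris data; the fold count and k are arbitrary choices, and no particular loss values are implied.

```matlab
load fisheriris
rng("default")                          % reproducible partition
c = cvpartition(species, "KFold", 5);   % same stratified folds for both models

tRaw = templateKNN(NumNeighbors=5);
tStd = templateKNN(NumNeighbors=5, Standardize=true);

% Cross-validated ECOC models built from each template
cvRaw = fitcecoc(meas, species, Learners=tRaw, CVPartition=c);
cvStd = fitcecoc(meas, species, Learners=tStd, CVPartition=c);

lossRaw = kfoldLoss(cvRaw);   % cross-validated misclassification rate
lossStd = kfoldLoss(cvStd);
```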
Output Arguments
t
KNN classification template suitable for training ensembles or error-correcting output code (ECOC) multiclass models, returned as a template object. Pass t to fitcensemble or fitcecoc to specify how to create the KNN classifier for the ensemble or ECOC model, respectively.
If you display t in the Command Window, then all unspecified options appear empty ([]). However, the software replaces empty options with their corresponding default values during training.
Version History
Introduced in R2014a
See Also
ClassificationKNN | ExhaustiveSearcher | fitcensemble | fitcecoc