templateKNN

k-nearest neighbor classifier template

Syntax

t = templateKNN()

t = templateKNN(Name,Value)

Description

t = templateKNN() returns a k-nearest neighbor (KNN) learner template suitable for training ensembles or error-correcting output code (ECOC) multiclass models.

If you specify a default template, then the software uses default values for all input arguments during training.

Specify t as a learner in fitcensemble or fitcecoc.


t = templateKNN(Name,Value) creates a template with additional options specified by one or more name-value pair arguments.

For example, you can specify the nearest neighbor search method, the number of nearest neighbors to find, or the distance metric.

If you display t in the Command Window, then all options appear empty ([]), except those that you specify using name-value pair arguments. During training, the software uses default values for empty options.

Examples


Create a k-Nearest Neighbor Template for Ensemble


Create a nondefault k-nearest neighbor template for use in fitcensemble.

Load Fisher's iris data set.

load fisheriris

Create a template for a 5-nearest neighbor search, and standardize the predictors.

t = templateKNN('NumNeighbors',5,'Standardize',1)

t = 
Fit template for classification KNN.

       NumNeighbors: 5
           NSMethod: ''
           Distance: ''
         BucketSize: ''
        IncludeTies: []
     DistanceWeight: []
          BreakTies: []
           Exponent: []
                Cov: []
              Scale: []
    StandardizeData: 1
            Version: 1
             Method: 'KNN'
               Type: 'classification'

All properties of the template object are empty except for NumNeighbors, StandardizeData, Version, Method, and Type. When you specify t as a learner, the software fills in the empty properties with their respective default values.

Specify t as a weak learner for a classification ensemble.

Mdl = fitcensemble(meas,species,'Method','Subspace','Learners',t);

Display the in-sample (resubstitution) misclassification error.

L = resubLoss(Mdl)

L = 0.0600

Create a k-Nearest Neighbor Template for ECOC Multiclass Learning


Create a nondefault k-nearest neighbor template for use in fitcecoc.

Load Fisher's iris data set.

load fisheriris

Create a template for a 5-nearest neighbor search, and standardize the predictors.

t = templateKNN('NumNeighbors',5,'Standardize',1)

t = 
Fit template for classification KNN.

       NumNeighbors: 5
           NSMethod: ''
           Distance: ''
         BucketSize: ''
        IncludeTies: []
     DistanceWeight: []
          BreakTies: []
           Exponent: []
                Cov: []
              Scale: []
    StandardizeData: 1
            Version: 1
             Method: 'KNN'
               Type: 'classification'

Specify t as a binary learner for an ECOC multiclass model.

Mdl = fitcecoc(meas,species,'Learners',t);

By default, the software trains Mdl using the one-versus-one coding design.

Display the in-sample (resubstitution) misclassification error.

L = resubLoss(Mdl,'LossFun','classiferror')

L = 0.0467

Input Arguments


Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'NumNeighbors',4,'Distance','minkowski' specifies a 4-nearest neighbor classifier template using the Minkowski distance measure.
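The two calling conventions described above can be sketched as follows. The variable names t1 and t2 are illustrative only; both calls are intended to request the same 4-nearest neighbor template with the Minkowski distance.

```matlab
% R2021a and later: Name=Value syntax.
t1 = templateKNN(NumNeighbors=4,Distance="minkowski");

% Any release: comma-separated pairs with the names in quotes.
t2 = templateKNN('NumNeighbors',4,'Distance','minkowski');
```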

`BreakTies` — Tie-breaking algorithm
`'smallest'` (default) | `'nearest'` | `'random'`

Tie-breaking algorithm used by the predict method if multiple classes have the same smallest cost, specified as the comma-separated pair consisting of 'BreakTies' and one of the following:

'smallest' — Use the smallest index among tied groups.
'nearest' — Use the class with the nearest neighbor among tied groups.
'random' — Use a random tiebreaker among tied groups.

By default, ties occur when multiple classes have the same number of nearest points among the k nearest neighbors.

Example: 'BreakTies','nearest'
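As a rough illustration of the 'smallest' and 'nearest' rules, the following sketch resolves a tie by hand. It is not the software's implementation; the variables neighborClass and neighborDist are made-up data, with neighbors listed from nearest to farthest.

```matlab
% Hypothetical class indices and distances of the 4 nearest neighbors,
% ordered from nearest to farthest.
neighborClass = [2 1 2 1];
neighborDist  = [0.1 0.3 0.4 0.9];

% Count the votes per class and find the tied classes.
counts = accumarray(neighborClass(:),1);   % two votes each for classes 1 and 2
tied   = find(counts == max(counts));      % classes 1 and 2 are tied

% 'smallest': choose the smallest class index among the tied classes.
pickSmallest = min(tied);

% 'nearest': choose the tied class that owns the nearest neighbor.
firstTied   = find(ismember(neighborClass,tied),1);
pickNearest = neighborClass(firstTied);
```

With this data, pickSmallest is 1 and pickNearest is 2, because the closest neighbor belongs to class 2.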

`BucketSize` — Maximum data points in node
`50` (default) | positive integer value

Maximum number of data points in the leaf node of the Kd-tree, specified as the comma-separated pair consisting of 'BucketSize' and a positive integer value. This argument is meaningful only when NSMethod is 'kdtree'.

Example: 'BucketSize',40

Data Types: single | double

`Cov` — Covariance matrix
`cov(X,'omitrows')` (default) | positive definite matrix of scalar values

Covariance matrix, specified as the comma-separated pair consisting of 'Cov' and a positive definite matrix of scalar values representing the covariance matrix when computing the Mahalanobis distance. This argument is only valid when 'Distance' is 'mahalanobis'.

You cannot simultaneously specify 'Standardize' and either of 'Scale' or 'Cov'.

Data Types: single | double
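To show what the covariance matrix does, this minimal sketch computes one Mahalanobis distance by hand and then passes the same matrix to the template. The points x and y and the matrix C are made-up values, and the manual formula is the standard definition rather than a copy of the software's code.

```matlab
% Hypothetical points and a positive definite covariance matrix.
x = [2 0];
y = [0 0];
C = [4 0; 0 1];

% Mahalanobis distance: sqrt((x-y) * inv(C) * (x-y)').
% delta / C solves the linear system instead of forming inv(C).
delta = x - y;
d = sqrt((delta / C) * delta');   % equals 1 for these values

% Request the same metric and covariance in a template.
t = templateKNN('Distance','mahalanobis','Cov',C);
```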

`Distance` — Distance metric
`'cityblock'` | `'chebychev'` | `'correlation'` | `'cosine'` | `'euclidean'` | `'hamming'` | function handle | ...

Distance metric, specified as the comma-separated pair consisting of 'Distance' and a valid distance metric name or function handle. The allowable distance metric names depend on your choice of a neighbor-searcher method (see NSMethod).

`NSMethod`	Distance Metric Names
`exhaustive`	Any distance metric of `ExhaustiveSearcher`
`kdtree`	`'cityblock'`, `'chebychev'`, `'euclidean'`, or `'minkowski'`

This table includes valid distance metrics of ExhaustiveSearcher.

Distance Metric Names	Description
`'cityblock'`	City block distance.
`'chebychev'`	Chebychev distance (maximum coordinate difference).
`'correlation'`	One minus the sample linear correlation between observations (treated as sequences of values).
`'cosine'`	One minus the cosine of the included angle between observations (treated as vectors).
`'euclidean'`	Euclidean distance.
`'hamming'`	Hamming distance, percentage of coordinates that differ.
`'jaccard'`	One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
`'mahalanobis'`	Mahalanobis distance, computed using a positive definite covariance matrix `C`. The default value of `C` is the sample covariance matrix of `X`, as computed by `cov(X,'omitrows')`. To specify a different value for `C`, use the `'Cov'` name-value pair argument.
`'minkowski'`	Minkowski distance. The default exponent is `2`. To specify a different exponent, use the `'Exponent'` name-value pair argument.
`'seuclidean'`	Standardized Euclidean distance. Each coordinate difference between `X` and a query point is scaled, meaning divided by a scale value `S`. The default value of `S` is the standard deviation computed from `X`, `S = std(X,'omitnan')`. To specify another value for `S`, use the `Scale` name-value pair argument.
`'spearman'`	One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
`@distfun`	Distance function handle. `distfun` has the form `function D2 = distfun(ZI,ZJ)`, with the distance calculation in the function body, where `ZI` is a `1`-by-`N` vector containing one row of `X` or `Y`, `ZJ` is an `M2`-by-`N` matrix containing multiple rows of `X` or `Y`, and `D2` is an `M2`-by-`1` vector of distances in which `D2(k)` is the distance between observations `ZI` and `ZJ(k,:)`.
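A custom distance function that follows the `distfun` form might look like the sketch below. The handle name euclidFun and the sample inputs are illustrative, and the subtraction relies on implicit expansion (R2016b or later).

```matlab
% Custom distance: Euclidean distance from one row ZI (1-by-N)
% to each row of ZJ (M2-by-N). Returns an M2-by-1 vector.
euclidFun = @(ZI,ZJ) sqrt(sum((ZJ - ZI).^2, 2));

% Check the shape and values on made-up data.
ZI = [0 0];
ZJ = [3 4; 0 1; 1 0];
D2 = euclidFun(ZI,ZJ);   % [5; 1; 1]

% Use the handle as the distance metric of a template.
t = templateKNN('Distance',euclidFun,'NumNeighbors',3);
```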

If you specify CategoricalPredictors as 'all', then the default distance metric is 'hamming'. Otherwise, the default distance metric is 'euclidean'.

Change Distance using dot notation: mdl.Distance = newDistance.

If NSMethod is 'kdtree', you can use dot notation to change Distance only for the metrics 'cityblock', 'chebychev', 'euclidean', and 'minkowski'.

For definitions, see Distance Metrics.

Example: 'Distance','minkowski'

Data Types: char | string | function_handle

`DistanceWeight` — Distance weighting function
`'equal'` (default) | `'inverse'` | `'squaredinverse'` | function handle

Distance weighting function, specified as the comma-separated pair consisting of 'DistanceWeight' and either a function handle or one of the values in this table.

Value	Description
`'equal'`	No weighting
`'inverse'`	Weight is 1/distance
`'squaredinverse'`	Weight is 1/distance²
`@fcn`	`fcn` is a function that accepts a matrix of nonnegative distances, and returns a matrix the same size containing nonnegative distance weights. For example, `'squaredinverse'` is equivalent to `@(d)d.^(-2)`.

Example: 'DistanceWeight','inverse'

Data Types: char | string | function_handle
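The weighting options in the table can be checked numerically with a short sketch. The distance vector d and the handle name customWeight are made-up for illustration.

```matlab
% Hypothetical neighbor distances.
d = [1 2 4];

% Weights corresponding to 'inverse' and 'squaredinverse'.
wInverse = 1 ./ d;      % [1 0.5 0.25]
wSquared = d.^(-2);     % [1 0.25 0.0625]

% A function handle with the same effect as 'squaredinverse'.
customWeight = @(d) d.^(-2);
sameWeights = isequal(customWeight(d), wSquared);   % true

% Pass the handle to a template.
t = templateKNN('NumNeighbors',5,'DistanceWeight',customWeight);
```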

`Exponent` — Minkowski distance exponent
`2` (default) | positive scalar value

Minkowski distance exponent, specified as the comma-separated pair consisting of 'Exponent' and a positive scalar value. This argument is only valid when 'Distance' is 'minkowski'.

Example: 'Exponent',3

Data Types: single | double
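For intuition about the exponent, the sketch below evaluates the standard Minkowski formula by hand for two made-up points. The handle name minkowski is illustrative and is not a toolbox function.

```matlab
% Minkowski distance between row vectors x and y with exponent p.
minkowski = @(x,y,p) sum(abs(x - y).^p)^(1/p);

x = [0 0];
y = [3 4];

dCity   = minkowski(x,y,1);   % 7, same as the city block distance
dEuclid = minkowski(x,y,2);   % 5, same as the Euclidean distance

% Request a Minkowski metric with exponent 3 in a template.
t = templateKNN('Distance','minkowski','Exponent',3);
```

An exponent of 1 reproduces the city block distance and an exponent of 2 reproduces the Euclidean distance, which is why 2 is a natural default.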

`IncludeTies` — Tie inclusion flag
`false` (default) | `true`

Tie inclusion flag, specified as the comma-separated pair consisting of 'IncludeTies' and a logical value indicating whether predict includes all the neighbors whose distance values are equal to the kth smallest distance. If IncludeTies is true, predict includes all these neighbors. Otherwise, predict uses exactly k neighbors.

Example: 'IncludeTies',true

Data Types: logical
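The effect of the flag can be sketched on a made-up vector of sorted distances. This is only an illustration of the rule stated above, not the software's search code.

```matlab
% Hypothetical distances from a query point, sorted in ascending order.
d = [0.5 1.0 1.0 1.0 2.0];
k = 2;

% IncludeTies false: use exactly the k nearest neighbors.
idxExact = 1:k;               % [1 2]

% IncludeTies true: also keep every neighbor whose distance equals
% the kth smallest distance.
kthDist = d(k);
idxTies = find(d <= kthDist); % [1 2 3 4]
```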

`NSMethod` — Nearest neighbor search method
`'kdtree'` | `'exhaustive'`

Nearest neighbor search method, specified as the comma-separated pair consisting of 'NSMethod' and 'kdtree' or 'exhaustive'.

'kdtree' — Creates and uses a Kd-tree to find nearest neighbors. 'kdtree' is valid when the distance metric is one of the following:
- 'euclidean'
- 'cityblock'
- 'minkowski'
- 'chebychev'
'exhaustive' — Uses the exhaustive search algorithm. When predicting the class of a new point xnew, the software computes the distance values from all points in X to xnew to find nearest neighbors.

The default is 'kdtree' when X has 10 or fewer columns, X is not sparse or a gpuArray, and the distance metric is one that 'kdtree' supports. Otherwise, the default is 'exhaustive'.

Example: 'NSMethod','exhaustive'
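The pairing of search method and distance metric can be sketched as follows. The variable names tKd and tEx are illustrative; the first call uses a metric from the 'kdtree' list together with BucketSize, and the second uses a metric that requires the exhaustive search.

```matlab
% Kd-tree search with a supported metric and a custom leaf size.
tKd = templateKNN('NSMethod','kdtree','Distance','chebychev','BucketSize',40);

% Exhaustive search, needed here because 'cosine' is not a Kd-tree metric.
tEx = templateKNN('NSMethod','exhaustive','Distance','cosine');
```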

`NumNeighbors` — Number of nearest neighbors to find
`1` (default) | positive integer value

Number of nearest neighbors in X to find for classifying each point when predicting, specified as the comma-separated pair consisting of 'NumNeighbors' and a positive integer value.

Example: 'NumNeighbors',3

Data Types: single | double

`Scale` — Distance scale
`std(X,'omitnan')` (default) | vector of nonnegative scalar values

Distance scale, specified as the comma-separated pair consisting of 'Scale' and a vector containing nonnegative scalar values with length equal to the number of columns in X. Each coordinate difference between X and a query point is scaled by the corresponding element of Scale. This argument is only valid when 'Distance' is 'seuclidean'.

You cannot simultaneously specify 'Standardize' and either of 'Scale' or 'Cov'.

Data Types: single | double
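The scaling described above can be worked through by hand on a small made-up data set. The matrix X and the query point q are hypothetical, and the formula is the usual standardized Euclidean definition rather than the software's code.

```matlab
% Hypothetical training data (3 observations, 2 predictors) and a query point.
X = [1 10; 2 20; 3 30];
q = [2 40];

% Default scale: per-column standard deviation, ignoring NaN values.
S = std(X,'omitnan');    % [1 10]

% Standardized Euclidean distance from q to each row of X:
% each coordinate difference is divided by the matching element of S.
D = sqrt(sum(((X - q) ./ S).^2, 2));   % [sqrt(10); 2; sqrt(2)]

% Supply the same scale explicitly in a template.
t = templateKNN('Distance','seuclidean','Scale',S);
```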

`Standardize` — Flag to standardize predictors
`false` (default) | `true`

Flag to standardize the predictors, specified as the comma-separated pair consisting of 'Standardize' and true (1) or false (0).

If you set 'Standardize',true, then the software centers and scales each column of the predictor data (X) by the column mean and standard deviation, respectively.

The software does not standardize categorical predictors, and throws an error if all predictors are categorical.

You cannot simultaneously specify 'Standardize',1 and either of 'Scale' or 'Cov'.

It is good practice to standardize the predictor data.

Example: 'Standardize',true

Data Types: logical
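What centering and scaling does to the predictors can be seen in this short sketch, which standardizes a made-up matrix by hand. It assumes all predictors are numeric and uses implicit expansion (R2016b or later); it is an illustration, not the software's internal routine.

```matlab
% Hypothetical predictor data with very different column scales.
X = [1 10; 2 20; 3 30];

% Center each column by its mean and scale it by its standard deviation.
Z = (X - mean(X)) ./ std(X);   % [-1 -1; 0 0; 1 1]

% Ask the template to standardize during training instead.
t = templateKNN('NumNeighbors',5,'Standardize',true);
```

After standardization both columns contribute equally to a Euclidean distance, whereas in X the second column would dominate.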

Output Arguments


`t` — kNN classification template
template object

kNN classification template suitable for training ensembles or error-correcting output code (ECOC) multiclass models, returned as a template object. Pass t to fitcensemble or fitcecoc to specify how to create the KNN classifier for the ensemble or ECOC model, respectively.

If you display t in the Command Window, then all unspecified options appear empty ([]). However, the software replaces empty options with their corresponding default values during training.

Version History

Introduced in R2014a
