Impute missing data using nearest-neighbor method

`imputedData = knnimpute(data)` returns `imputedData` after replacing `NaN`s in the input `data` with the corresponding value from the nearest-neighbor column. If the corresponding value from the nearest-neighbor column is also `NaN`, the next nearest column is used. The function calculates the Euclidean distance between observation columns by using only the rows with no `NaN` values. Thus, the data must have at least one row that contains no `NaN`.
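
As a minimal sketch of the single-neighbor case, the following call uses a small, hypothetical 3-by-3 matrix (not taken from the original documentation) and assumes that Bioinformatics Toolbox is installed. Rows 1 and 3 contain no `NaN`, so only those rows enter the distance calculation.

```matlab
% Hypothetical example data: columns are observations, row 2 has one NaN.
data = [1 2  5;
        4 NaN 6;
        7 8 12];

% Distances are computed from the complete rows (1 and 3) only:
%   column 2 vs. column 1: sqrt((2-1)^2 + (8-7)^2)  = sqrt(2)
%   column 2 vs. column 3: sqrt((2-5)^2 + (8-12)^2) = 5
% Column 1 is therefore the nearest neighbor of column 2.
imputedData = knnimpute(data);

% Based on the description above, the NaN in row 2 should be replaced
% by the row-2 value of column 1, which is 4.
disp(imputedData)
```

The values were chosen so that the two candidate distances differ, which avoids a tie between neighboring columns.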

`imputedData = knnimpute(data,k)` replaces `NaN`s in `data` with a weighted mean of the `k` nearest-neighbor columns. The weights are inversely proportional to the distances from the neighboring columns.
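
The inverse-distance weighting can be sketched by hand in base MATLAB. This is an illustration of the description above under stated assumptions, not the toolbox implementation: the matrix, the variable names, and the choice `k = 2` are all hypothetical, and the sketch does not handle a neighbor at distance zero or a neighbor whose value is also `NaN`.

```matlab
% Hypothetical 3-by-3 example; columns are observations.
data = [1 2  5;
        4 NaN 6;
        7 8 12];
k = 2;

% Only rows with no NaN are used to compute distances.
completeRows = ~any(isnan(data), 2);
X = data(completeRows, :);

% Euclidean distance from column 2 (which holds the NaN) to every column.
target = 2;
d = sqrt(sum((X - X(:, target)).^2, 1));
d(target) = Inf;                 % exclude the column itself

% Select the k nearest columns and weight them by inverse distance.
[dSorted, idx] = sort(d);
nearest = idx(1:k);
w = 1 ./ dSorted(1:k);

% Weighted mean of the neighbors' values in the row with the NaN.
estimate = sum(w .* data(2, nearest)) / sum(w);
```

Here the neighbors are column 1 (distance `sqrt(2)`, value 4) and column 3 (distance 5, value 6), so the estimate lies between 4 and 6 and closer to 4, at roughly 4.44.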

`imputedData = knnimpute(data,k,Name,Value)` uses additional options specified by one or more name-value pair arguments. For example, `imputedData = knnimpute(data,k,'Distance','mahalanobis')` uses the Mahalanobis distance to compute the nearest-neighbor columns.
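
A hedged usage sketch of the name-value syntax follows. The random matrix, the seed, and `k = 3` are assumptions made for illustration. The matrix has more columns than complete rows, on the assumption that the Mahalanobis metric needs a nonsingular covariance estimate across the observation columns.

```matlab
% Hypothetical data: 4 rows, 10 observation columns, one missing value.
rng(0);                          % fixed seed so the run is repeatable
data = rand(4, 10);
data(2, 3) = NaN;

% Impute with the 3 nearest columns, using the Mahalanobis distance
% instead of the default Euclidean distance.
imputedData = knnimpute(data, 3, 'Distance', 'mahalanobis');

% Check that no missing values remain.
assert(~any(isnan(imputedData(:))));
```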

[1] Speed, T. (2003). Statistical Analysis of Gene Expression Microarray Data (Chapman & Hall/CRC).

[2] Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P., and Botstein, D. (1999). “Imputing missing data for gene expression arrays”, Technical Report, Division of Biostatistics, Stanford University.

[3] Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics *17(6)*, 520–525.