Measures of dispersion describe how spread out the data values are on the number line. These statistics are also called measures of spread.
The following table lists the function names and descriptions.
Function Name | Description |
---|---|
iqr | Interquartile range |
mad | Mean absolute deviation |
moment | Central moment of all orders |
range | Range |
std | Standard deviation |
var | Variance |
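
As a brief illustration of how two of the listed functions relate, the sketch below compares the second central moment with the variance normalized by n. The data vector is made up for illustration, and the comparison assumes that `moment` divides by n rather than n-1.

```matlab
% Illustrative data only, not taken from the toolbox documentation.
x = [2 4 4 4 5 5 7 9];

m2 = moment(x, 2);   % second central moment (assumed to divide by n)
v1 = var(x, 1);      % variance normalized by n instead of n-1

% For this sample the mean is 5 and the squared deviations sum to 32,
% so both quantities should equal 32/8 = 4.
disp([m2 v1])
```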
The range (the difference between the maximum and minimum values) is the simplest measure of spread. However, an outlier in the data is necessarily the minimum or maximum value, so it determines the range directly. Thus, the range is not robust to outliers.
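A minimal sketch of this behavior, using only `max` and `min` and a made-up sample, is shown here. The toolbox function `range` should return the same values.

```matlab
% Illustrative sample; the values are assumptions chosen for the example.
x = [3 7 8 5 12];
r = max(x) - min(x);            % range by definition: 12 - 3 = 9

% Append one extreme value. It becomes the new maximum,
% so it alone sets the range.
xOut = [x 1000];
rOut = max(xOut) - min(xOut);   % 1000 - 3 = 997
```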
The standard deviation and the variance are popular measures of spread that are optimal for normally distributed samples. The sample variance is the minimum variance unbiased estimator (MVUE) of the normal parameter σ². The standard deviation is the square root of the variance and has the desirable property of being in the same units as the data. That is, if the data is in meters, the standard deviation is in meters as well. The variance is in meters², which is more difficult to interpret.
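The relationship between the two statistics can be checked directly. The sketch below uses hypothetical height measurements in meters and assumes the default normalization (division by n-1) for both `var` and `std`.

```matlab
% Hypothetical heights in meters, for illustration only.
heights = [1.62 1.75 1.80 1.68 1.71];

v = var(heights);   % sample variance, in meters^2
s = std(heights);   % sample standard deviation, in meters

% The standard deviation is the square root of the variance.
gap = abs(s - sqrt(v));
fprintf('var = %.6f m^2, std = %.6f m, |std - sqrt(var)| = %.2e\n', v, s, gap);
```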
Neither the standard deviation nor the variance is robust to outliers. A data value that lies far from the body of the data can increase the value of either statistic by an arbitrarily large amount.
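One way to see this is to move a single value farther and farther from the rest of the sample. The loop below is a sketch with made-up outlier values; the six ones stay fixed while only the seventh value changes.

```matlab
% Six identical values plus one value that is moved progressively farther out.
outliers = [10 100 1000];
s = zeros(size(outliers));
for k = 1:numel(outliers)
    x = [ones(1,6) outliers(k)];
    s(k) = std(x);
    fprintf('outlier = %5d   std = %10.4f   var = %12.4f\n', ...
        outliers(k), s(k), var(x));
end
% Each step outward increases both statistics, with no upper limit.
```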
The mean absolute deviation (MAD) is also sensitive to outliers. However, the MAD moves less than the standard deviation or variance in response to bad data.
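The mean absolute deviation can be written out by hand as the average distance from the mean. The sketch below uses a made-up sample and assumes that `mad(x)` with no extra arguments computes this mean-based form.

```matlab
% Illustrative sample with one value noticeably larger than the rest.
x = [1 2 3 4 10];

% Mean absolute deviation by definition: the mean is 4, the absolute
% deviations are [3 2 1 0 6], and their average is 12/5 = 2.4.
madByHand = mean(abs(x - mean(x)));

% Standard deviation of the same sample, for comparison.
% The squared deviations sum to 50, so std = sqrt(50/4), about 3.5355.
s = std(x);
```

Here the single large value pulls the standard deviation up more than the MAD, because squaring gives large deviations extra weight.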
The interquartile range (IQR) is the difference between the 75th and 25th percentiles of the data. Since only the middle 50% of the data affects this measure, it is robust to outliers.
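The definition can be sketched with `prctile`, here on a made-up sample whose largest value is then replaced by an extreme one. The exact percentile values depend on the interpolation rule `prctile` uses, so the sketch only compares the two results with each other; `iqr` should report the same width.

```matlab
% Illustrative sample and a copy whose largest value is made extreme.
x    = [2 4 4 5 6 7 8 9];
xOut = [2 4 4 5 6 7 8 900];

q    = prctile(x,    [25 75]);   % 25th and 75th percentiles
qOut = prctile(xOut, [25 75]);

w    = q(2) - q(1);              % interquartile range of the clean sample
wOut = qOut(2) - qOut(1);        % interquartile range with the extreme value

% The extreme value lies outside the middle 50% of the data,
% so the two widths are the same.
isequal(w, wOut)
```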
This example shows the behavior of the measures of dispersion for a sample with one outlier:
    x = [ones(1,6) 100]
    stats = [iqr(x) mad(x) range(x) std(x)]

    x =
         1     1     1     1     1     1   100

    stats =
             0   24.2449   99.0000   37.4185
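The reported values can be reproduced by hand. The arithmetic below is a worked check, assuming that `mad` computes the mean absolute deviation and that `std` divides by n-1. The interquartile range is 0 because the 25th and 75th percentiles both fall among the six values equal to 1.

```latex
\begin{align*}
\bar{x} &= \frac{6 \cdot 1 + 100}{7} = \frac{106}{7} \approx 15.1429 \\
\text{range} &= 100 - 1 = 99 \\
\text{mad} &= \frac{1}{7}\left(6 \cdot \frac{99}{7} + \frac{6 \cdot 99}{7}\right)
            = \frac{1188}{49} \approx 24.2449 \\
\text{std} &= \sqrt{\frac{1}{6}\left(6 \cdot \frac{99^2}{49} + \frac{36 \cdot 99^2}{49}\right)}
            = \frac{99}{\sqrt{7}} \approx 37.4185
\end{align*}
```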