Anderson-Darling test

## Syntax

``h = adtest(x)``
``h = adtest(x,Name,Value)``
``[h,p] = adtest(___)``
``[h,p,adstat,cv] = adtest(___)``

## Description

`h = adtest(x)` returns a test decision for the null hypothesis that the data in vector `x` is from a population with a normal distribution, using the Anderson-Darling test. The alternative hypothesis is that `x` is not from a population with a normal distribution. The result `h` is `1` if the test rejects the null hypothesis at the 5% significance level, or `0` otherwise.

`h = adtest(x,Name,Value)` returns a test decision for the Anderson-Darling test with additional options specified by one or more name-value pair arguments. For example, you can specify a null distribution other than normal, or select an alternative method for calculating the p-value.

`[h,p] = adtest(___)` also returns the p-value, `p`, of the Anderson-Darling test, using any of the input arguments from the previous syntaxes.

`[h,p,adstat,cv] = adtest(___)` also returns the test statistic, `adstat`, and the critical value, `cv`, for the Anderson-Darling test.

## Examples

Load the sample data. Create a vector containing the first column of the students' exam grades data.

```
load examgrades
x = grades(:,1);
```

Test the null hypothesis that the exam grades come from a normal distribution. You do not need to specify values for the population parameters.

```
[h,p,adstat,cv] = adtest(x)
```
```
h = logical
   0
p = 0.1854
adstat = 0.5194
cv = 0.7470
```

The returned value of `h = 0` indicates that `adtest` fails to reject the null hypothesis at the default 5% significance level.

Load the sample data. Create a vector containing the first column of the students' exam grades data.

```
load examgrades
x = grades(:,1);
```

Test the null hypothesis that the exam grades come from an extreme value distribution. You do not need to specify values for the population parameters.

```
[h,p] = adtest(x,'Distribution','ev')
```
```
h = logical
   0
p = 0.0714
```

The returned value of `h = 0` indicates that `adtest` fails to reject the null hypothesis at the default 5% significance level.

Load the sample data. Create a vector containing the first column of the students' exam grades data.

```
load examgrades
x = grades(:,1);
```

Create a normal probability distribution object with mean `mu = 75` and standard deviation `sigma = 10`.

```
dist = makedist('normal','mu',75,'sigma',10)
```
```
dist = 
  NormalDistribution

  Normal distribution
       mu = 75
    sigma = 10
```

Test the null hypothesis that `x` comes from the hypothesized normal distribution.

```
[h,p] = adtest(x,'Distribution',dist)
```
```
h = logical
   0
p = 0.4687
```

The returned value of `h = 0` indicates that `adtest` fails to reject the null hypothesis at the default 5% significance level.

## Input Arguments

Sample data, specified as a vector. Missing observations in `x`, indicated by `NaN`, are ignored.

Data Types: `single` | `double`
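A minimal sketch of the `NaN` handling described above, assuming the `examgrades` sample data used in the examples is available. The variable name `xMissing` is introduced here only for illustration. If missing observations are ignored as stated, padding the sample with `NaN` values should leave the results unchanged.

```matlab
load examgrades
x = grades(:,1);

% Append two missing observations (hypothetical padding for illustration).
xMissing = [x; NaN; NaN];

[h1,p1,adstat1] = adtest(x);
[h2,p2,adstat2] = adtest(xMissing);

% Expected to be true if adtest drops the NaN entries before testing.
sameResult = isequal([h1 p1 adstat1],[h2 p2 adstat2])
```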

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Alpha',0.01,'MCTol',0.01` conducts the hypothesis test at the 1% significance level, and determines the p-value, `p`, using a Monte Carlo simulation with a maximum Monte Carlo standard error for `p` of 0.01.
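The name-value pairs in this example can be passed directly after the data vector. The sketch below assumes the `examgrades` sample data from the examples. The call to `rng` is added here only so that the Monte Carlo p-value is repeatable between runs.

```matlab
load examgrades
x = grades(:,1);

rng('default')   % fix the random seed so the simulated p-value is repeatable

% Test at the 1% level and estimate p by Monte Carlo simulation,
% with a maximum Monte Carlo standard error of 0.01.
[h,p] = adtest(x,'Alpha',0.01,'MCTol',0.01)
```

Because `p` is simulated, its value varies slightly with the random seed.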

Hypothesized distribution of data vector `x`, specified as the comma-separated pair consisting of `'Distribution'` and one of the following.

| Value | Distribution |
| --- | --- |
| `'norm'` | Normal distribution |
| `'exp'` | Exponential distribution |
| `'ev'` | Extreme value distribution |
| `'logn'` | Lognormal distribution |
| `'weibull'` | Weibull distribution |

In this case, you do not need to specify population parameters. Instead, `adtest` estimates the distribution parameters from the sample data and tests `x` against a composite hypothesis that it comes from the selected distribution family with parameters unspecified.

Alternatively, you can specify any continuous probability distribution object for the null distribution. In this case, you must specify all the distribution parameters, and `adtest` tests `x` against a simple hypothesis that it comes from the given distribution with its specified parameters.

Example: `'Distribution','exp'`

Significance level of the hypothesis test, specified as the comma-separated pair consisting of `'Alpha'` and a scalar value in the range (0,1).

Example: `'Alpha',0.01`

Data Types: `single` | `double`

Maximum Monte Carlo standard error for the p-value, `p`, specified as the comma-separated pair consisting of `'MCTol'` and a positive scalar value. If you use `MCTol`, `adtest` determines `p` using a Monte Carlo simulation, and the name-value pair argument `Asymptotic` must have the value `false`.

Example: `'MCTol',0.01`

Data Types: `single` | `double`

Method for calculating the p-value of the Anderson-Darling test, specified as the comma-separated pair consisting of `'Asymptotic'` and either `true` or `false`. If you specify `true`, `adtest` estimates the p-value using the limiting distribution of the Anderson-Darling test statistic. If you specify `false`, `adtest` calculates the p-value based on an analytical formula. For sample sizes greater than 120, the limiting distribution estimate is likely to be more accurate than the small sample size approximation method.

• If you specify a distribution family with unknown parameters for the `Distribution` name-value pair, `Asymptotic` must be `false`.

• If you use `MCTol` to calculate the p-value using a Monte Carlo simulation, `Asymptotic` must be `false`.

Example: `'Asymptotic',true`

Data Types: `logical`
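Because `Asymptotic` cannot be `true` for a distribution family with unknown parameters, the sketch below uses a fully specified distribution object. It assumes the `examgrades` data and the hypothesized `mu = 75`, `sigma = 10` from the examples. The output names `pAsym` and `pSmall` are illustrative.

```matlab
load examgrades
x = grades(:,1);

% Fully specified null distribution (simple hypothesis).
dist = makedist('normal','mu',75,'sigma',10);

% p-value from the limiting distribution of the test statistic.
[hAsym,pAsym] = adtest(x,'Distribution',dist,'Asymptotic',true);

% p-value from the small-sample analytical approximation.
[hSmall,pSmall] = adtest(x,'Distribution',dist,'Asymptotic',false);

fprintf('asymptotic p = %.4f, small-sample p = %.4f\n',pAsym,pSmall)
```

Comparing the two p-values side by side gives a sense of how much the choice of method matters for a given sample size.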

## Output Arguments

Hypothesis test result, returned as a logical value.

• If `h = 1`, `adtest` rejects the null hypothesis at the `Alpha` significance level.

• If `h = 0`, `adtest` fails to reject the null hypothesis at the `Alpha` significance level.

p-value of the Anderson-Darling test, returned as a scalar value in the range [0,1]. `p` is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. `p` is calculated using one of these methods:

• If the hypothesized distribution is a fully specified probability distribution object, `adtest` calculates `p` analytically. If `'Asymptotic'` is `true`, `adtest` uses the asymptotic distribution of the test statistic. If you specify a value for `'MCTol'`, `adtest` uses a Monte Carlo simulation.

• If the hypothesized distribution is specified as a distribution family with unknown parameters, `adtest` retrieves the critical value from a table and uses inverse interpolation to determine the p-value. If you specify a value for `'MCTol'`, `adtest` uses a Monte Carlo simulation.

Test statistic for the Anderson-Darling test, returned as a scalar value.

• If the hypothesized distribution is a fully specified probability distribution object, `adtest` computes `adstat` using specified parameters.

• If the hypothesized distribution is specified as a distribution family with unknown parameters, `adtest` computes `adstat` using parameters estimated from the sample data.

Critical value for the Anderson-Darling test at the significance level `Alpha`, returned as a scalar value. `adtest` determines `cv` by interpolating into a table based on the specified `Alpha` significance level.

### Anderson-Darling Test

The Anderson-Darling test is commonly used to test whether a data sample comes from a normal distribution. However, it can be used to test for another hypothesized distribution, even if you do not fully specify the distribution parameters. Instead, the test estimates any unknown parameters from the data sample.

The test statistic belongs to the family of quadratic empirical distribution function statistics, which measure the distance between the hypothesized distribution, $F(x)$, and the empirical cdf, $F_n(x)$, as

`$n\int_{-\infty}^{\infty}\left(F_n(x)-F(x)\right)^{2}w(x)\,dF(x),$`

over the ordered sample values $x_1<x_2<\dots<x_n$, where w(x) is a weight function and n is the number of data points in the sample.

The weight function for the Anderson-Darling test is

`$w\left(x\right)={\left[F\left(x\right)\left(1-F\left(x\right)\right)\right]}^{-1},$`

which places greater weight on the observations in the tails of the distribution, thus making the test more sensitive to outliers and better at detecting departure from normality in the tails of the distribution.
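As a quick check of how strongly the tails are weighted, the weight function can be evaluated at the median and at a point in the lower tail. The values $F = 0.5$ and $F = 0.01$ are chosen here purely for illustration:

`$w\big|_{F=0.5}=\frac{1}{0.5\times 0.5}=4,\qquad w\big|_{F=0.01}=\frac{1}{0.01\times 0.99}\approx 101,$`

so a squared discrepancy near the 1st percentile counts roughly 25 times as much as the same discrepancy at the median.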

The Anderson-Darling test statistic is

`${A}_{n}^{2}=-n-\sum _{i=1}^{n}\frac{2i-1}{n}\left[\mathrm{ln}\left(F\left({X}_{i}\right)\right)+\mathrm{ln}\left(1-F\left({X}_{n+1-i}\right)\right)\right],$`

where $\left\{X_1<\dots<X_n\right\}$ are the ordered sample data points and n is the number of data points in the sample.

In `adtest`, the decision to reject or not reject the null hypothesis is based on comparing the p-value for the hypothesis test with the specified significance level, not on comparing the test statistic with the critical value.
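The test statistic formula above can be evaluated by hand for a fully specified null distribution and compared with the `adstat` output. This is a sketch of the formula, not a description of the internal implementation of `adtest`. It assumes the `examgrades` data and the hypothesized `mu = 75`, `sigma = 10` from the examples, and the names `xs`, `F`, and `A2` are illustrative.

```matlab
load examgrades
x = grades(:,1);
dist = makedist('normal','mu',75,'sigma',10);

xs = sort(x(~isnan(x)));   % ordered sample X_1 < ... < X_n, NaNs dropped
n  = numel(xs);
F  = cdf(dist,xs);         % hypothesized cdf at the ordered points
i  = (1:n)';

% flipud(F) places F(X_{n+1-i}) in position i, matching the formula.
A2 = -n - sum((2*i - 1)/n .* (log(F) + log(1 - flipud(F))));

[~,~,adstat] = adtest(x,'Distribution',dist);
fprintf('manual A2 = %.4f, adtest adstat = %.4f\n',A2,adstat)
```

The two values should agree closely if `adstat` is computed from the specified parameters as described under Output Arguments.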

### Monte Carlo Standard Error

The Monte Carlo standard error is the error due to simulating the p-value.

The Monte Carlo standard error is calculated as

`$SE=\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{\text{mcreps}}},$`

where $\hat{p}$ is the estimated p-value of the hypothesis test, and `mcreps` is the number of Monte Carlo replications performed.

`adtest` chooses the number of Monte Carlo replications, `mcreps`, large enough to make the Monte Carlo standard error for $\hat{p}$ less than the value specified for `MCTol`.
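Rearranging the standard error formula gives a rough sense of how many replications a given tolerance implies. This only illustrates the formula with assumed values (an estimated p-value of 0.05 and `MCTol` of 0.01). It does not describe how `adtest` arrives at `mcreps` internally:

`$\text{mcreps}>\frac{\hat{p}\left(1-\hat{p}\right)}{\text{MCTol}^{2}}=\frac{0.05\times 0.95}{0.01^{2}}=475.$`

The bound is largest at an estimated p-value of 0.5, where it equals $0.25/0.01^{2}=2500$, so halving `MCTol` roughly quadruples the number of replications required.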