kstest - normal?

Ian on 31 Mar 2011
Hi, I am confused by the description of the 'kstest' function. Usually '1' means true and '0' means false, and the purpose of this function is to test whether or not a set of data is normally distributed. However, from what I gather from the description, '0' is returned when the data is normally distributed, and '1' is returned when the data is not normally distributed.
Is this the correct interpretation? The example is also a little confusing:
x = -2:1:4
x = -2 -1 0 1 2 3 4
[h,p,k,c] = kstest(x,[],0.05,0)
h = 0
p = 0.13632
k = 0.41277
c = 0.48342
These data are linear, not normally distributed. Yet kstest returns '0', which seems to mean it classifies these data as normal. Is this a limitation of the kstest with small data samples?
From what I read, the resolution is to use the 'smaller' or 'larger' tag to correct for this problem, but is there any clear cut-off between what counts as 'smaller' and what counts as 'larger'?
Lastly, suppose I were to use this test in a publication and report that our data were 'normal' (the function returned 0) or could not be classified as 'normal' (the function returned 1), and that I used the 'smaller' or 'larger' tags. How does that change the name of the test? It can't be the same test if it returns different values. How would I explain this?

Accepted Answer

Andrew Newell on 31 Mar 2011
Your example (taken from the documentation) "illustrates the difficulty of testing normality in small samples." If you plot
normplot(x)
you'll see that the deviations from a standard normal distribution occur in the two outer points. It doesn't take a lot more data to get a reasonable result, though:
x = -2:0.5:4;
[h,p,k,c] = kstest(x,[],0.05,0)
h = 1
p = 0.0245
k = 0.3947
c = 0.3614
Keep in mind, too, their comment about the Lilliefors test - it is more likely to be the one you want.
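For reference, a minimal sketch of how the Lilliefors test could be applied to that larger sample (lillietest estimates the mean and variance from the data, so the result should still be read with the small sample size in mind):
% Rough sketch: Lilliefors test on the same larger sample.
% Unlike kstest against a standard normal, lillietest estimates the
% mean and variance from the data itself, so x need not be standardized.
x = -2:0.5:4;
[h,p] = lillietest(x)   % h = 0 means normality is not rejected at the 5% level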
  2 Comments
the cyclist on 31 Mar 2011
Andrew, I think you meant "normplot(x)" rather than "normpdf(x)" here.
Andrew Newell on 31 Mar 2011
Oops!


More Answers (2)

the cyclist on 31 Mar 2011
Ian,
There are lots and lots of things that need to be addressed here. I'll try to cover as much as I can.
First, in your little example, you only have seven data points. Therefore, the statistical test you are applying has very little power to distinguish between normal and non-normal distributions. Note, though, that if you added even one more point, x = -2:1:5, the K-S test would reject the null hypothesis. I hope that the real study you are planning to submit has more data than this!
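For example, using the same call signature as in the documentation example (treat the exact numbers as illustrative, since they can vary slightly by MATLAB version):
% Sketch: repeat the two-sided test with one extra point.
x = -2:1:5;                    % eight points instead of seven
[h,p] = kstest(x,[],0.05,0)    % TAIL = 0 is the two-sided test
% If the point above holds, h should come back as 1 (null hypothesis rejected).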
The test certainly does not "classify these data as normal"! It fails to reject the hypothesis that the data are normally distributed. That's an important distinction. Given this dataset, you should not say your data are normal.
The data [-2 -1 0 1 2 3 4] are not, in and of themselves, "linear". They are seven data points that you just happen to know you generated linearly.
The resolution of this issue is not to use the additional arguments "larger" or "smaller". Those arguments are more related to one's expectation that the distribution being sampled is skewed toward one side or the other of normal. I don't think those are relevant here. (But, the way it would be described, if it were relevant, would be to say you used a one-sided KS test rather than two-sided.)
There are other tests of normality that may also be useful to you: jbtest and lillietest.
I would say that if it is important to distinguish normality, then, sadly, you do not have enough data to do so confidently.
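If you do want to try the other tests mentioned above, a rough sketch on the same seven points would look like the following (assuming jbtest accepts a sample this small; with so little data any of these tests has low power, and they may warn that the p-value lies outside their tabulated range):
% Sketch: another normality test on the seven-point example (illustrative only).
x = -2:1:4;
h_jb = jbtest(x)   % Jarque-Bera test, based on the sample skewness and kurtosis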
  6 Comments
N on 29 Jan 2020
On a side note related to the definition of the tails:
  • when using 'Tail' set to 'smaller' we are testing whether the distribution is left skewed
  • when using 'Tail' set to 'larger' we are testing whether the distribution is right skewed
Is this correct?
the cyclist on 30 Jan 2020
% Set random number seed to default
rng default
% Generate data that is clearly shifted larger than standard normal
% (I'm not sure I would refer to this as "right skewed", but I think this is what you mean.)
N = 1000;
x = randn(N,1) + 5;
% Null hypothesis that the distribution is larger than standard normal is NOT rejected
h_larger = kstest(x,'Tail',"larger")
% Null hypothesis that the distribution is unequal to standard normal IS rejected
h_unequal = kstest(x,'Tail',"unequal")
% Null hypothesis that the distribution is smaller than standard normal IS rejected
h_smaller = kstest(x,'Tail',"smaller")



Matt Tearle on 31 Mar 2011
The output is the more likely hypothesis, not a true/false. Hence, h = 0 means the null hypothesis (H0), which is that the data come from the assumed distribution.
The smaller/larger options are for performing one-sided tests, e.g. if your data came from a normal distribution with a positive mean.
Other than that, see Andrew's answer. In particular, look at lillietest and jbtest.
  2 Comments
the cyclist on 31 Mar 2011
h=0 does not mean that the null hypothesis is the more likely hypothesis. It means only that the null hypothesis cannot be rejected at the specified level of confidence.
Matt Tearle on 31 Mar 2011
Yes, but given that it returns a single value, 0 or 1, I was trying to find a way to convey that this return value is the "decision" (H0 or H1), rather than a true/false.

