How to calculate the confidence interval

Sepp on 20 Oct 2014

Commented: Star Strider on 25 Aug 2023

I have a vector x with, e.g., 100 data points. I can easily calculate the mean, but now I want the 95% confidence interval. I can calculate the 95% confidence interval as follows:

CI = mean(x) +- t * (s / sqrt(n))

where s is the standard deviation and n the sample size (= 100).

Is there a function in MATLAB where I can just feed in the vector and get the confidence interval back?

Or I can write my own function, but I need at least the value of t (the critical value of the t distribution), because it depends on the number of samples and I don't want to look it up in a table every time. Is this possible?

It would be very nice if somebody could give an example.

Last but not least, I want 95% confidence in a 5% interval around the mean. To check that, I just have to calculate the 95% confidence interval and then check whether its half-width is less than 5% of my mean, right?


Jennifer Wade on 15 Feb 2022

I use something like this for a generic data vector, A.....

N = length(A)

STDmean = std(A)/sqrt(N) %Standard error of the mean: standard deviation (not the mean) over sqrt(N)

dof = N - 1; %Depends on the problem but this is standard for a CI around a mean.

studentst = tinv([.025 0.975],dof) %tinv is the student's t lookup table for the two-tailed 95% CI ...

CI = mean(A) + studentst*STDmean %[lower, upper] bounds centered on the sample mean

I'm looking into bootci now!

Jennifer Wade on 15 Feb 2022

Sorry, just saw the same answer below!

Accepted Answer

Star Strider on 20 Oct 2014

This works:

x = randi(50, 1, 100);                      % Create Data
SEM = std(x)/sqrt(length(x));               % Standard Error
ts = tinv([0.025  0.975],length(x)-1);      % T-Score
CI = mean(x) + ts*SEM;                      % Confidence Intervals

You have to have the Statistics Toolbox to use the tinv function. If you do not have it, I can provide you with a few lines of my code that will calculate the t-probability and its inverse.
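
For readers without the toolbox, here is a minimal sketch of one way to get the t critical value from base MATLAB. It is not Star Strider's code. It relies on the identity P(|T| > t) = betainc(v/(v + t^2), v/2, 1/2) for a t-distributed T with v degrees of freedom, and on the base function betaincinv. The helper name tcrit is made up for this illustration.

```matlab
% Two-sided t critical value without the Statistics Toolbox.
% Solves betainc(xq, v/2, 1/2) = 1 - conf for xq, then inverts xq = v/(v + t^2).
% tcrit is a hypothetical helper name, not a built-in function.
tcrit = @(conf, v) sqrt(v .* (1 ./ betaincinv(1 - conf, v/2, 0.5) - 1));

x   = randi(50, 1, 100);               % example data, as in the answer above
n   = numel(x);
SEM = std(x) / sqrt(n);                % standard error of the mean
t   = tcrit(0.95, n - 1);              % about 1.9842 for 99 degrees of freedom
CI  = mean(x) + [-1 1] * t * SEM;      % [lower, upper] 95% confidence interval
```

With the toolbox installed, tcrit(0.95, n-1) should agree with tinv(0.975, n-1) to within numerical precision, which is an easy way to check the sketch.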


Star Strider on 22 Oct 2014

You could certainly do that, but I’m not sure how meaningful it would be. Consider two vectors of random numbers with the same (normal) distribution, the only difference between them being a fixed offset:

x1 = randn(1,100);
x2 = randn(1,100)+10;
SEM1 = std(x1)/sqrt(length(x1));
SEM2 = std(x2)/sqrt(length(x2));
RR1 = SEM1/mean(x1);
RR2 = SEM2/mean(x2);

taking the ratio as in ‘RR1’ and ‘RR2’ would produce a significantly lower ratio for ‘RR2’ (in this illustration) in spite of the data themselves being essentially the same.
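
If the goal is still the check from the original question (is the 95% CI within 5% of the mean?), it could be sketched as below. This is only an illustration under the caveat just described: the ratio depends on where the mean sits, and it is meaningless when the mean is close to zero. The names relHalfWidth and meets5pct are made up for the example.

```matlab
% Relative half-width of the 95% CI: half-width divided by |mean|.
% relHalfWidth is a hypothetical helper name; tinv needs the Statistics Toolbox.
relHalfWidth = @(x) tinv(0.975, numel(x)-1) * std(x) / sqrt(numel(x)) / abs(mean(x));

rng(1)                                  % fixed seed so the run is repeatable
x2 = randn(1,100) + 10;                 % mean near 10, unit spread
r2 = relHalfWidth(x2);                  % expected near 1.98/(10*10), about 0.02
meets5pct = r2 < 0.05;                  % true when the CI lies within +/-5% of the mean
```

Applying the same function to data centered on zero, such as x1 above, gives a large and unstable ratio, which is why the criterion only makes sense for data whose mean is well away from zero.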

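The way I understand your latest comment (no promises that I do), you might want to compute CI values with increasing numbers of data points and compare them. The width should shrink roughly as 1/sqrt(n), so the curve flattens quickly and each additional block of data buys less precision, although it keeps approaching zero rather than levelling off at a non-zero value.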

To illustrate:

x = randn(1,1E+6);
st = 10000;
for k1 = 1:st:length(x)
    xs = x(1:st+(k1-1));
    SE = std(xs)/sqrt(length(xs));
    xss(1+fix((k1-1)/st)) = length(xs);
    CI(1+fix((k1-1)/st)) = SE*tinv(0.975,length(xs)-1)*2;
end
figure(1)
stairs(xss,CI)
grid
xlabel('Sample Size')
ylabel('CI')

I’m certain there is an analytic proof of this available, but I’m not up to looking for it just now.
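
As a rough numerical check rather than a proof: the width computed in the loop is 2*t*s/sqrt(n), so multiplying it by sqrt(n) should give a nearly constant value (close to 2*1.96*sigma for large n) if the width really falls off as 1/sqrt(n). The sketch below uses three arbitrarily chosen sample sizes. The variable names are illustrative, and tinv again requires the Statistics Toolbox.

```matlab
% CI width for nested subsamples of increasing size, and the width rescaled by sqrt(n).
rng(0)                                   % fixed seed, chosen arbitrarily
x  = randn(1, 1e6);
ns = [1e2 1e4 1e6];                      % sample sizes to compare
w  = zeros(size(ns));
for k = 1:numel(ns)
    xs   = x(1:ns(k));
    w(k) = 2 * tinv(0.975, ns(k)-1) * std(xs) / sqrt(ns(k));   % full CI width
end
scaled = w .* sqrt(ns);                  % stays near 2*1.96*sigma if w ~ 1/sqrt(n)
disp([ns; w; scaled])                    % rows: sample size, width, rescaled width
```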

Adam Danz on 21 Aug 2019

Edited: Adam Danz on 12 Jul 2022

Here's an anonymous function based on Star Strider's answer. It uses tinv() which means the stats toolbox is required. This function also uses "omitnan" flags so that NaN values are ignored which requires r2016a or later. Note that the t-distribution method assumes the data form an approximately normal distribution but this can be fairly robust to skewed data.

% x is a vector, matrix, or any numeric array of data. NaNs are ignored.
% p is the confidence level (i.e., 95 for 95% CI)
% The output is a 1x2 vector showing the [lower,upper] interval values.
CIFcn = @(x,p)std(x(:),'omitnan')/sqrt(sum(~isnan(x(:)))) * tinv(abs([0,1]-(1-p/100)/2),sum(~isnan(x(:)))-1) + mean(x(:),'omitnan');

Alternatively, you could compute CI of the mean using bootstrapping along with the percentile method. This approach does not assume a normal distribution and is more robust than the t-distribution method.

Here's a demo comparing both methods to show a small difference in CI.

Generate skewed data

rng('default')
x = raylrnd(5,[1,2000]);  % requires stats & machine learning toolbox

Compute CI using the t-distribution method

CIFcn = @(x,p)std(x(:),'omitnan')/sqrt(sum(~isnan(x(:)))) * tinv(abs([0,1]-(1-p/100)/2),sum(~isnan(x(:)))-1) + mean(x(:),'omitnan'); 
p = 95; 
CItdist = CIFcn(x,p)
CItdist = 1×2
    6.1236    6.4105

Compute CI using bootstrapping & percentile method

bootci requires the stats & machine learning toolbox. However, this is fairly easy to compute without the bootci function. Simply create a for-loop with n iterations for n bootstraps (I've chosen 1000 here). In each iteration of the for-loop, sample your data with replacement (use the randi function) and store the mean of the resampled data. After you have n means, compute the 95% CI of the means using prctile.

Note, I would not use mean as the statistic for a non-normal distribution. The median would be a much better approach.

[CIbsMean, CImeans] = bootci(1000, {@mean, x}, 'type','per','alpha', 0.05);
disp(CIbsMean')
    6.1355    6.4138
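
The for-loop approach described above could be sketched as follows. This is an illustration, not the code behind the bootci result shown here. The data are generated with a base-MATLAB inverse transform instead of raylrnd, so the numbers will differ from the demo, and the names statFcn, nBoot and bootStats are made up. Note that prctile has historically shipped with the Statistics and Machine Learning Toolbox, so it may still require the toolbox depending on your release.

```matlab
% Percentile bootstrap CI of a statistic using a plain for-loop.
rng('default')                                  % repeatable resampling
x       = 5 * sqrt(-2 * log(rand(1, 2000)));    % Rayleigh(5) draws via inverse transform
statFcn = @mean;                                % swap in @median to follow the note above
nBoot   = 1000;                                 % number of bootstrap resamples
n       = numel(x);
bootStats = zeros(1, nBoot);
for k = 1:nBoot
    idx = randi(n, 1, n);                       % sample indices with replacement
    bootStats(k) = statFcn(x(idx));             % statistic of the resampled data
end
alpha  = 0.05;
CIboot = prctile(bootStats, 100 * [alpha/2, 1 - alpha/2]);   % [lower, upper]
```

Keeping the statistic in a function handle makes it easy to compare the mean and the median on the same resampling scheme.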

Plot the results.

The first axes shows the distribution of the raw data and both sets of CIs. You can see that they are so close they nearly overlap. The second axes shows the same sets of CIs, magnified to show the difference. The last axes shows the distribution of the means from the bootstrap. Notice that they are approximately normally distributed even though the underlying data are not. Herein lies the appeal of bootstrapping the mean with the percentile method: thanks to the central limit theorem, the distribution of bootstrapped means will be approximately normal for reasonably large samples, provided the underlying distribution has finite variance.

figure()
tiledlayout(3,1,'TileSpacing','Compact');
nexttile
histogram(x)
x1 = xline(CItdist,'k:','LineWidth',1,'DisplayName','tinv');
x2 = xline(CIbsMean,'m--','LineWidth',1,'DisplayName','BootMean');
x3 = xline(mean(x),'k-','DisplayName','mean');
legend([x1(1),x2(1),x3],'Location','EastOutside')
title('CI and underlying data')
nexttile
x1 = xline(CItdist,'k:','LineWidth',1,'DisplayName','tinv');
x2 = xline(CIbsMean,'m--','LineWidth',1,'DisplayName','BootMean');
x3 = xline(mean(x),'k-','DisplayName','mean');
legend([x1(1),x2(1),x3],'Location','EastOutside')
title('CIs')
box on
nexttile
histogram(CImeans)
xline(CIbsMean,'m--','LineWidth',1,'DisplayName','BootMean');
title('Bootstrapped means')

Lastly, for people looking to compute the percentile interval of the distribution itself (the range that contains the central p% of the data) rather than the CI of the mean of the distribution, you can simply use the prctile function:

% x is a vector, matrix, or any numeric array of data. NaNs are ignored.
% p is the confidence level (i.e., 95 for 95% CI)
% The output is a 1x2 vector showing the [lower,upper] interval values.
CIFcn = @(x,p)prctile(x,abs([0,100]-(100-p)/2));

Demo:

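figure
x = pearsrnd(0,1,1,4,100,1);   % skewed sample; requires stats & machine learning toolbox
p = 95;                        % confidence level in percent
histogram(x);
% CI of the mean (t-distribution method), kept under its own name so it is not overwritten
CIMeanFcn = @(x,p)std(x(:),'omitnan')/sqrt(sum(~isnan(x(:)))) * tinv(abs([0,1]-(1-p/100)/2),sum(~isnan(x(:)))-1) + mean(x(:),'omitnan');
CItdist = CIMeanFcn(x,p);
xline(CItdist,'k:','LineWidth',1,'DisplayName','CI Mean')
% Percentile interval of the distribution itself
CIFcn = @(x,p)prctile(x,abs([0,100]-(100-p)/2));
CIDist = CIFcn(x,p);
xline(CIDist,'m--','LineWidth',1,'DisplayName','CI Distribution')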

*This comment was updated on 7/12/22 thanks to feedback from Ishmaal Erekson in the comments below.

Niraj Desai on 25 Aug 2023

@Star Strider Thank you so much for your answers (over the course of eight years !!!) I realize this thread started in 2014, but I only found it today. It clarified something that I had been confused about. I'm grateful.

Star Strider on 25 Aug 2023

@Niraj Desai — My pleasure!
