Calculate confidence interval of data set
75 views (last 30 days)
Show older comments
I want to calculate a 95% confidence interval for the difference between the mean of each variable for the species versicolor and virginica in the "fisheriris" data set, but I don't quite understand how to write the code. This is my code so far:
load fisheriris
%Versicolor
ver1 = meas(51:100,1);
ver2 = meas(51:100,2);
ver3 = meas(51:100,3);
ver4 = meas(51:100,4);
ver1_mean = mean(ver1);
ver2_mean = mean(ver2);
ver3_mean = mean(ver3);
ver4_mean = mean(ver4);
%Virginica
vir1 = meas(101:150,1);
vir2 = meas(101:150,2);
vir3 = meas(101:150,3);
vir4 = meas(101:150,4);
vir1_mean = mean(vir1);
vir2_mean = mean(vir2);
vir3_mean = mean(vir3);
vir4_mean = mean(vir4);
q1 = quantile(ver1_mean - vir1_mean, [0.95])
q2 = quantile(ver2_mean - vir2_mean, [0.95])
q3 = quantile(ver3_mean - vir3_mean, [0.95])
q4 = quantile(ver4_mean - vir4_mean, [0.95])
But I assume that I should have some sort of interval instead of just 0.95 in my quantile command, however whatever I put for ex, [0.05 0.95], I get the same value in for both ends of the interval and if I just have 0.95 then I don't get an interval. Could someone explain where I have gone wrong, all help is appreciated!
0 Comments
Answers (2)
Ive J
on 18 Dec 2020
Edited: Ive J
on 18 Dec 2020
I'm not sure if you want to calculate CI or something like reference range. Regardless, as you can see on quantile help page, it says Quantiles of a data set. So, of course when your inputs are scalar the output is same as the input.
data = randn(100, 1);
quantile(data, [0.05 0.95]) % this is not 95% CI
% 95% CI
CI95 = [mean(A) - 1.96*(std(A)./sqrt(numel(A))), mean(A) + 1.96*(std(A)./sqrt(numel(A)))]
2 Comments
Ive J
on 20 Dec 2020
Edited: Ive J
on 20 Dec 2020
In bootsrapt you simply do random sampling form ver1 and vir1; so, in above example you pick K random subsamples with replication from ver1 and vir1 and calculate the mean of each subsample. So, you have a vector of mean values, for which you can easily calculate CI.
I guess you are looking for something like this:
load fisheriris
%Versicolor
ver = meas(51:100,1);
%Virginica
vir = meas(101:150,1);
% 95% CI using bootstrapping
M = 2000;
alpha = 0.05/2;
ver_boot = bootstrp(M, @mean, ver);
vir_boot = bootstrp(M, @mean, vir);
ci_bootstrap = quantile(ver_boot - vir_boot, [alpha 1-alpha]);
% using Z-value (assuming normal distribution)
A = ver - vir;
ci_z = [mean(A) - 1.96*(std(A)./sqrt(numel(A))), ...
mean(A) + 1.96*(std(A)./sqrt(numel(A)))];
% using t-distribution
[~, ~, ci_t] = ttest2(ver, vir);
% compare them
ci_bootstrap
ci_z
ci_t'
ci_bootstrap =
-0.8880 -0.4230
ci_z =
-0.8942 -0.4098
ci_t =
-0.8819 -0.4221
% visualize esitmated mean from bootstrapping
histogram(ver_boot - vir_boot)
line([mean(A), mean(A)], get(gca, 'YLim'), 'color', 'r', 'LineWidth', 1.5)

Liam Walsh
2 minutes ago
bootci will generate confidence intervals by creating subsamples of the datasets and computing the mean differences from those subsamples. From the set of these subsampled mean differences, we can construct intervals, which are returned by bootci. It is a non-parametric approach, meaning that it doesn't assume any specific distribution about the data or about quantities estimated from the data.
ttest2 will generate confidence intervals based on properties of the estimate of the mean difference. This test is parametric, unlike bootci, and it assumes the data passed to it are approximately normally distributed. However, with enough observations (>= 30 is the general rule of thumb), the confidence intervals are fairly robust to departures from this assumption.
% Separate out the data
load fisheriris
setData = meas(1:50,:);
versData = meas(51:100,:);
virgData = meas(101:end,:);
% Approach 1: bootci
meandiff = @(x,y) mean(x-y);
nsamples = 1000;
setVersDiff = bootci(nsamples, meandiff, setData, versData);
setVirgDiff = bootci(nsamples, meandiff, setData, virgData);
versVirgDiff = bootci(nsamples, meandiff, versData, virgData);
% Approach 2 - ttest2
setVersDiffT = zeros(2, 4);
setVirgDiffT = zeros(2, 4);
versVirgDiffT = zeros(2, 4);
for i = 1:4 % Four columns in the data
% Set VarType="unequal" to indicate that each of the columns have
% different variances. This can be set to "equal" if they have equal
% variances, but "unequal" will be more robust if that assumption is
% incorrect
[~,~,setVersDiffT(:, i)] = ttest2(setData(:,i), versData(:,i), VarType="unequal");
[~,~,setVirgDiffT(:, i)] = ttest2(setData(:,i), virgData(:,i), VarType="unequal");
[~,~,versVirgDiffT(:, i)] = ttest2(versData(:,i), virgData(:,i), VarType="unequal");
end
% Compare each of the estimates. They show good alignment
disp(setVersDiff)
disp(setVersDiffT)
disp(setVirgDiff)
disp(setVirgDiffT)
disp(versVirgDiff)
disp(versVirgDiffT)
0 Comments
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!