How pwelch computes confidence intervals

Can Mathworks please point to exactly which formulae are being used to compute confidence intervals in pwelch? Perhaps state the formulae being used in the code? This would include how exacty the number of degrees of freedom are being computed.

3 Comments

I share the original poster's question. Kiran, you said to read the documentaiton, and I have. It does not answer the question. The pwelch() documentation only cites two textbooks which are rather expensive. An example of an acceptable answer could be something like this (but this is probably not the real answer):
The p% CI for the estimate Pxxhat(f) is: Pxxhat(f)*(1/k)*{chisq[k,(1-p)/2], chisqd[k,(1+p)/2]}
where k is the degrees of freedom and k=2*N/L where N is the signal length and L is the window length, i.e. k=2*(number of times the window fits into the signal without any overlapping).
and where chisq[k,a] is the a-percentage point on the chi squared distribution with k degrees of freedom.
Please provide an answer of this type. Thank you.
I read one of the two books which is cited in document pwelch, which was recommended by Kiran. The book I read is Statistical digital signal processing and modeling by M.H. Hayes, 1996. The relevant portion is seciton 8.2.5, pp.415-419 (which are pages 435-439 in the electronic version). It does not give formulas for the confidence interval for the Welch periodogram. Therefore it does not answer the original question: how is the confidence interval determined.
There are several good books on this - the original Jenkins and Watt book is excellent (although out of print and not evailble in pdf form) as is a book aimed at oceanographers by Thomson and Emery (Data Analysis Methods in Physical Oceanography). The reason I originally asked this question was because I suspect that the confidence intervals given by pwelch are not quite calculated correctly. One can check this by doing spectra for a random process that has a known spectrum - Jenkins and Watt give an excellent example of such a process in their book. I wrote a script - below - that generates a time series using the auto-regressive process J&W use for their examples. With apologies for the poor coding technique (I first learned programming in Fortran 66), you will see that if one inverts the CIs that Matlab computes and swap the results between the upper and lower CIs, one gets something that looks pretty good whereas if one sticks with the CIs pwelch gives, the lower and upper bounds are both high, meaning that the lower bound is not low enough and the upper bound is too high. So, this is what makes me think there is something wrong in pwelch.
%
% Examine spectral confidence intervals using a random process with a known spectrum
% The time series used is generated via an AR1 process detailed in Jenkins and Watts on p 269
% S. Monismith
% Stanford University
% 2/15/21
%
clear
M=16;
seriesize=2^M; % length of time series
t=0:seriesize-1; % times
X=ones(seriesize,1); %
Z=randn(seriesize,1);
factor=0.5;
for j=3:seriesize
X(j)=X(j-1)-factor*X(j-2)+Z(j);
end
X(1)=X(seriesize)-factor*X(seriesize-1)+Z(1);
X(2)=X(1)-factor*X(seriesize)+Z(2);
idothese=seriesize;
%
% Key adjustment - how many segments
%
nsegs=2;
%
% Now set up and do pwelch estimation
%
windowlength=idothese/nsegs;
nfft=windowlength;
[PXX,f,PXXc] = pwelch(X,windowlength,0.5*windowlength,nfft,1,'ConfidenceLevel',0.95);
%
% Now compute bounds - note that they are multiplicative factors on known spectrum
% First, we will use what pwelch gives
%
factor1=median(PXXc(:,1)./PXX);
factor2=median(PXXc(:,2)./PXX);
subplot(1,2,1)
semilogy(f,PXX,'ko','markeredgecolor',0.7*[1,1,1],...
'markerfacecolor',0.7*[1,1,1],'markersize',4)
hold on
Pxxtheory=var(X)*(0.834)./(2.25-3*cos(2*pi*f)+cos(4*pi*f));
hold on
semilogy(f,Pxxtheory,'r','linewidth',2)
semilogy(f,Pxxtheory*factor1,'b--','linewidth',2)
semilogy(f,Pxxtheory*factor2,'r--','linewidth',2)
hold off
ylim([1e-1 1e2])
grid
set(gca,'fontsize',20)
ylabel('\Phi_{XX}','fontsize',20)
xlabel('f','fontsize',20)
title('CIs as given by pwelch','fontsize',20)
%
% Now re-do the confidence intervals with a suggested correction
%
factor1=median(PXX./PXXc(:,1));
factor2=median(PXX./PXXc(:,2));
%
subplot(1,2,2)
semilogy(f,PXX,'ko','markeredgecolor',0.7*[1,1,1],...
'markerfacecolor',0.7*[1,1,1],'markersize',4)
hold on
Pxxtheory=var(X)*(0.834)./(2.25-3*cos(2*pi*f)+cos(4*pi*f));
hold on
semilogy(f,Pxxtheory,'r','linewidth',2)
semilogy(f,Pxxtheory*factor1,'b--','linewidth',2)
semilogy(f,Pxxtheory*factor2,'r--','linewidth',2)
hold off
ylim([1e-1 1e2])
grid
set(gca,'fontsize',20)
ylabel('\Phi_{XX}','fontsize',20)
xlabel('f','fontsize',20)
title('CIs inverted and swapped','fontsize',20)

Sign in to comment.

Answers (2)

Hi Stephen,
Refer the pwelchfunction documentation for the details.

2 Comments

Kiran, thak you for this answer but plese see my request above for more info. Thanks.
Kiran,
I read the source code for pwelch.m (C:\Program Files\MATLAB\R2018b\toolbox\signal\signal\pwelch.m). It does not reveal how the confidence interval is computed. pwelch.m calls welch.m (C:\Program Files\MATLAB\R2018b\toolbox\signal\signal\+spectrum\welch.m), which I inspected. It calls another version of pwelch() which I oculd not find, probably because it is in a compiled library. Therefore I would appreciate an actual answer to Stephen's question. Thank you.

Sign in to comment.

The 2-sided confidence interval (C.I.) with probability p on the power spectral denisty (p.s.d.), which is returned by pwelch(), is given by
Pxxhat(f)*k/chi2((1+p)/2,k) < Pxx(f) < Pxxhat(f)*k/chi2((1-p)/2,k)
where Pxxhat(f) is the experimental estimate of the p.s.d. at frequency f, and Pxx(f) is the true, but unknown, value of the p.s.d., and k is the degrees of freedom. This is analagous to a confidence interval for a variance. The degrees of freedom is given by
k = 2*K
where upper case K is the number of segments or windows used when esitmating the p.s.d.
If the window length and offset are chosen so that the windows fit the signal without any leftover bits, then
K=(N-L)/D+1
where N=sinal length, L=window length, D=offset. (Don't confuse overlap and offset. Overlap=L-D. pwelch() takes overlap as an argument, but Welch (1967) uses offset=D in his formulas.
Example: Signal length=1024, window=Hamming, with length 256, half-overlapped. Then N=1024, L=256, and D=offset=128. Therefore K=7, by the formula above, and we can verify that 7 half overlapped windows cover the signal exactly. Then k=2*K = 14.
Prob{Pxxhat*14/chi2(.975,14) < Pxx < Pxxhat*14/chi2(.025,14)} = 0.95
=> Prob{Pxxhat*0.536 < Pxx < Pxxhat*2.487} = 0.95
I have checked the above formulas for various combinations of N, L, D, and probability p. The formuals correctly reproduce the confidence intervals reported by pwelch().
The formulas above are not correct from a statistical point of view, when the overlap is more than zero, but they are the formulas that pwelch() uses. The error is that k, the degrees of freedom, should be somewhat less than 2*K when the windows overlap. The exact formula is given by Welch, IEEE Trans Audio Electroacoustics AU-15:70-73, 1967, https://ieeexplore.ieee.org/document/1161901. It is complicated so I will not reproduce it here. It has a varible number of terms, depending on window shape and amount of overlap. The C.I. from pwelch() does not take window shape or overlap into account. pwelch()'s confidence intervals are correct when the windows do not overlap. The error in the confidence interval reported by pwelch() is relatively small when using Hamming or Hann or similar windows at half-overlap. In general, the C.I. reported by pwelch() is narrower than it should be. See table below:
The errors in the confidence intervals reported by pwelch() become more severe if the overlap is greater. By more severe I mean that the C.I. from pwelch() is significantly more narrow that than the true C.I.

5 Comments

Hello
I have noticed an error in the above formula on the CI. The correct formula is
Pxxhat(f)*k/chi2(p/2,k) < Pxx(f) < Pxxhat(f)*k/chi2((1-p)/2,k)
which is coherent with the rest of the answer.
Thank you for your interest. Perhaps I am mistaken, but I do not think your correction is correct. I wrote:
"The 2-sided confidence interval (C.I.) with probability p on the power spectral denisty (p.s.d.), which is returned by pwelch(), is given by
Pxxhat(f)*k/chi2((1+p)/2,k) < Pxx(f) < Pxxhat(f)*k/chi2((1-p)/2,k)"
You wrote: "The correct formula is
Pxxhat(f)*k/chi2(p/2,k) < Pxx(f) < Pxxhat(f)*k/chi2((1-p)/2,k)"
The difference in our formulas is in bold. You propose that my "(1+p)/2" should be "p/2". I disagree. Consider the case of a two-sided 95% confidence interval (p=0.95). You are suggesting that the critical value in the denominator on the left should be . I think the critical value in the denominator on the left should be . My formula agrees with the formulas in various texts and online sources, such as here, here, here. (My "p" corresponds to in these sources, so my "p=95%" corresponds to .)
My mistake, you are right, sorry for this wrong comment.
The reason why I am studying the C.I. is that I am trying to understand the confidence interval estimation given by MATLAB pwelch function.
Using Monte-Carlo approach, I am computing a large number of spectral estimates and compute statistics. But I found some discrepancies between the average upper bound of the CI given by pwelch and the 95% confidence interval I can compute as twice the standard deviation of the spectrum estimates.
No problem.
The estimate of the power spectrum at a specific frequency, when scaled appropriately, has an approximately chi-squared distribution.* Therefore the formula that says the 95% C.I. is approximately twice the standard deviation does not apply. That formula is for estimates that have a normal distribution - such as the mean of a set of measurements, which is approximately normal, due to the central limit theorem.
* I say "approximately" because, if overlapped segments are used to compute the spectrum, as is usually done, then the different components of the estimate are not all independent - due to the partial overlap. A true chi-squared random variable is the sum of the squares of independent standard normal random variables.
You said you are trying to understand the CIs given by pwelch(). I too am interested in this topic.
You may wonder if there is a simple formula, comparable to
which relates μ and σ for a group of power spectrum estimates to the upper 95% C.I. of the power spectrum. I do not think such a formula exists. The C.I. returned by pwelch() uses the formula I gave in an earlier comment. I have done millions of Monte Carlo simulations of power spectrum estimates. I conclude that the formula which pwelch() uses for confidence intervals is accurate when there is no overlap of segments in the spectrum estimate. The CI returned by pwelch() is somewhat narrower than it should be, when overlapping segments are used to estimate the spectrum. In other words, the actual 95% C.I. is slightly wider than the C.I. returned by pwelch().
If you wish to discuss this further, please email me separately by clicking on the envelope at the upper right corner of the box that pops up when you click on my name.

Sign in to comment.

Products

Release

R2019a

Asked:

on 30 Apr 2020

Commented:

on 26 Nov 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!