Estimating the error of a trapezoid method integral

Hi,
I have some experimental data (vectors x,y) and I've estimated the area under the curve via the trapezoid method (A = trapz(y,x))
The question is, how can I determine the error of this calculation? Is there a Matlab function that can estimate it?
Thanks a lot..

Answers (3)

I'm going to revive this just because I think this problem is fairly widespread and there isn't much available by way of resources.
When discussing the error of trapz, it is most commonly understood to be referring to the difference between the integral of an actual underlying function and the estimate made using trapz. That is not the issue here. When working with experimental data, there is no known underlying function. The problem is that the data points themselves are unreliable. The analogous case would be, if you had a known function, every time you call it there is a random error term added to it. The question is to obtain the uncertainty in the integrated area given the uncertainty in each of the data points.
There are a couple of approaches one could take. There are some formulas available here http://cmd.inp.nsk.su/old/cmd2/manuals/cernlib/shortwrups/node88.html, http://wwwasdoc.web.cern.ch/wwwasdoc/shortwrupsdir/d108/top.html
My solution was to implement the trapz() algorithm by hand, and to manually take care of the error propagation at each step. That is, you know how to convert a pair of X and Y values into an estimate of the area for a single segment, and you can use error propagation http://en.wikipedia.org/wiki/Propagation_of_uncertainty (check the table) to work out the uncertainty in that segment. You can then continue propagating the errors as you add segments together. Here is my example code (where x and y have 2 columns, and the second column is the standard deviation for each data point).
int = 0;
int_err = 0;
for j = 1:length(x)-1
yterm = 0.5*(y(j+1,1)+y(j,1));
xterm = (x(j+1,1)-x(j,1));
yerr = sqrt(0.5*(y(j,2)^2+y(j+1,2)^2));
xerr = sqrt(x(j,2)^2+x(j+1,2)^2);
z = yterm * xterm;
zerr = sqrt(z^2*((yerr/yterm)^2 + (xerr/xterm)^2));
if ~isfinite(zerr)
zerr = 0;
end
int = int + z;
int_err = sqrt(int_err^2 + zerr^2);
end

2 Comments

Romesh, in my opinion your statement "When working with experimental data, there is no known underlying function" cannot always be correct. There are a great many physical quantities subject to measurement for which there most certainly is an underlying well-defined function. If those measurements are sufficiently accurate, the curvature of this underlying function will be manifestly evident in the data and can be used to estimate the error being made by a trapezoidal approximation to an integral. It is a question of measurement accuracy in relation to data sampling density. It is simply not correct to say that all measurements are so unreliable as to rule out any such estimate. It really depends on the physical situation and the way the measurements are made.
Yes I agree I was probably wrote a little too hastily in implying there would never be a known underlying function. However, consider the case where you don't have a model predicting the relationship between quantities. Even if you had a large number of sufficiently accurate measurements, the estimate of 'the curvature of (the) underlying function' would have some level of uncertainty. And I don't think it would be correct to take the fitted curve as the 'underlying function' and then base errors on this unless you could be confident that the correct function had been fitted to the data. Maybe a Lorentzian should have been used instead of a Gaussian? I do agree with you that if you have clean enough measurements with sufficient sampling density, you could probably make a good guess, but that doesn't really help in situations where for one reason or another the number of samples and the measurement reliability are unable to be improved upon- in these cases, 'improve your data' is somewhat unhelpful!
In any case, even if you did estimate the curvature with the correct function and used it to estimate the error in the trapezoidal integration, the resulting error would not be the quantity of interest (it would be a reflection of the numerical error, rather than the experimental error). Instead, the experimental error would be contained in the uncertainty of the fitted curve (assuming the fit is correctly weighted). At this point, I think one would be best off computing the uncertainty in the integral based on the uncertainty in the fitted parameters and the formula for the analytic integral.

Sign in to comment.

Matt J
Matt J on 1 Jan 2013
Edited: Matt J on 1 Jan 2013
  • If you know the true area then compute the error by direct subtraction.
  • If you don't know the true area -- i.e., because you have the continuous form of the function y=f(x), but it cannot be analytically integrated, -- then you could take finer and finer samplings of the function and see what trapz(y,x) converges to. Then, use that as an estimate of the true area.
  • If you know bounds on the derivatives of f(x), you could use error estimation formulas from here.
  • If you don't know anything about the continuous form of the function you're integrating, then what notion of "error" are you using?

2 Comments

Thanks, the thing is, I don't really have a continues data function. My data is discrete. I only have 350 data points (x,y) with intervals of dx = 0.5, so I can't change my samplings.
In that case, then why not regard the error as zero? One of the infinitely many continuous functions that connect your x,y points is the one that connects them piece-wise linearly, and trapz(y,x) is its exact, error-free integral. If there's nothing stopping you from assuming your discrete samples came from this piece-wise linear function, then voila, you're done, and your area calculation was perfect!

Sign in to comment.

As Matt J has indicated with his reference to
the error can be estimated in terms of the second derivative of your data function and the widths of your trapezoidal intervals. You can estimate the second derivative in terms of the typical second finite differences in the data divided by the square of the interval widths.

7 Comments

Thanks, but can you be more specific about "estimating the second derivative in terms of the typical second finite differences in the data"? What exactly do you mean by "typical second finite differences in the data"?
To be less vague, I'll put it this way. If you use the trapezoidal approximation, (f(a)+f(b))/2*(b-a), to approximate the integral of a quadratic function f(x) from a to b (which is what 'trapz' does,) it can be shown that the error in doing so is precisely equal to the function's constant second derivative, f''(x), times (b-a)^3/12. Hence, for an estimate of this error you need to obtain a good approximation to your data's second derivative within each of your trapezoid intervals.
If your data is very accurate and represents a changing quantity that is reasonably smooth - that is, whose higher derivatives are not too big - then an estimate of the second derivative at the center of four uniformly spaced points can be obtained in terms of a second finite difference:
((f(b+(b-a))-f(b))-(f(a)-f(a-(b-a))))/(2*(b-a)^2)
which approximates the second derivative at a point half way between a and b. Combining this with the previous estimate gives us
((f(b+(b-a))-f(b))-(f(a)-f(a-(b-a))))*(b-a)/24
for the estimated error within the single interval from a to b. The absolute sum of all these would be a fairly conservative estimate of your total error.
Let me emphasize that if you have somewhat noisy data, this estimate becomes overly pessimistic since the second difference would be excessively large and not representative of the true second derivative. In that case it would be necessary to use appropriate filters covering a larger span of points to get the necessary accuracy. I won't go into that now.
You will find the matlab function 'diff' useful in calculating the above second difference.
If your data is very accurate and represents a changing quantity that is reasonably smooth - that is, whose higher derivatives are not too big - then an estimate of the second derivative at the center of four uniformly spaced points can be obtained in terms of a second finite difference:
I wonder, though, whether the error estimated for trapz would be very accurate (or useful) in that case. If your data is already so accurate as to allow a good finite difference approx, the error in trapz would be rather small. Would you even need to know the error in such a case? Or if you did want accurate estimates of errors that small, mightn't the additional errors introduced by the finite differencing make that difficult?
Well, that depends on how closely-spaced your intervals are in relation to the magnitude of higher derivatives. As an example I computed the integral of sin(x) from 0 to pi where the exact answer would be 2. Admittedly with matlab doing the computations the data is very precise and therefore the second differences are accurate. I divided that range into first 6 intervals and then 100 intervals. Here are the results:
6 intervals
actual error by trapz - 0.04590276668629
est. error, 2nd diff. - 0.04363323129986
100 intervals
actual error by trapz - 0.00016449611255687
est. error, 2nd diff. - 0.00016446634993921
In each case the estimated error is fairly accurate percentage-wise. However in the second case the data has to be very accurate to achieve this with second differencing.
The remedy when data is not sufficiently accurate is to widen the span of the area used to estimate the second derivative using appropriate filters such as those obtained from the second derivative of a normal density distribution with appropriately selected standard deviation.
So, which numerical integration method deals best with noisy data..?
That is a different question from that of the accuracy of the error estimation. My point above was that estimating the trapz error with second differences is particularly sensitive to noise in data and in such cases the estimates can be made more accurate by increasing the span of points used to estimate second derivatives. If you look at the curve of the second derivative of a normal distribution, you will see how a filter can be designed to cover a span of several points in making such an estimate.
As to the question of comparing trapz with other possible methods of integration for discrete data, if the data is noisy probably trapz is the best method to use, precisely because its error depends only on the second derivative. For accurate, smooth data, other methods using higher order approximation are superior in accuracy. You will find several of these in the File Exchange.

Sign in to comment.

Asked:

on 1 Jan 2013

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!