Numerical integration of discrete data using higher precision

31 views (last 30 days)
I have discrete data of a function. I attach the data for cases with 10 points and 100k points and also the plotted function.
The "Y" values of the function near "X=1.57" are very close to each other and zero, like 9.25558265263186E-11 and 5.92357284527227E-11.
Using Trapz function directly, or using Integral function indirectly something like
f1 = griddedInterpolant(x,y); %I used also spline and other methods
f2 = @(t) f1(t);
Result = Integral(f2,0,pi/2)
fails with more than 30% relative error.
I assume it happens because of the roundoff problem and the 16digits precision of matlab.
Is there any way to force Trapz to do the integration using higher precision?
I tried VPA for the given input data to Trapz. But didnt work. Appreciate any suggestion.
  4 Comments
John D'Errico
John D'Errico on 30 Dec 2021
Higher precision does not really help, IF the numbers themselves are not known correctly and exactly.
For example, what is the value of a simple computation:
X = sqrt(sym(2))
X = 
Y = sym(17)^(10*pi) + X
Y = 
In symbolic form, they have nice algebraic values. Hopwever, if we compute those numbers as floating point numbers, within a double precision form?
format long g
dx = double(X)
dx =
1.4142135623731
Effectively, we only know that number now to an accuracy of
eps(dx)
ans =
2.22044604925031e-16
That is the value of the least significant bit. As stored as a double precision number, that number is stored in BINARY form, as
sprintf('%0.55f',dx)
ans = '1.4142135623730951454746218587388284504413604736328125000'
But the true value of sqrt(2) is
vpa(X,55)
ans = 
1.414213562373095048801688724209698078569671875376948073
As you can see, the two numbers begin to diverge after around the 16 decimal place. So once you create a number as a double precision value, you have no information content beyond the 16th decimal digit. And that means all of those numbers you created, that you think are known exactly, are not known as exactly as you think.
Now, how about Y? Actually, Y is a number with many digits to the left of the decimal point.
vpa(Y,55)
ans = 
452577459456794552253165824820364312150.7136136441082774
eps(double(Y))
ans =
7.55578637259143e+22
So the least significant bit in the value of Y, IF we form a double precision mnumber from it, is a big number in itself. Anything below that value is virtually meaningless numerical garbage. So can we form the value X+Y, as a double, and have the lower significant digits mean anything? Certainly not.
dx < eps(double(Y))
ans = logical
1
My point in all of this, is using vpa here is a ruse. It convinces you that you were being "safe" In fact, those numbers, since they were originally stored as doubles, have no information content below a certain point. You could have used a million digits, and not have been safe.

Sign in to comment.

Accepted Answer

John D'Errico
John D'Errico on 29 Dec 2021
Edited: John D'Errico on 29 Dec 2021
I explained in your last question. NO. You cannot force trapz to give you higher precision. As for trapz giving 30% error here? FLAT OUT WRONG. PERIOD.
x = xy(:,1);
y = xy(:,2);
plot(x,y)
If your function is really that smooth and well-behaved, then trapz will not be in that much error.
format long g
max(diff(x))
ans =
1.57081203500109e-05
min(diff(x))
ans =
1.57081203397968e-05
So your data is pretty uniformly spaced in x, but not perfectly so.
If you think the integral is something significantly different from this:
format long g
trapz(x,y)
ans =
0.299617122167918
then however else you have computed the integral is simply wrong.
spl = spline(x,y);
diff(fnval(fnint(spl),[min(x),max(x)]))
ans =
0.299617122167912
I'm sorry, but the area under that curve is quite close to 0.2996 (and stuff). It is not 30% different from that.
Of course, if your fuction has a spike in it somewhere, essentially a delta function between two of the data points, that is so narrow that your data does not show the spike, then this is simply fact.
Is this a question of double precision accuracy? Since your data is entirely positive, there is no issue of subtractive cancellation.
min(y)
ans =
5.62570579352345e-35
Can we do the integration using high precision arithmetic? Well, yes, we could. But the fact is,
y(1)
ans =
1.02293003278743
So anything you add to y(1) that is less than eps(y(1)) is meaningless.
eps(y(1))
ans =
2.22044604925031e-16
Is this a problem of the integration does not use those really small values of y? Um, not really. For example, we could do it in 100 decimal digits of precision. I could do that using syms and vpa, but since I wrote HPF, I'll use that facility, as I know exactly what it is doing. You can download HPF from the file exchange for free.
DefaultNumberOfDigits 100
hy = hpf(y);
hx = hpf(x);
Of course, since we started with double precision numbers, the result will really be little better than trapz.
dx = hx(2:end) - hx(1:end-1);
Now compute a trapezoidal rule. This is trivial to do, as...
sum((hy(1:end-1) + hy(2:end)) .* dx)/2
ans =
0.2996171221679184598357910258813612376824763020165488844962532387299484829118318852066613967171717782
This computation really is a trapzeoidal rule, done fully in 100 decimal digits of precision. So the high precision tapezoidal rule, done in 100 decimal digits of precision agrees with trapz down to the last few digits reported. Remember, trapz gave us...
0.299617122167918
One problem is, the values in the vectors x and y have only been stored to essentially 16 significant digits. We cannot get higher precision than that. Yes I suppose I could use a higher order Newton-Cotes rule. Do I really need to do so here? We won't gain that much at all, because your numbers were stored only to roughly 16 digits of precsion.
But you are telling us this result is in error by 30%? I'm sorry, but I'll claim that is flat out wrong. How do you assert that as fact?
I will say that when you do this:
f1 = griddedInterpolant(x,y);
this is a PURELY linear interpolant. As such, any integral based on that can be no better than trapz. And if you just use integral, that forgets you need to set the tolerance. So part of your problem lies in a lack of nderstanding of numerical methods. (I'm sorry, but this is clear.)
We could try this:
f = griddedInterpolant(x,y,'spline');
integral(@(x) f(x),min(x),max(x),'abstol',1e-15)
ans =
0.299617122167919
Here I have forced griddedInterpolant to be a spline. Then I cranked down on the tolerance. The result? A minor difference in the last decimal digit reported from that of a direct integral of a spline interpolant.
So I'm sorry, but to claim that the true integral is 30% different from this value is pure silliness. (I really want to call it worse than that, but I won't.) If you want to claim that the last significant digit or two may be in error, go ahead. But to do better, you need to have a better way to store these numbers than double precision. It still won't help a lot.
If you show HOW the number were computed, then I might be able to suggest using a tool like HPF to compute the integral in slightly better ways, by computing y directly in a high precision form.
But now please stop posting this same question repeatedly.
  1 Comment
Jan
Jan on 30 Dec 2021
Edited: Jan on 30 Dec 2021
This is a useful answer. Obviously the actual problem is the value of the reference solution.
Some sentences of this answer sound more emotional than usual in this forum. The question concerns Matlab and it contains enough information for a useful answer and therefore it is welcome in this forum. A wrong expectation or mistakes are typical reasons for questions and not silly.

Sign in to comment.

More Answers (1)

Jan
Jan on 29 Dec 2021
You explain, that you have discrete data, but the code you show uses a function. If the data are given in double precision, using a higher precision for the intration has a very limited use only: The result cannot be more accurate than the input.
But the sum is numerically instable. So you can try a summation with error compensation: FEX: XSum
Take a look into the code of trapz. You find this (for equidistant x values):
x * sum((y(1:end-1,:) + y(2:end,:)), 1)/2
Replace this by:
x * XSum((y(1:end-1,:) + y(2:end,:)), 1)/2
  4 Comments
Jan
Jan on 30 Dec 2021
Fine. This is another evidence for the fact, that the accuracy of the integral is not the problem, but that the reference value is incorrect.
Msimon q
Msimon q on 30 Dec 2021
Edited: Msimon q on 30 Dec 2021
Thanks @Jan
My data for the function and the reference solution is coming from a physical problem(very long to explain). Since my code and the the computations have already been verified for a *better function, and since the same problem was reported in a paper (that singularity begins for this setting and the numerical integration becomes harder), I was pretty sure that there is nothing wrong in othe part of my code. However, after what you and others explained to me here, I try to find if i made any mistake there.
* in my physical problem, I can change some parameters, which leads to a better function (not near-zero value around x=1.57). The more my function becomes like this, the more my error increases.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!