How can I avoid a small value being ignored during calculation?

z is a small value, and when z is added to a1, the result shows no difference. Why is that? And how can I avoid this?

Accepted Answer

John D'Errico on 10 Nov 2022
Welcome to the wonderful, wacky world of floating point arithmetic. You need to understand that floating point numbers (doubles) are stored using the IEEE 754 standard, where only 52 binary bits are used to store the mantissa. Effectively, that gives you around 16 significant digits when represented as a decimal. So if you add 1e-19 to the number 1, MATLAB rounds the result to the nearest value representable by a double. Effectively, as far as MATLAB is concerned,
1 == (1 + 1e-19)
ans =
logical
1
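As a minimal sketch of the same rounding rule (the variable names are only illustrative), eps(1) gives the spacing between 1 and the next larger double. An increment well below half of that spacing is rounded away, while an increment of one full spacing is kept:

```matlab
% Spacing between 1 and the next larger double (2^-52).
ulp_at_one = eps(1);

% An increment below half the spacing is rounded away ...
lost = (1 + ulp_at_one/4) == 1;

% ... while a step of one full spacing changes the stored value.
kept = (1 + ulp_at_one) ~= 1;

fprintf('eps(1) = %g\n', ulp_at_one);
disp([lost kept])
```

Both lost and kept should come out true, which is why adding 1e-19 to 1 leaves 1 unchanged.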
In MATLAB, even if you do this:
a1=-0.0581375401465599531136696498379023978486657142639160156250000000000000;
format long
a1
a1 =
-0.058137540146560
in general, not all of those digits are stored, only the first 16 or so. Beyond that, you generally lose all of the extra digits. However, the number you provided is, in fact, EXACTLY representable within the 52 stored mantissa bits (plus the implicit leading bit), since we can write it exactly as:
-sum(2.^[-5 -6 -7 -9 -10 -11 -15 -20 -21 -23 -28 -32 -33 -36 -37 -40 -42 -43 -45 -46 -47 -48 -49 -50 -53 -55 -57])
ans =
-0.058137540146560
sprintf('%0.60f',ans)
ans =
'-0.058137540146559953113669649837902397848665714263916015625000'
If you want to know the spacing between a1 and the next representable double, that is, the size of a change of one unit in the least significant bit of a1, it is given by
eps(a1)
ans =
6.938893903907228e-18
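One way to see what this spacing means in practice is sketched below. It assumes z = 1.111111e-19, the value used in the BigDecimal answer further down this thread, and a hypothetical count n of repeated additions. Because z is far below half of eps(a1), each individual addition is rounded away, but if the small terms are combined first, their total is large enough to register:

```matlab
a1 = -0.0581375401465599531136696498379023978486657142639160156250000000000000;
z  = 1.111111e-19;   % assumed small increment (taken from later in the thread)
n  = 1000;           % hypothetical number of increments

% z is below half the spacing at a1, so a single addition is rounded away.
single_add_lost = (a1 + z == a1);

% Adding z one at a time never moves the running value.
acc = a1;
for k = 1:n
    acc = acc + z;
end

% Combining the small terms first gives a total of roughly 16 spacings.
combined = a1 + n*z;

fprintf('z / eps(a1) = %.4f\n', z / eps(a1));
disp([acc ~= a1, combined ~= a1])
```

This is not a general cure, only an illustration: when many small contributions are expected, accumulating them separately and adding the total once loses less than adding them one by one.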
  3 Comments
Walter Roberson on 11 Nov 2022
Edited: Walter Roberson on 11 Nov 2022
MATLAB does not have a long data type.
The terminology of long is often associated with C and C++.
In C, long does not have a fixed width. A long int is at least as wide as a regular int and must be at least 32 bits; it is the regular int that, for the purposes of the C standard, could be as little as 16 bits.
In C++ a long int is at least 32 bits.
In C, double is at least 32 bits, but much more often is IEEE 754 double, 64 bits (that is, 52 bits mantissa). In C, long double is at least as big as double but could be wider.
In C++, double is "double precision floating-point type. Matches IEEE-754 binary64 format if supported." and long double is "extended precision floating-point type. Matches IEEE-754 binary128 format if supported, otherwise matches IEEE-754 binary64-extended format if supported, otherwise matches some non-IEEE-754 extended floating-point format as long as its precision is better than binary64 and range is at least as good as binary64, otherwise matches IEEE-754 binary64 format."
MATLAB does not use long as the name of any data type. It has single (IEEE 754 binary single precision, 32 bit word) and double (IEEE 754 binary double precision, 64 bit word). For integers it has int8, int16, int32, int64 and their unsigned versions such as uint32.
MATLAB never calculates floating point at more than 64 bits -- not unless you are using Symbolic Toolbox (or are interfacing to some external class such as in Java or Python)
Note that the command
format long
has nothing to do with how calculations are done and only affects how results are displayed. The long there has nothing to do with any kind of extended precision mathematics: it just means more digits are shown, where short means fewer digits.
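A small sketch of that point, using an arbitrary value x chosen only for illustration: the text printed for x changes with the format setting, but the stored double does not, and sprintf can show more digits regardless of the current format.

```matlab
x = 1/3;   % arbitrary example value

format short
shown_short = strtrim(evalc('disp(x)'));   % text as displayed under format short

format long
shown_long = strtrim(evalc('disp(x)'));    % text as displayed under format long

% The displayed text differs, but the stored value is untouched.
same_value = (x == 1/3);
all_digits = sprintf('%.17g', x);          % 17 significant digits, independent of format

format short   % switch back to the short display
disp({shown_short, shown_long, all_digits})
```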
Walter Roberson on 11 Nov 2022
If you have an input parameter and you are not certain whether the user passed in a single or a double, then just convert it yourself:
gamma_factor = double(gamma_factor) %for example
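One caveat, sketched here with a hypothetical input value of single(0.1): the conversion makes later arithmetic run in double precision, but it cannot bring back digits that were already lost while the value was a single.

```matlab
gamma_factor = single(0.1);            % hypothetical input that arrived as single

gamma_factor = double(gamma_factor);   % later arithmetic now runs in double precision

is_double = isa(gamma_factor, 'double');
recovered = (gamma_factor == 0.1);     % false: digits lost in single are not restored

fprintf('%.17g\n', gamma_factor);
```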


More Answers (1)

Patrik Forssén on 20 Nov 2022
Edited: Patrik Forssén on 20 Nov 2022
@John D'Errico explained why this happens. If you really need to avoid it, you must use an arbitrary-precision numerical class for your calculations. Base MATLAB does not have one, but you can interface to Java's. Here is what your example would look like:
z1 = java.math.BigDecimal('1.111111e-19');
a1 = java.math.BigDecimal('-0.0581375401465599531136696498379023978486657142639160156250000000000000');
a2 = a1.add(z1);
disp(char(z1.toPlainString()))
disp(char(a1.toPlainString()))
disp(char(a2.toPlainString()))
0.0000000000000000001111111
-0.0581375401465599531136696498379023978486657142639160156250000000000000
-0.0581375401465599530025585498379023978486657142639160156250000000000000
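As a follow-up sketch using the same Java class (compareTo and doubleValue are standard java.math.BigDecimal methods; the variable names increment_kept and collapsed are only illustrative), the increment survives as long as the values stay BigDecimal objects, and is lost again as soon as the result is converted back to a double:

```matlab
z1 = java.math.BigDecimal('1.111111e-19');
a1 = java.math.BigDecimal('-0.0581375401465599531136696498379023978486657142639160156250000000000000');
a2 = a1.add(z1);

% compareTo returns a positive value when a2 > a1, so the increment was kept.
increment_kept = (a2.compareTo(a1) > 0);

% Converting back to double rounds a2 to the nearest double, which is a1 again.
collapsed = (a2.doubleValue() == a1.doubleValue());

disp([increment_kept collapsed])
```

So the whole calculation has to be carried out in BigDecimal, with any conversion to double left until the very end.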
