Re: Interpretation of result
From: Przemek Klosowski
Subject: Re: Interpretation of result
Date: Tue, 18 Sep 2012 17:32:46 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0
On 09/17/2012 02:59 PM, JuanPi wrote:
> Can anybody explain this to a simple mortal who did not take a
> rigorous numerical calculus class?
> octave:89> 99999999999999999 - 99999999999999998
> ans = 0
> (that is 17 digits) ...
Common wisdom credits Octave's double precision numbers with 16+
digits of precision. This is not quite right; like many 'advertised
features', you should read it as 'guaranteed not to exceed' :) The
IEEE 754 double precision floating point numbers have 52-bit
mantissas (plus an implicit leading bit, for 53 significant bits),
and the relative precision is given by 2^(-52), also known as
'eps'; it doesn't translate into a nice round decimal number like
1/9..99. The double precision FP representation is explained nicely in
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
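You can check this at the prompt; the values below are what a stock
Octave prints in the default short format:

octave> eps        # machine epsilon for doubles, equal to 2^(-52)
ans = 2.2204e-16
octave> 2^(-52)
ans = 2.2204e-16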
The numbers you are interested in would require more than 56 bits of
precision: log(99999999999999999)/log(2) is 56.473, which exceeds the
53 significant bits available. Since the bits required for such
precision aren't there, many consecutive integers are encoded by the
same IEEE binary representation, as you can see by asking Octave to
print all the bits:
format bit; a=(99999999999999991:99999999999999999)'
a =
0100001101110110001101000101011110000101110110001001111111111111
0100001101110110001101000101011110000101110110001001111111111111
0100001101110110001101000101011110000101110110001001111111111111
0100001101110110001101000101011110000101110110001001111111111111
0100001101110110001101000101011110000101110110001001111111111111
0100001101110110001101000101011110000101110110001001111111111111
0100001101110110001101000101011110000101110110001001111111111111
0100001101110110001101000101011110000101110110001001111111111111
0100001101110110001101000101011110000101110110001010000000000000
0100001101110110001101000101011110000101110110001010000000000000
0100001101110110001101000101011110000101110110001010000000000000
0100001101110110001101000101011110000101110110001010000000000000
0100001101110110001101000101011110000101110110001010000000000000
0100001101110110001101000101011110000101110110001010000000000000
0100001101110110001101000101011110000101110110001010000000000000
0100001101110110001101000101011110000101110110001010000000000000
0100001101110110001101000101011110000101110110001010000000000000
As you can see, the integers in that range aren't exactly representable
in IEEE 754 form: the first 8 values collapse onto one bit pattern and
the last 9 onto another. What's worse, note that Octave thinks the
interval contains 17 numbers, even though 99999999999999991 through
99999999999999999 spans only 9 integers. Because the endpoints were
rounded before the range was built, Octave can't even keep track of
the size of the interval, let alone perform reliable calculations on
its limits. This isn't a bug in Octave, just a feature of floating
point number representation.
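You can confirm the miscount directly; the exact element count depends
on how the rounded endpoints interact with the increment, but it
clearly isn't 9:

octave> numel(99999999999999991:99999999999999999)
ans = 17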
As the Wikipedia article indicates, 2^53 or 9,007,199,254,740,992 is
the largest value up to which every integer is exactly representable
as a double; your numbers are more than ten times larger than that,
and anything beyond it loses precision and will not produce reliable
integer-like results.
format bit; a=2^53+(-5:5)'
a =
0100001100111111111111111111111111111111111111111111111111111011
0100001100111111111111111111111111111111111111111111111111111100
0100001100111111111111111111111111111111111111111111111111111101
0100001100111111111111111111111111111111111111111111111111111110
0100001100111111111111111111111111111111111111111111111111111111
0100001101000000000000000000000000000000000000000000000000000000
0100001101000000000000000000000000000000000000000000000000000000
0100001101000000000000000000000000000000000000000000000000000001
0100001101000000000000000000000000000000000000000000000000000010
0100001101000000000000000000000000000000000000000000000000000010
0100001101000000000000000000000000000000000000000000000000000010
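A quick sanity check of the rollover at 2^53, done entirely in double
arithmetic (the 1 here means 'true'):

octave> 2^53 + 1 == 2^53
ans = 1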
Now, Octave has a uint64 data type, which of course does not spend its
first 12 bits on a sign and an exponent: all 64 bits carry the value,
giving 11 more bits of integer precision than a double's 53-bit
significand. It allows exact arithmetic on numbers up to 2^64 - 1, or
18,446,744,073,709,551,615. See how the uint64 numbers increment by
one and don't show any gaps or repetitions:
format bit; a=uint64(2^53)+(-5:5)'
a =
0000000000011111111111111111111111111111111111111111111111111011
0000000000011111111111111111111111111111111111111111111111111100
0000000000011111111111111111111111111111111111111111111111111101
0000000000011111111111111111111111111111111111111111111111111110
0000000000011111111111111111111111111111111111111111111111111111
0000000000100000000000000000000000000000000000000000000000000000
0000000000100000000000000000000000000000000000000000000000000001
0000000000100000000000000000000000000000000000000000000000000010
0000000000100000000000000000000000000000000000000000000000000011
0000000000100000000000000000000000000000000000000000000000000100
0000000000100000000000000000000000000000000000000000000000000101
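So the cure for the original puzzle is to keep the arithmetic in
uint64 from the start. One caveat, and this is an assumption about how
Octave parses numeric literals: a 17-digit constant is read as a
double before conversion, so it is already rounded by the time
uint64() sees it. You have to build large exact values from
representable pieces, for example via intmax:

octave> intmax("uint64")
ans = 18446744073709551615
octave> intmax("uint64") - (intmax("uint64") - 1)   # exact, unlike the doubles above
ans = 1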