Long double format
==================

Each long double is made up of two IEEE doubles. The value of the
long double is the sum of the values of the two parts (except for
-0.0, since IEEE addition gives (-0.0) + (+0.0) = +0.0 while a zero
long double takes its sign from the high part). The most significant
part is required to be the value of the long double rounded to the
nearest double, as specified by IEEE. For Inf values, the least
significant part is required to be one of +0.0 or -0.0. No other
requirements are made; so, for example, 1.0 may be represented as
(1.0, +0.0) or (1.0, -0.0), and the low part of a NaN is don't-care.

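The rules above can be checked mechanically. The following is a minimal sketch in Python, whose 'float' is an IEEE double with round-to-nearest arithmetic. It assumes a long double is modelled as a (high, low) tuple of floats, which says nothing about the in-memory layout; 'two_sum' (Knuth's TwoSum algorithm) and 'is_valid_pair' are illustrative helper names, not part of any library.

```python
import math
from fractions import Fraction

def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: hi = a + b rounded to nearest, lo = the exact
    rounding error, so hi + lo == a + b exactly (barring overflow)."""
    hi = a + b
    b_virtual = hi - a
    lo = (a - (hi - b_virtual)) + (b - b_virtual)
    return hi, lo

def is_valid_pair(hi: float, lo: float) -> bool:
    """Check a (high, low) pair against the rules of the format."""
    if math.isnan(hi):
        return True              # the low part of a NaN is don't-care
    if math.isinf(hi):
        return lo == 0.0         # +0.0 or -0.0 (they compare equal)
    # Float addition rounds the exact sum to the nearest double, so this
    # tests that the high part is the value rounded to the nearest double.
    return hi + lo == hi

# 1 + 2^-60 is not a double, but it is the value of this long double.
hi, lo = two_sum(1.0, 2.0 ** -60)
assert (hi, lo) == (1.0, 2.0 ** -60)
assert Fraction(hi) + Fraction(lo) == 1 + Fraction(1, 2 ** 60)
assert is_valid_pair(hi, lo) and not is_valid_pair(1.0, 1.0)
```

Because the pair built by 'two_sum' has hi equal to the rounded sum and lo equal to the exact remainder, it always satisfies the high-part rule.
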
Classification
--------------

A long double can represent any value of the form
  s * 2^e * sum(k=0...105: f_k * 2^(-k))
where 's' is +1 or -1, 'e' is between -968 and 1022 inclusive, f_0 is
1, and f_k for k>0 is 0 or 1. These are the 'normal' long doubles.

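As a sketch of this definition, the hypothetical helper 'is_normal_value' below tests whether an exact rational value (Python's 'fractions.Fraction') has the normal form, using the exponent range stated above. With f_0 = 1 the exponent 'e' is forced to be floor(log2 |x|), and the remaining 105 bits can only express multiples of 2^(e-105).

```python
from fractions import Fraction

def floor_log2(a: Fraction) -> int:
    """Return e with 2**e <= a < 2**(e+1), for a > 0."""
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    return e

def is_normal_value(x: Fraction) -> bool:
    """True if x = s * 2^e * sum(k=0..105: f_k * 2^-k) with f_0 = 1
    and -968 <= e <= 1022, as stated in the text."""
    if x == 0:
        return False
    a = abs(x)
    e = floor_log2(a)            # f_0 = 1 pins e to floor(log2 |x|)
    if not -968 <= e <= 1022:
        return False
    # |x| must be an integer multiple of 2^(e-105), the weight of f_105.
    return (a / Fraction(2) ** (e - 105)).denominator == 1

assert is_normal_value(1 + Fraction(1, 2 ** 105))       # f_105 = 1
assert not is_normal_value(1 + Fraction(1, 2 ** 106))   # needs a 107th bit
```
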
A long double can also represent any value of the form
  s * 2^-968 * sum(k=0...105: f_k * 2^(-k))
where 's' is +1 or -1, f_0 is 0, and f_k for k>0 is 0 or 1. These are
the 'subnormal' long doubles.

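A corresponding sketch for the subnormal form, again with a hypothetical helper working on exact rationals. Scaling by 2^1073 gives each f_k the weight 2^(105-k), so a subnormal value becomes a positive integer below 2^105 (f_0 = 0 rules out the 2^105 bit). Treating zero as outside this class is an assumption, since the text lists the zeros separately.

```python
from fractions import Fraction

def is_subnormal_value(x: Fraction) -> bool:
    """True if x is a nonzero value of the form
    s * 2^-968 * sum(k=0..105: f_k * 2^-k) with f_0 = 0."""
    if x == 0:
        return False             # assumption: zeros are classified separately
    # 2^-968 * 2^-105 = 2^-1073 is the weight of f_105.
    m = abs(x) * 2 ** 1073
    # Only f_1 .. f_105 may be set, so m is an integer below 2^105.
    return m.denominator == 1 and m < 2 ** 105

assert is_subnormal_value(Fraction(1, 2 ** 969))        # f_1 = 1
assert not is_subnormal_value(Fraction(1, 2 ** 968))    # would need f_0 = 1
```
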
There are four long doubles that represent zero, two that represent
+0.0 and two that represent -0.0. The sign of the high part is the
sign of the long double, and the sign of the low part is ignored.

Likewise, there are four long doubles that represent infinities, two
for +Inf and two for -Inf.

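The zero and infinity encodings can be enumerated directly. In this sketch, pairs are (high, low) tuples of Python floats and 'pair_sign' is a hypothetical helper; the last two lines show why the sign cannot be recovered by simply adding the two parts.

```python
import math

def pair_sign(hi: float, lo: float) -> float:
    """Sign of a long double: taken from the high part; lo is ignored."""
    return math.copysign(1.0, hi)

signed_zeros = [(hi, lo) for hi in (0.0, -0.0) for lo in (0.0, -0.0)]
infinities = [(hi, lo) for hi in (math.inf, -math.inf) for lo in (0.0, -0.0)]

# Two encodings each for +0.0, -0.0, +Inf and -Inf.
assert [pair_sign(*z) for z in signed_zeros] == [1.0, 1.0, -1.0, -1.0]
assert [pair_sign(*i) for i in infinities] == [1.0, 1.0, -1.0, -1.0]

# IEEE addition gives (-0.0) + (+0.0) == +0.0, but the pair is -0.0.
assert math.copysign(1.0, -0.0 + 0.0) == 1.0
assert pair_sign(-0.0, 0.0) == -1.0
```
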
Each NaN, quiet or signalling, that can be represented as a 'double'
can be represented as a 'long double'. In fact, there are 2^64
equivalent representations for each one, since all 64 bits of the
low part are don't-care.

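A sketch of NaN equivalence, assuming (as the text implies) that only the high part identifies the NaN, so the 64 bits of the low part are free. 'double_bits' reads the 64-bit encoding through the standard 'struct' module; 'double_bits' and 'same_nan' are hypothetical helpers.

```python
import math
import struct

def double_bits(x: float) -> int:
    """The 64-bit IEEE encoding of a double, as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def same_nan(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """Assumption: two NaN long doubles are the same NaN when their high
    parts have identical encodings; the low parts are never examined."""
    return math.isnan(a[0]) and double_bits(a[0]) == double_bits(b[0])

nan = math.nan
assert same_nan((nan, 0.0), (nan, -5.0))   # low parts differ, same NaN
```
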
There are certain other valid long doubles where both parts are
nonzero but the low part represents a value which has a bit set below
2^(e-105), where 'e' is the exponent of the long double as in the
normal form above. These, together with the subnormal long doubles,
make up the denormal long doubles.

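The hypothetical helper below sketches the test for this second kind of denormal long double, for finite (high, low) pairs. Taking 'e' as floor(log2) of the exact value of the pair is our reading of the text. The pair (1.0, 2^-110) is valid, because 1.0 + 2^-110 rounds to 1.0, yet its low part has a bit far below 2^(0-105).

```python
import math
from fractions import Fraction

def floor_log2(a: Fraction) -> int:
    """Return e with 2**e <= a < 2**(e+1), for a > 0."""
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    return e

def low_part_below_range(hi: float, lo: float) -> bool:
    """True if both parts are nonzero and lo has a bit set below
    2^(e-105), taking e as the exponent of the exact value hi + lo."""
    if not (math.isfinite(hi) and math.isfinite(lo)):
        return False
    if hi == 0.0 or lo == 0.0:
        return False
    value = abs(Fraction(hi) + Fraction(lo))
    if value == 0:
        return False
    step = Fraction(2) ** (floor_log2(value) - 105)   # weight of f_105
    return (Fraction(lo) / step).denominator != 1

assert low_part_below_range(1.0, 2.0 ** -110)
assert not low_part_below_range(1.0, 2.0 ** -60)
```
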
Many possible long double bit patterns are not valid long doubles
(for example, (1.0, 1.0), whose high part is not the sum 2.0 rounded
to the nearest double). These do not represent any value.

Limits
------

The maximum representable long double is 2^1024-2^918. The smallest
*normal* positive long double is 2^-968. The smallest denormalised
positive long double is 2^-1074 (this is the same as for 'double').

Conversions
-----------

A double can be converted to a long double by adding a zero low part.

A long double can be converted to a double by removing the low part.
No further rounding is needed, because the high part is already the
value rounded to the nearest double.

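Both conversions are one-liners on a (high, low) tuple model of a long double; the helper names 'from_double' and 'to_double' are hypothetical.

```python
def from_double(x: float) -> tuple[float, float]:
    """double -> long double: append a zero low part."""
    return (x, 0.0)

def to_double(ld: tuple[float, float]) -> float:
    """long double -> double: drop the low part (the high part is
    already correctly rounded)."""
    return ld[0]

assert to_double(from_double(0.1)) == 0.1
assert to_double((1.0, 2.0 ** -60)) == 1.0   # 1 + 2^-60 narrows to 1.0
```
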
Comparisons
-----------

Two long doubles can be compared by comparing the high parts, and if
those compare equal, comparing the low parts.

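A sketch of this ordering on (high, low) tuples, with hypothetical helper names. It is correct for valid pairs because rounding to nearest is monotonic: if the high part of a is below that of b, the value of a must be below that of b. Any comparison involving a NaN part comes out false, as for doubles.

```python
def ld_less(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """a < b: compare the high parts, and the low parts only on a tie."""
    return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])

def ld_equal(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """a == b: both parts compare equal (+0.0 and -0.0 parts match)."""
    return a[0] == b[0] and a[1] == b[1]

assert ld_less((1.0, -2.0 ** -60), (1.0, 0.0))
assert ld_less((1.0, 2.0 ** -60), (1.0 + 2.0 ** -52, 0.0))
assert ld_equal((1.0, 0.0), (1.0, -0.0))
```
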
Arithmetic
----------

The unary negate operation negates both the high and the low parts.

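A sketch of negation on a (high, low) tuple; 'ld_neg' is a hypothetical name. Negating both parts keeps the pair valid, since round-to-nearest is symmetric about zero.

```python
import math

def ld_neg(a: tuple[float, float]) -> tuple[float, float]:
    """Unary minus: negate both parts."""
    return (-a[0], -a[1])

assert ld_neg((1.0, -2.0 ** -60)) == (-1.0, 2.0 ** -60)
# +0.0 becomes -0.0: the sign lives in the high part.
assert math.copysign(1.0, ld_neg((0.0, 0.0))[0]) == -1.0
```
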
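An absolute or absolute-negate operation must be done by comparing
against zero and negating if necessary; clearing the sign bits of
both parts would be wrong, because the low part can have the opposite
sign to the high part.
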
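A sketch of the compare-and-negate approach on (high, low) tuples, contrasted with clearing both sign bits. 'ld_abs' is a hypothetical helper; it tests the sign of the high part, and its use of 'math.copysign' so that -0.0 is also treated as negative is an assumption beyond the text. A NaN is returned unchanged.

```python
import math
from fractions import Fraction

def ld_abs(a: tuple[float, float]) -> tuple[float, float]:
    """|a|: if the long double is negative (judged by its high part),
    negate the whole pair; otherwise return it unchanged."""
    hi, lo = a
    if hi < 0.0 or (hi == 0.0 and math.copysign(1.0, hi) < 0.0):
        return (-hi, -lo)
    return a

x = (-1.0, 2.0 ** -60)                 # value -1 + 2^-60
good = ld_abs(x)                       # (1.0, -2^-60): value 1 - 2^-60
naive = (abs(x[0]), abs(x[1]))         # (1.0, 2^-60):  value 1 + 2^-60
assert Fraction(good[0]) + Fraction(good[1]) == 1 - Fraction(1, 2 ** 60)
assert naive != good
```
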
Addition and subtraction are performed using library routines. They
are not at present performed perfectly accurately: the result produced
will be within 1ulp of the range generated by adding or subtracting
1ulp from the input values, where a 'ulp' is 2^(e-106) given the
exponent 'e'. In the presence of cancellation, this may be
arbitrarily inaccurate. Subtraction is done by negation and addition.

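The library routines themselves are not reproduced here. As an illustration only, the sketch below is a common textbook double-double addition (sometimes called 'sloppy' addition) on (high, low) tuples; it is not claimed to match the library's algorithm or its error bound, and all names are hypothetical. The single rounded addition of the low parts is where cancellation can destroy accuracy.

```python
from fractions import Fraction

def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: hi = fl(a + b) and hi + lo == a + b exactly."""
    hi = a + b
    b_virtual = hi - a
    lo = (a - (hi - b_virtual)) + (b - b_virtual)
    return hi, lo

def ld_add(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """Add the high parts exactly, fold in the low parts with one rounded
    addition, then renormalise so hi is the sum rounded to nearest."""
    s, e = two_sum(a[0], b[0])
    e += a[1] + b[1]             # rounding here is why cancellation hurts
    return two_sum(s, e)

def ld_neg(a: tuple[float, float]) -> tuple[float, float]:
    return (-a[0], -a[1])

def ld_sub(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """Subtraction as negation followed by addition."""
    return ld_add(a, ld_neg(b))

assert ld_add((1.0, 2.0 ** -60), (1.0, 2.0 ** -60)) == (2.0, 2.0 ** -59)
assert ld_sub((1.0, 2.0 ** -60), (1.0, 0.0)) == (2.0 ** -60, 0.0)
```
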
Multiplication is also performed using a library routine. Its result
will be within 2ulp of the correct result.

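As an illustration rather than the library's routine, a double-double product can be sketched with Dekker's exact product of two doubles, built on Veltkamp splitting. The helpers are hypothetical, overflow and underflow are not handled, and no error bound is claimed.

```python
from fractions import Fraction

def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: hi = fl(a + b) and hi + lo == a + b exactly."""
    hi = a + b
    b_virtual = hi - a
    return hi, (a - (hi - b_virtual)) + (b - b_virtual)

def split(a: float) -> tuple[float, float]:
    """Veltkamp split of a into high and low halves whose pairwise
    products are exact (assumes |a| is far from overflow)."""
    t = 134217729.0 * a          # 2^27 + 1
    hi = t - (t - a)
    return hi, a - hi

def two_prod(a: float, b: float) -> tuple[float, float]:
    """Dekker's product: p = fl(a * b) and a * b == p + err exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, err

def ld_mul(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """Exact product of the high parts plus the cross terms; the tiny
    lo * lo term is dropped, as is usual for this scheme."""
    p, e = two_prod(a[0], b[0])
    e += a[0] * b[1] + a[1] * b[0]
    return two_sum(p, e)

x = (1.0 + 2.0 ** -52, 0.0)
h, l = ld_mul(x, x)              # (1 + 2^-52)^2 = 1 + 2^-51 + 2^-104
assert Fraction(h) + Fraction(l) == Fraction(1 + 2.0 ** -52) ** 2
```
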
Division is also performed using a library routine. Its result will
be within 3ulp of the correct result.

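For completeness, a sketch of double-double division by one step of long division: estimate the quotient from the high parts, compute the remainder in double-double arithmetic, and divide it again for a correction. This is a common textbook scheme, not necessarily what the library does; the small helpers are included so the block is self-contained, all names are hypothetical, and division by zero simply raises Python's 'ZeroDivisionError'.

```python
from fractions import Fraction

def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: hi = fl(a + b) and hi + lo == a + b exactly."""
    hi = a + b
    b_virtual = hi - a
    return hi, (a - (hi - b_virtual)) + (b - b_virtual)

def split(a: float) -> tuple[float, float]:
    """Veltkamp split (2^27 + 1) for Dekker's exact product."""
    t = 134217729.0 * a
    hi = t - (t - a)
    return hi, a - hi

def two_prod(a: float, b: float) -> tuple[float, float]:
    """Dekker's product: a * b == p + err exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    return p, ((ah * bh - p) + ah * bl + al * bh) + al * bl

def ld_add(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    s, e = two_sum(a[0], b[0])
    e += a[1] + b[1]
    return two_sum(s, e)

def ld_div(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """First quotient digit q1 from the high parts, then a correction q2
    from the double-double remainder a - q1 * b."""
    q1 = a[0] / b[0]
    p, e = two_prod(b[0], q1)    # q1 * b_hi, exactly
    e += b[1] * q1               # plus the low part's contribution
    r = ld_add(a, (-p, -e))      # remainder a - q1 * b
    q2 = r[0] / b[0]
    return two_sum(q1, q2)

h, l = ld_div((1.0, 0.0), (3.0, 0.0))
assert abs(Fraction(h) + Fraction(l) - Fraction(1, 3)) < Fraction(1, 2 ** 100)
```
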