Long double format
==================

Each long double is made up of two IEEE doubles. The value of the
long double is the sum of the values of the two parts (except for
-0.0). The most significant part is required to be the value of the
long double rounded to the nearest double, as specified by IEEE. For
Inf values, the least significant part is required to be one of +0.0
or -0.0. No other requirements are made; so, for example, 1.0 may be
represented as (1.0, +0.0) or (1.0, -0.0), and the low part of a NaN
is a don't-care value.
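
The rules above can be checked mechanically. The Python sketch below models a long double as a hypothetical (hi, lo) tuple of Python floats, which are IEEE doubles, and uses `fractions.Fraction` to form the exact sum of the two parts. The name `is_valid_pair` is illustrative and is not a libgcc function. The check relies on `float()` of a `Fraction` rounding to nearest, ties to even.

```python
import math
from fractions import Fraction

def is_valid_pair(hi, lo):
    """Check the format rules for a hypothetical (hi, lo) pair of doubles."""
    if math.isnan(hi):
        return True                    # the low part of a NaN is a don't-care
    if math.isinf(hi):
        return lo == 0.0               # must be +0.0 or -0.0
    if not math.isfinite(lo):
        return False
    exact = Fraction(hi) + Fraction(lo)    # exact sum, no rounding
    try:
        nearest = float(exact)             # round to nearest, ties to even
    except OverflowError:
        return False                       # sum rounds beyond the double range
    return nearest == hi                   # hi must be the rounded value

print(is_valid_pair(1.0, -0.0))        # True
print(is_valid_pair(1.0, 2.0 ** -60))  # True
print(is_valid_pair(1.0, 2.0 ** -52))  # False: 1 + 2^-52 is itself a double
```

The last call fails because the nearest double to 1 + 2^-52 is 1 + 2^-52 itself, so that value would have to be stored as (1.0 + 2^-52, 0.0).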

Classification
--------------

A long double can represent any value of the form
  s * 2^e * sum(k=0...105: f_k * 2^(-k))
where 's' is +1 or -1, 'e' is between -968 and 1022 inclusive, f_0 is
1, and f_k for k>0 is 0 or 1. These are the 'normal' long doubles.

A long double can also represent any value of the form
  s * 2^-968 * sum(k=0...105: f_k * 2^(-k))
where 's' is +1 or -1, f_0 is 0, and f_k for k>0 is 0 or 1. These are
the 'subnormal' long doubles.
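
Both forms can be evaluated exactly with `fractions.Fraction`. In this sketch `form_value` and `to_pair` are hypothetical helper names. `to_pair` builds one possible representation by taking the nearest double as the high part and the remainder as the low part. Other valid pairs can exist, such as one with a -0.0 low part.

```python
from fractions import Fraction

def form_value(s, e, f):
    """Exact value of s * 2^e * sum(k=0...105: f_k * 2^(-k)).

    f_0 == 1 with -968 <= e <= 1022 is the normal form;
    f_0 == 0 with e == -968 is the subnormal form.
    """
    if s not in (1, -1) or len(f) != 106 or any(b not in (0, 1) for b in f):
        raise ValueError("need s = +1 or -1 and 106 bits f_0 ... f_105")
    if f[0] == 1 and not -968 <= e <= 1022:
        raise ValueError("normal form needs -968 <= e <= 1022")
    if f[0] == 0 and e != -968:
        raise ValueError("subnormal form needs e == -968")
    total = sum(Fraction(b, 2 ** k) for k, b in enumerate(f))
    return s * Fraction(2) ** e * total

def to_pair(x):
    """Split an exact value into (hi, lo): hi is the nearest double."""
    hi = float(x)
    lo = float(x - Fraction(hi))   # remainder; fits in a double for these forms
    return hi, lo

one_plus = form_value(1, 0, [1] + [0] * 104 + [1])   # 1 + 2^-105
print(to_pair(one_plus) == (1.0, 2.0 ** -105))       # True
```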

There are four long doubles that represent zero, two that represent
+0.0 and two that represent -0.0. The sign of the high part is the
sign of the long double, and the sign of the low part is ignored.

Likewise, there are four long doubles that represent infinities, two
for +Inf and two for -Inf.
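
The zero and infinity cases can be shown by reading the sign from the high part only. `ld_is_negative` is a hypothetical name. The last line shows why the sum rule above excludes -0.0: under round-to-nearest, `-0.0 + 0.0` evaluates to +0.0, but the pair (-0.0, +0.0) represents -0.0.

```python
import math

def ld_is_negative(hi, lo):
    """Sign of a (hi, lo) long double: the high part's sign; lo's sign is ignored."""
    return math.copysign(1.0, hi) < 0

zeros = [(0.0, 0.0), (0.0, -0.0), (-0.0, 0.0), (-0.0, -0.0)]
infinities = [(math.inf, 0.0), (math.inf, -0.0), (-math.inf, 0.0), (-math.inf, -0.0)]

print([ld_is_negative(*z) for z in zeros])        # [False, False, True, True]
print([ld_is_negative(*i) for i in infinities])   # [False, False, True, True]
print(math.copysign(1.0, -0.0 + 0.0))             # 1.0: the naive sum loses the sign
```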

Each NaN, quiet or signalling, that can be represented as a 'double'
can be represented as a 'long double'. In fact, there are 2^64
equivalent representations for each one.
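
The figure 2^64 comes from the low part. The low part is a full 64-bit double, and its bits are ignored when the high part is a NaN. The sketch below uses `struct` to build doubles from raw bit patterns. The helper names are hypothetical, and only a quiet NaN is used for the high part.

```python
import math
import struct

def double_from_bits(bits):
    """Reinterpret a 64-bit unsigned integer as an IEEE double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def ld_is_nan(hi, lo):
    # Only the high part matters; the low part of a NaN is a don't-care.
    return math.isnan(hi)

quiet_nan = double_from_bits(0x7FF8000000000000)
low_parts = [double_from_bits(b) for b in
             (0x0, 0x3FF0000000000000, 0x7FF0000000000000, 0xFFFFFFFFFFFFFFFF)]
print(all(ld_is_nan(quiet_nan, lo) for lo in low_parts))   # True
```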

There are certain other valid long doubles where both parts are
nonzero but the low part represents a value which has a bit set below
2^(e-105). These, together with the subnormal long doubles, make up
the denormal long doubles.
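
The sketch below tests the condition on the low part. It assumes that 'e' is the exponent of the whole value as in the normal form, that is, floor(log2(|hi + lo|)). This is an interpretation, not something the text states. `exponent` and `has_low_bits_below` are hypothetical names.

```python
import math
from fractions import Fraction

def exponent(x):
    """floor(log2(|x|)) for a nonzero Fraction x."""
    n, d = abs(x.numerator), x.denominator
    e = n.bit_length() - d.bit_length()
    if Fraction(n, d) < Fraction(2) ** e:
        e -= 1
    return e

def has_low_bits_below(hi, lo):
    """True if both parts are nonzero and lo has a bit set below 2^(e-105)."""
    if not (math.isfinite(hi) and math.isfinite(lo)) or hi == 0.0 or lo == 0.0:
        return False
    e = exponent(Fraction(hi) + Fraction(lo))   # ASSUMPTION: exponent of the value
    scaled = Fraction(lo) / Fraction(2) ** (e - 105)
    return scaled.denominator != 1              # lo is not a multiple of 2^(e-105)

print(has_low_bits_below(1.0, 2.0 ** -105))   # False: fits the 106-bit normal form
print(has_low_bits_below(1.0, 2.0 ** -106))   # True
```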

Many possible long double bit patterns are not valid long doubles.
These do not represent any value.

Limits
------

The maximum representable long double is 2^1024-2^918. The smallest
*normal* positive long double is 2^-968. The smallest denormalised
positive long double is 2^-1074 (this is the same as for 'double').
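
These limits can be compared with those of 'double' through Python's `sys.float_info`. The variable names below are illustrative only. The maximum is kept as an exact `Fraction` because it is not itself a double.

```python
import math
import sys
from fractions import Fraction

max_ld = Fraction(2) ** 1024 - Fraction(2) ** 918   # stated maximum (exact)
min_normal_ld = math.ldexp(1.0, -968)                # smallest normal long double
min_denormal_ld = math.ldexp(1.0, -1074)             # smallest denormalised one

print(max_ld > Fraction(sys.float_info.max))              # True: exceeds DBL_MAX
print(min_normal_ld == sys.float_info.min * 2.0 ** 54)    # True: 2^54 times DBL_MIN
print(min_denormal_ld == math.ulp(0.0))                   # True: same as for double
```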

Conversions
-----------

A double can be converted to a long double by adding a zero low part.

A long double can be converted to a double by removing the low part.
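
Both conversions are trivial on the pair representation. The high part is required to be the long double rounded to the nearest double, so dropping the low part already gives the round-to-nearest result. The function names are hypothetical.

```python
def double_to_ld(d):
    """Widen a double: pair it with a zero low part."""
    return (d, 0.0)

def ld_to_double(hi, lo):
    """Narrow to double: drop the low part (hi is already the rounded value)."""
    return hi

print(double_to_ld(0.1))               # (0.1, 0.0)
print(ld_to_double(1.0, 2.0 ** -60))   # 1.0
```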

Comparisons
-----------

Two long doubles can be compared by comparing the high parts and, if
those compare equal, comparing the low parts.
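
This lexicographic rule works because rounding to nearest is monotonic. If a >= b, then round(a) >= round(b), so high parts that differ already order the values. When the high parts are equal, the difference of the values is the difference of the low parts. Below is a sketch that excludes NaNs. `ld_compare` is a hypothetical name, and `functools.cmp_to_key` is used only to demonstrate sorting.

```python
from functools import cmp_to_key

def ld_compare(a, b):
    """Return -1, 0 or 1 for (hi, lo) pairs a and b; NaNs are not handled."""
    (a_hi, a_lo), (b_hi, b_lo) = a, b
    if a_hi != b_hi:
        return -1 if a_hi < b_hi else 1   # high parts decide
    if a_lo != b_lo:
        return -1 if a_lo < b_lo else 1   # tie: low parts decide
    return 0                              # +0.0 and -0.0 low parts compare equal

values = [(1.0, 2.0 ** -60), (1.0, -2.0 ** -60), (1.0, 0.0)]
print(sorted(values, key=cmp_to_key(ld_compare))
      == [(1.0, -2.0 ** -60), (1.0, 0.0), (1.0, 2.0 ** -60)])   # True
```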

Arithmetic
----------

Unary negation is performed by negating both the high and low parts.

An absolute-value or absolute-negate operation must be done by
comparing against zero and negating if necessary.
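
Negation can act on each part independently. Absolute value cannot, because the low part may have the opposite sign to the high part. Taking abs() of each part of (1.0, -2^-60) would give (1.0, 2^-60), which is a different value. This sketch tests the sign of the high part, which is the sign of the whole long double, so a (-0.0, lo) pair is also flipped. The function names are hypothetical.

```python
import math

def ld_neg(hi, lo):
    """Unary negation: negate both parts."""
    return (-hi, -lo)

def ld_abs(hi, lo):
    """Absolute value: negate the whole pair if the long double is negative."""
    if math.copysign(1.0, hi) < 0:    # the high part carries the sign of the value
        return ld_neg(hi, lo)
    return (hi, lo)

def ld_nabs(hi, lo):
    """Absolute-negate, -|x|."""
    return ld_neg(*ld_abs(hi, lo))

print(ld_abs(-1.0, 2.0 ** -60) == (1.0, -2.0 ** -60))   # True
```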

Addition and subtraction are performed using library routines. They
are not at present performed perfectly accurately: the result produced
will be within 1ulp of the range generated by adding or subtracting
1ulp from the input values, where a 'ulp' is 2^(e-106) given the
exponent 'e'. In the presence of cancellation, this may be
arbitrarily inaccurate. Subtraction is done by negation and addition.
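
The library routine itself is not reproduced here. As an illustration of how a pair sum can be formed, this sketch uses a common textbook scheme. Knuth's TwoSum adds the high parts exactly, the low parts are folded in, and a renormalising Fast2Sum step makes the new high part the rounded sum. This is a simplified variant with no claim to meet the accuracy bound above. All names are hypothetical.

```python
def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and err with a + b == s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def ld_add(x, y):
    """Add two (hi, lo) pairs; a simplified sketch, not the libgcc routine."""
    s, e = two_sum(x[0], y[0])    # exact sum of the high parts
    e += x[1] + y[1]              # fold in the low parts (this step rounds)
    hi = s + e                    # renormalise (Fast2Sum): hi = fl(s + e)
    lo = e - (hi - s)             # rounding error of hi (exact when |s| >= |e|)
    return hi, lo

def ld_sub(x, y):
    """Subtraction as negation followed by addition."""
    return ld_add(x, (-y[0], -y[1]))

print(ld_add((1.0, 0.0), (2.0 ** -60, 0.0)) == (1.0, 2.0 ** -60))   # True
```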

Multiplication is also performed using a library routine. Its result
will be within 2ulp of the correct result.
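
A textbook way to multiply pairs uses Dekker's TwoProduct, with a Veltkamp split, to get the exact product of the high parts. The two cross terms are then added, and the result is renormalised as in addition. The x.lo * y.lo term is dropped. This sketch is not the libgcc routine and makes no claim about the 2ulp bound. All names are hypothetical.

```python
def veltkamp_split(a):
    """Split a double into two halves of at most 26 significant bits each."""
    c = 134217729.0 * a               # 2**27 + 1
    a_hi = c - (c - a)
    return a_hi, a - a_hi

def two_prod(a, b):
    """Dekker's TwoProduct: p = fl(a * b) and err with a * b == p + err
    exactly, barring overflow and underflow."""
    p = a * b
    a_hi, a_lo = veltkamp_split(a)
    b_hi, b_lo = veltkamp_split(b)
    err = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, err

def ld_mul(x, y):
    """Multiply two (hi, lo) pairs; a simplified sketch, not the libgcc routine."""
    p, e = two_prod(x[0], y[0])       # exact product of the high parts
    e += x[0] * y[1] + x[1] * y[0]    # cross terms; x.lo * y.lo is dropped
    hi = p + e                        # renormalise as in addition
    lo = e - (hi - p)
    return hi, lo

a = (1.0 + 2.0 ** -30, 0.0)
print(ld_mul(a, a) == (1.0 + 2.0 ** -29, 2.0 ** -60))   # True
```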

Division is also performed using a library routine. Its result will
be within 3ulp of the correct result.
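
A common textbook approach to dividing pairs is one step of long division. First take q1 = x.hi / y.hi. Then form the remainder x - q1*y using an exact product, and divide it by y.hi to get a correction q2. This sketch repeats the TwoProduct helper so that it stands alone. It is not the libgcc routine and makes no claim about the 3ulp bound. All names are hypothetical.

```python
from fractions import Fraction

def veltkamp_split(a):
    """Split a double into two halves of at most 26 significant bits each."""
    c = 134217729.0 * a               # 2**27 + 1
    a_hi = c - (c - a)
    return a_hi, a - a_hi

def two_prod(a, b):
    """Dekker's TwoProduct: a * b == p + err exactly (no overflow/underflow)."""
    p = a * b
    a_hi, a_lo = veltkamp_split(a)
    b_hi, b_lo = veltkamp_split(b)
    err = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, err

def ld_div(x, y):
    """Divide (hi, lo) pairs by one long-division step; a simplified sketch."""
    q1 = x[0] / y[0]                          # first quotient estimate
    p, e = two_prod(q1, y[0])                 # q1 * y.hi, exactly
    r = ((x[0] - p) - e) + x[1] - q1 * y[1]   # remainder x - q1*y (approximate)
    q2 = r / y[0]                             # correction term
    hi = q1 + q2                              # renormalise
    lo = q2 - (hi - q1)
    return hi, lo

hi, lo = ld_div((1.0, 0.0), (3.0, 0.0))
print(abs(Fraction(hi) + Fraction(lo) - Fraction(1, 3)) < Fraction(1, 2 ** 100))   # True
```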


Copyright (C) 2004-2022 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.