This was open-coded in range reduction for trig and exp functions. Now
there are 3 static inline functions rnint[fl]() that replace open-coded
expressions, and type-generic irint() and i64rint() macros that hide the
complications for efficiently using non-generic irint() and irintl()
functions and casts.
Special details:
ld128/e_rem_pio2l.h needs to use i64rint() since it needs a 46-bit integer
result. Everything else only needs a (less than) 32-bit integer result so
uses irint().
Float and double cases now use float_t and double_t locally instead of
STRICT_ASSIGN() to avoid bugs in extra precision.
On amd64, inline asm is now only used for irint() on long doubles. The SSE
asm for irint() on amd64 only existed because the ifdef tangles made the
correct method of simply casting to int for this case non-obvious.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
* ld128/k_expl.h:
. Split out a computational kernel,__k_expl(x, &hi, &lo, &k) from expl(x).
x must be finite and not tiny or huge. The kernel returns hi and lo
values for extra precision and an exponent k for a 2**k scale factor.
. Define additional kernels k_hexpl() and hexpl() that include a 1/2
scaling and are used by the hyperbolic functions.
* ld80/s_expl.c:
* ld128/s_expl.c:
. Use the __k_expl() kernel.
Obtained from: bde
as a fairly faithful implementation of the algorithm found in
PTP Tang, "Table-driven implementation of the Expm1 function
in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
211-222 (1992).
Over the last 18-24 months, the code has under gone significant
optimization and testing.
Reviewed by: bde
Obtained from: bde (most of the optimizations)
* Use integral numerical constants, and let the compiler do the
conversion to long double.
ld128/s_expl.c:
* Use integral numerical constants, and let the compiler do the
conversion to long double.
* Use the ENTERI/RETURNI macros, which are no-ops on ld128. This
however makes the ld80 and ld128 identical.
Reviewed by: bde (as part of larger diff)
* In the special case x = -Inf or -NaN, use a micro-optimization
to eliminate the need to access u.xbits.man.
* Fix an off-by-one for small arguments |x| < 0x1p-65.
ld128/s_expl.c:
* In the special case x = -Inf or -NaN, use a micro-optimization
to eliminate the need to access u.xbits.manh and u.xbits.manl.
* Fix an off-by-one for small arguments |x| < 0x1p-114.
Obtained from: bde
* Update the evaluation of the polynomial. This allows the removal
of the now unused variables t23 and t45.
ld128/s_expl.c:
* Update the evaluation of the polynomial and the intermediate
result t. This update allows several numerical constants to be
written as double rather than long double constants. Update
the constants as appropriate.
Obtained from: bde
and use macros to access the e component of the unions. This allows
the portions of the code in ld80 to be identical to the ld128 code.
Obtained from: bde
The names now coincide with the name used in PTP Tang's paper.
* Rename the variable from s to tbl to better reflect that
this is a table, and to be consistent with the naming scheme
in s_exp2l.c
Reviewed by: bde (as part of larger diff)
* Update Copyright years to include 2013.
ld128/s_expl.c:
* Correct and update Copyright years. This code originated from
the ld80 version, so it should reflect the same time period.
Reviewed by: bde (as part of larger diff)
table and the requirement on trailing zero bits.
* Remove the __aligned() compiler directives as these were found
to have a negative effect on the produced code.
Submitted by: bde
Approved by: das (mentor)
. Change the API for the LD80C by removing the explicit passing
of the sign bit. The sign can be determined from the last
parameter of the macro.
. On i386, load long double by bit manipulations to work around
at least a gcc compiler issue. On non-i386 ld80 architectures,
use a simple assignment.
* ld80/s_expl.c:
. Update the only consumer of LD80C.
Submitted by: bde
Approved by: das (mentor)
. Fix the threshold for expl(x) where |x| is small.
. Also update the previously incorrect comment to match the
new threshold.
* ld128/s_expl.c:
. Re-order logic in exceptional cases to match the logic used in
other long double functions.
. Fix the threshold for expl(x) where is |x| is small.
. Also update the previously incorrect comment to match the
new threshold.
Submitted by: bde
Approved by: das (mentor)
. Guard a comment from reformatting by indent(1).
. Re-order variables in declarations to alphabetical order.
. Remove a banal comment.
* ld128/s_expl.c:
. Add a comment to point to ld80/s_expl.c for implementation details.
. Move the #define of INTERVAL to reduce the diff with ld80/s_expl.c.
. twom10000 does not need to be volatile, so move its declaration.
. Re-order variables in declarations to alphabetical order.
. Add a comment that describes the argument reduction.
. Remove the same banal comment found in ld80/s_expl.c.
Reviewed by: bde
Approved by: das (mentor)
Also, update the comment to describe the choice of using
a high and low decomposition of 2^(i/INTERNVAL) for
0 <= i <= INTERVAL in preparation for an implementation of
expm1l.
* Move the #define of INTERVAL above the comment, because the
comment refers to INTERVAL.
Reviewed by: bde
Approved by: das (mentor)
compatibility with the INTERVALS macro used in the soon-to-be-commmitted
expm1l() and someday-to-be-committed log*l() functions.
Add a comment into ld128/s_expl.c noting at gcc issue that was
deleted when rewriting ld80/e_expl.c as ld128/s_expl.c.
Requested by: bde
Approved by: das (mentor)
. Remove a few #ifdefs that should have been removed in the initial
commit.
. Sort fpmath.h to its rightful place.
* ld128/s_expl.c:
. Replace EXPMASK with its actual value.
. Sort fpmath.h to its rightful place.
Requested by: bde
Approved by: das (mentor)
format. These implementations are based on
PTP Tang, "Table-driven implementation of the exponential function
in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 15,
144-157 (1989).
PR: standards/152415
Submitted by: kargl
Reviewed by: bde, das
Approved by: das (mentor)