The C11 standard introduced a set of macros (CMPLX, CMPLXF, CMPLXL) that
can be used to construct complex numbers from a pair of real and
imaginary numbers. Unfortunately, they require some compiler support,
which is why we only define them for Clang and GCC>=4.7.
The cpack() function in libm performs the same task as CMPLX(), but
cannot be used to generate compile-time constants. This means that all
invocations of cpack() can safely be replaced by C11's CMPLX(). To keep
the code building with GCC 4.2, provide copies of CMPLX() that can at
least be used to generate run-time complex numbers.
This makes it easier to build some of the functions outside of libm.
for |x| small.
While here, remove the explicit cast of 0.25 to float. Replace
a multiplication involving 0.25 by a division using an integer
constant 4. Make a similar change in j0() to minimize the diff.
Suggested by: bde
It is automatically set when -fPIC is passed to the compiler.
Reviewed by: dim, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D1179
A variant of this code has been tested on amd64/i386 for some time by
EMC/Isilon on 10-STABLE/11-CURRENT. It builds on other architectures, but the
code will remain off until it's proven it works on virtual hardware or real
hardware on other architectures
Sponsored by: EMC / Isilon Storage Division
lgamma(x) = -log(x) - log(1+x) + x*(1-g) + x**2*P(x) with g = 0.57...
being the Euler constant and P(x) a polynomial. Substitution of small
into the RHS shows that the last 3 terms are negligible in comparison to
the leading term. The choice of 3 may be conservative.
The value large=2**(p+3) is detemined from Stirling's approximation
lgamma(x) = x*(log(x)-1) - log(x)/2 + log(2*pi)/2 + P(1/x)/x
Again, substitution of large into the RHS reveals the last 3 terms
are negligible in comparison to the leading term.
Move the x=+-0 special case into the |x|<small block.
In the ld80 and ld128 implementaion, use fdlibm compatible comparisons
involving ix, lx, and llx. This replaces several floating point
comparisons (some involving fabsl()) and also fixes the special cases
x=1 and x=2.
While here
. Remove unnecessary parentheses.
. Fix/improve comments due to the above changes.
. Fix nearby whitespace.
* src/e_lgamma_r.c:
. Sort declaration.
. Remove unneeded explicit cast for type conversion.
. Replace a double literal constant by an integer literal constant.
* src/e_lgammaf_r.c:
. Sort declaration.
* ld128/e_lgammal_r.c:
. Replace a long double literal constant by a double literal constant.
* ld80/e_lgammal_r.c:
. Remove unused '#include float.h'
. Replace a long double literal constant by a double literal constant.
Requested by: bde
. Hook e_lgammal[_r].c to the build.
. Create man page links for lgammal[-r].3.
* Symbol.map:
. Sort lgammal to its rightful place.
. Add FBSD_1.4 section for the new lgamal_r symbol.
* ld128/e_lgammal_r.c:
. 128-bit implementataion of lgammal_r().
* ld80/e_lgammal_r.c:
. Intel 80-bit format implementation of lgammal_r().
* src/e_lgamma.c:
. Expose lgammal as a weak reference to lgamma for platforms
where long double is mapped to double.
* src/e_lgamma_r.c:
. Use integer literal constants instead of real literal constants.
Let compiler(s) do the job of conversion to the appropriate type.
. Expose lgammal_r as a weak reference to lgamma_r for platforms
where long double is mapped to double.
* src/e_lgammaf_r.c:
. Fixed the Cygnus Support conversion of e_lgamma_r.c to float.
This includes the generation of new polynomial and rational
approximations with fewer terms. For each approximation, include
a comment on an estimate of the accuracy over the relevant domain.
. Use integer literal constants instead of real literal constants.
Let compiler(s) do the job of conversion to the appropriate type.
This allows the removal of several explicit casts of double values
to float.
* src/e_lgammal.c:
. Wrapper for lgammal() about lgammal_r().
* src/imprecise.c:
. Remove the lgamma.
* src/math.h:
. Add a prototype for lgammal_r().
* man/lgamma.3:
. Document the new functions.
Reviewed by: bde
inf and raises the divided-by-zero exception. Compilers
constant fold one/zero to inf but do not raise the exception.
Introduce a volatile vzero to prevent the constant folding.
Move the declaration of zero into the main declaration block.
While here, fix a nearby disordering of 'lx,ix'
Discussed with: bde
Remove parentheses in a return statement to be consistent with the
rest of the file.
Rename sin_pi() in the float version to sin_pif().
Remove large comment that precedes sin_pif(). The comment
duplicates a comment in e_lgamma_r.c where the algorithm
is documented.
Requested by: bde
sin_pi(x) is only called for x < 0 and |x| < 2**(p-1) where p is
the precision of x. The new argument reduction is an optimization
compared to the old code, and it removes a chunk of dead code.
Accuracy tests in the intervals (-21,-20), (-20,-19), ... (-1,0)
show no differences between the old and new code.
Obtained from: bde
By Richard Earnshaw at ARM
>
>GCC has for a number of years provides a set of pre-defined macros for
>use with determining the ISA and features of the target during
>pre-processing. However, the design was always somewhat cumbersome in
>that each new architecture revision created a new define and then
>removed the previous one. This meant that it was necessary to keep
>updating the support code simply to recognise a new architecture being
>added.
>
>The ACLE specification (ARM C Language Extentions)
>(http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.set.swdev/index.html)
>provides a much more suitable interface and GCC has supported this
>since gcc-4.8.
>
>This patch makes use of the ACLE pre-defines to map to the internal
>feature definitions. To support older versions of GCC a compatibility
>header is provided that maps the traditional pre-defines onto the new
>ACLE ones.
Stop using __FreeBSD_ARCH_armv6__ and switch to __ARM_ARCH >= 6 in the
couple of places in tree. clang already implements ACLE. Add a define
that says we implement version 1.1, even though the implementation
isn't quite complete.
and tgammal in libm. These functions are part of ISO/IEC 9899:1999
and their prototypes should have been moved into the appropriate
__ISO_C_VISIBLE >= 1999 section. After moving the prototypes,
remnants of r236148 can be removed.
PR: standards/191754
Reviewed by: bde
. Add s_erfl.c to building libm.
. Add MLINKS for erfl.3 and erfcl.3.
* Symbol.map:
. Move erfl and erfcl to their proper location.
* ld128/s_erfl.c:
. Implementations of erfl and erfcl in the IEEE 754 128-bit format.
* ld80/s_erfl.c:
. Implementations of erfl and erfcl in the Intel 80-bit format.
* man/erf.3:
. Document the new functions.
. While here, remove an incomplete sentence.
* src/imprecise.c:
. Remove the stupidity of mapping erfl and erfcl to erf and erfc.
* src/math.h:
. Move the declarations of erfl and erfcl to their proper place.
* src/s_erf.c:
. For architectures where double and long double are the same
floating point format, use weak references to map erfl to
erf and ercl to erfc.
Reviewed by: bde (many earlier versions)
* Update the domain and range of comments for the polynomial
approximations, including using the the correct variable names
(e.g., pp(x) instead of p(x)).
* Use hex values of the form 0x3e0375d4 instead of 0x1.06eba8p-3,
which was obtained from printf("%.6a").
* In the domain [0.84375, 1.25], qa(x) can be reduced from a 4th
order polynomial to 3rd order.
* In the domain [1.25,1/0.35], sa(x) can be reduced from a 4th
order polynomial to 3rd order.
* In the domain [1/0.35, 11], the 4th order polynomials rb(x) and
sb(x) can be reduced to 2nd and 3rd order, respectively.
from erronously constant folding expressions of the form
'1 - tiny'. This allows erf[f](x) to raise INEXACT.
* Use 0.5, 1, and 2, which are exactly representable in radix-2
floating point formats. This reduces diffs between s_erf[fl].c.
* While here, add a comment about efx and efx8.
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Discussed at: BSDcan
* msun/man/sinh.3:
* msun/man/tanh.3:
. Fix grammar.
* msun/src/e_coshl.c:
* msun/src/e_sinhl.c:
. Fix comment.
* msun/src/s_tanhl.c:
. Remove unused variables.
. Fix location/indentation of comments.
. Use comparison involving ints instead of long double.
. Re-order polynomial evaluation on ld128 for |x| < 0.25.
For now, retain the older order in an "#if 0 ... #else" block.
. Use int comparison to short-circuit the |x| < 1.5 condition.
Requested by: bde
. Hook coshl, sinhl, and tanhl into libm.
. Create symbolic links for corresponding manpages.
. While here remove a nearby extraneous space.
* Symbol.map:
* src/math.h:
. Move coshl, sinhl, and tanhl to their proper locations.
* man/cosh.3:
* man/sinh.3:
* man/tanh.3:
. Update the manpages.
* src/e_cosh.c:
* src/e_sinh.c:
* src/s_tanh.c:
. Add weak reference for LBDL_MANT_DIG==53 targets.
* src/imprecise.c:
. Remove the coshl, sinhl, and tanhl kludge.
* src/e_coshl.c:
. ld80 and ld128 implementation of coshl().
* src/e_sinhl.c:
. ld80 and ld128 implementation of sinhl().
* src/s_tanhl.c:
. ld80 and ld128 implementation of tanhl().
Obtained from: bde (mostly), das and kargl
* ld128/k_expl.h:
. Split out a computational kernel,__k_expl(x, &hi, &lo, &k) from expl(x).
x must be finite and not tiny or huge. The kernel returns hi and lo
values for extra precision and an exponent k for a 2**k scale factor.
. Define additional kernels k_hexpl() and hexpl() that include a 1/2
scaling and are used by the hyperbolic functions.
* ld80/s_expl.c:
* ld128/s_expl.c:
. Use the __k_expl() kernel.
Obtained from: bde
* Use bit twiddling. This requires inclusion of math_private.h
and inclusion of float.h in s_roundl.c. Raise invalid exception.
* Use literal integer constants where possible. Let the compiler
do the appropriate conversion.
* In s_roundf.c, use an F suffix on float constants instead of
promoting float to double and then converting the result back
to float. In s_roundl.c, use an L suffix.
* In s_roundl.c, use the ENTERI and RETURNI macros. This requires
the inclusion of fpmath.h and on __i386__ class hardware ieeefp.h.
Reviewed by: bde
preprocessor) gives the following error:
--- Version.map ---
<stdin>:287:4: error: invalid preprocessing directive
# Implemented as weak aliases for imprecise versions
^
1 error generated.
Change the comment to a C-style one, to prevent this error.
Approved by: re (hrs)
These are weak and so can be replaced by other versions in applications
that choose to do so, and will give a linker warning when used so that
applications that rely on the extra precision can avoid them.
Note that since the C/C++ specs only guarantee that long double has
precision equal to double, code that actually relies on these functions
having greater precision is unportable at best and broken at worst.
. Use integer literal constants instead of double literal constants.
* s_erff.c:
. Use integer literal constants instead of casting double literal
constants to float.
. Update the threshold values from those carried over from erf() to
values appropriate for float.
. New sets of polynomial coefficients for the rational approximations.
These coefficients have little, but positive, effect on the maximum
error in ULP in the four intervals, but do improve the overall
speed of execution.
. Remove redundant GET_FLOAT_WORD(ix,x) as hx already contained the
contents that is packed into ix.
. Update the mask that is used to zero-out lower-order bits in x in
the intervals [1.25, 2.857143] and [2.857143, 12]. In tests on
amd64, this change improves the maximum error in ULP from 6.27739
and 63.8095 to 3.16774 and 2.92095 on these intervals for erffc().
Reviewed by: bde
as a fairly faithful implementation of the algorithm found in
PTP Tang, "Table-driven implementation of the Expm1 function
in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
211-222 (1992).
Over the last 18-24 months, the code has under gone significant
optimization and testing.
Reviewed by: bde
Obtained from: bde (most of the optimizations)
* Use integral numerical constants, and let the compiler do the
conversion to long double.
ld128/s_expl.c:
* Use integral numerical constants, and let the compiler do the
conversion to long double.
* Use the ENTERI/RETURNI macros, which are no-ops on ld128. This
however makes the ld80 and ld128 identical.
Reviewed by: bde (as part of larger diff)
* In the special case x = -Inf or -NaN, use a micro-optimization
to eliminate the need to access u.xbits.man.
* Fix an off-by-one for small arguments |x| < 0x1p-65.
ld128/s_expl.c:
* In the special case x = -Inf or -NaN, use a micro-optimization
to eliminate the need to access u.xbits.manh and u.xbits.manl.
* Fix an off-by-one for small arguments |x| < 0x1p-114.
Obtained from: bde
* Update the evaluation of the polynomial. This allows the removal
of the now unused variables t23 and t45.
ld128/s_expl.c:
* Update the evaluation of the polynomial and the intermediate
result t. This update allows several numerical constants to be
written as double rather than long double constants. Update
the constants as appropriate.
Obtained from: bde
and use macros to access the e component of the unions. This allows
the portions of the code in ld80 to be identical to the ld128 code.
Obtained from: bde
The names now coincide with the name used in PTP Tang's paper.
* Rename the variable from s to tbl to better reflect that
this is a table, and to be consistent with the naming scheme
in s_exp2l.c
Reviewed by: bde (as part of larger diff)
* Update Copyright years to include 2013.
ld128/s_expl.c:
* Correct and update Copyright years. This code originated from
the ld80 version, so it should reflect the same time period.
Reviewed by: bde (as part of larger diff)
* Use ENTERI/RETURNI to allow the use of FP_PE on i386 target.
Reviewed by: das (and bde a long time ago)
Approved by: das (mentor)
Obtained from: bde (polynomial coefficients)
are workarounds for various symptoms of the problem described in clang
bugs 3929, 8100, 8241, 10409, and 12958.
The regression tests did their job: they failed, someone brought it
up on the mailing lists, and then the issue got ignored for 6 months.
Oops. There may still be some regressions for functions we don't have
test coverage for yet.
libc.a and libc_p.a. In addition, define isnan in libm.a and libm_p.a,
but not in libm.so.
This makes it possible to statically link executables using both isnan
and isnanf with libc and libm.
Tested by: kargl
MFC after: 1 week
table and the requirement on trailing zero bits.
* Remove the __aligned() compiler directives as these were found
to have a negative effect on the produced code.
Submitted by: bde
Approved by: das (mentor)
. Change the API for the LD80C by removing the explicit passing
of the sign bit. The sign can be determined from the last
parameter of the macro.
. On i386, load long double by bit manipulations to work around
at least a gcc compiler issue. On non-i386 ld80 architectures,
use a simple assignment.
* ld80/s_expl.c:
. Update the only consumer of LD80C.
Submitted by: bde
Approved by: das (mentor)
. Fix the threshold for expl(x) where |x| is small.
. Also update the previously incorrect comment to match the
new threshold.
* ld128/s_expl.c:
. Re-order logic in exceptional cases to match the logic used in
other long double functions.
. Fix the threshold for expl(x) where is |x| is small.
. Also update the previously incorrect comment to match the
new threshold.
Submitted by: bde
Approved by: das (mentor)
. Guard a comment from reformatting by indent(1).
. Re-order variables in declarations to alphabetical order.
. Remove a banal comment.
* ld128/s_expl.c:
. Add a comment to point to ld80/s_expl.c for implementation details.
. Move the #define of INTERVAL to reduce the diff with ld80/s_expl.c.
. twom10000 does not need to be volatile, so move its declaration.
. Re-order variables in declarations to alphabetical order.
. Add a comment that describes the argument reduction.
. Remove the same banal comment found in ld80/s_expl.c.
Reviewed by: bde
Approved by: das (mentor)
Also, update the comment to describe the choice of using
a high and low decomposition of 2^(i/INTERNVAL) for
0 <= i <= INTERVAL in preparation for an implementation of
expm1l.
* Move the #define of INTERVAL above the comment, because the
comment refers to INTERVAL.
Reviewed by: bde
Approved by: das (mentor)
only affects i386. The double case was intentionally left broken
as an optimization, but we are getting closer to supporting
applications and/or kernels that change the (FreeBSD i386) default
rounding precision from FP_PD to FP_PE and never change it back,
and this requires the STRICT_ALIGN()s that were added to support
FP_PE to actually work in all precisions.
* Remove an extraneous semicolon at the end of a macro that was
supposed to be function-like.
Submitted by: bde
Approved by: das (mentor)
they need to refer to static constants, which C99 does not allow for
extern inline functions.
While here, change a comment in e_rem_pio2f.c to mention the correct
number of bits.
Reviewed by: bde
MFC after: 1 week
compatibility with the INTERVALS macro used in the soon-to-be-commmitted
expm1l() and someday-to-be-committed log*l() functions.
Add a comment into ld128/s_expl.c noting at gcc issue that was
deleted when rewriting ld80/e_expl.c as ld128/s_expl.c.
Requested by: bde
Approved by: das (mentor)
. Remove a few #ifdefs that should have been removed in the initial
commit.
. Sort fpmath.h to its rightful place.
* ld128/s_expl.c:
. Replace EXPMASK with its actual value.
. Sort fpmath.h to its rightful place.
Requested by: bde
Approved by: das (mentor)
format. These implementations are based on
PTP Tang, "Table-driven implementation of the exponential function
in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 15,
144-157 (1989).
PR: standards/152415
Submitted by: kargl
Reviewed by: bde, das
Approved by: das (mentor)
quotation. Also make sure we have the same amount of columns in each row as
the number of columns we specify in the head arguments.
Reviewed by: brueffer
correct sign when the remainder was 0.
Fix a separate bug in remquo alone, in which the remainder and
quotient were both off by a bit in certain cases involving subnormal
remainders.
The bugs affected all platforms except amd64 and i386, on which the
routines are implemented in assembly.
PR: 166463
Submitted by: Ilya Burylov
MFC after: 2 weeks
the function bodies require only 2 to 10 instructions. However, it
leads to application binaries that refer to a private ABI, namely, the
softfloat innards in libc. This could complicate future changes in
the implementation of the floating-point emulation layer, so it seems
best to have programs refer to the official fe* entry points in libm.
use softfloat.
Thanks to Ian Lepore for testing and debugging this patch. The fenv
regression tests pass (at least for Ian's arm chip) with this change.
- Handle cases where exp(x) would overflow, but ccosh(x) ~= exp(x) / 2
shouldn't.
- Use the ccosh(x) ~= exp(x) / 2 approximation to simplify the calculation
when x is large.
Similarly for csinh(). Also fixed the return value of csinh(-Inf +- 0i).
exp(x) scaled down by some factor, and the challenge is doing this
accurately when exp(x) would overflow. This change replaces all of
the tricks we've been using with common __ldexp_exp() and
__ldexp_cexp() routines that handle all the scaling.
bde plans to improve on this further by moving the guts of exp() into
k_exp.c and handling the scaling in a more direct manner. But the
current approach is simple and adequate for now.
library," since complex.h, tgmath.h, and fenv.h are also part of the
math library. Replace the outdated sentence with some references to
the other parts.
- Rename __kernel_log() to k_log1p().
- Move some of the work that was previously done in the kernel log into
the callers. This enables further refactoring to improve accuracy or
speed, although I don't recall the details.
- Use extra precision when adding the final scaling term, which improves
accuracy.
- Describe and work around compiler problems that break some of the
multiprecision calculations.
A fix for a small bug is also included:
- Add a special case for log*(1). This is needed to ensure that log*(1) == +0
instead of -0, even when the rounding mode is FE_DOWNWARD.
Submitted by: bde
no longer "fast" on sparc64. (It really wasn't to begin with, since
the old implementation was using long doubles, and long doubles are
emulated in software on sparc64.)
round-to-nearest mode when the result, rounded to twice machine
precision, was exactly halfway between two machine-precision
values. The essence of the fix is to simulate a "sticky bit" in
the pathological cases, which is how hardware implementations
break the ties.
MFC after: 1 month
fenv.h that are currently inlined.
The definitions are provided in fenv.c via 'extern inline'
declaractions. This assumes the compiler handles 'extern inline' as
specified in C99, which has been true under FreeBSD since 8.0.
The goal is to eventually remove the 'static' keyword from the inline
definitions in fenv.h, so that non-inlined references all wind up
pointing to the same external definition like they're supposed to.
I am deferring the second step to provide a window where
newly-compiled apps will still link against old math libraries.
(This isn't supported, but there's no need to cause undue breakage.)
Reviewed by: stefanf, bde
on i386-class hardware for sinl and cosl. The hand-rolled argument
reduction have been replaced by e_rem_pio2l() implementations. To
preserve history the following commands have been executed:
svn cp src/e_rem_pio2.c ld80/e_rem_pio2l.h
mv ${HOME}/bde/ld80/e_rem_pio2l.c ld80/e_rem_pio2l.h
svn cp src/e_rem_pio2.c ld128/e_rem_pio2l.h
mv ${HOME}/bde/ld128/e_rem_pio2l.c ld128/e_rem_pio2l.h
The ld80 version has been tested by bde, das, and kargl over the
last few years (bde, das) and few months (kargl). An older ld128
version was tested by das. The committed version has only been
compiled tested via 'make universe'.
Approved by: das (mentor)
Obtained from: bde
with r219571 and re-enable building of cbrtl.
Implement the long double version for the cube root function, cbrtl.
The algorithm uses Newton's iterations with a crude estimate of the
cube root to converge to a result.
Reviewed by: bde
Approved by: das
implementing accurate logarithms in different bases. This is based
on an approach bde coded up years ago.
This function should always be inlined; it will be used in only a few
places, and rudimentary tests show a 40% performance improvement in
implementations of log2() and log10() on amd64.
The kernel takes a reduced argument x and returns the same polynomial
approximation as e_log.c, but omitting the low-order term. The low-order
term is much larger than the rest of the approximation, so the caller of
the kernel function can scale it to the appropriate base in extra precision
and obtain a much more accurate answer than by using log(x)/log(b).
Explanation by Steve:
jn[f](n,x) for certain ranges of x uses downward recursion to compute
the value of the function. The recursion sequence that is generated is
proportional to the actual desired value, so a normalization step is
taken. This normalization is j0[f](x) divided by the zeroth sequence
member. As Bruce notes, near the zeros of j0[f](x) the computed value
can have giga-ULP inaccuracy. I found for the 1st zero of j0f(x) only
the leading decimal digit is correct. The solution to the issue is
fairly straight forward. The zeros of j0(x) and j1(x) never coincide,
so as j0(x) approaches a zero, the normalization constant switches to
j1[f](x) divided by the 2nd sequence member. The expectation is that
j1[f](x) is a more accurately computed value.
PR: bin/144306
Submitted by: Steven G. Kargl <kargl@troutmask.apl.washington.edu>
Reviewed by: bde
MFC after: 7 days
and one under lib/msun/amd64. This avoids adding the identifiers to the
.text section, and moves them to the .comment section instead.
Suggested by: bde
Approved by: rpaulo (mentor)
macro expand to __isnanf() instead of isnanf() for float arguments.
This change is needed because isnanf() isn't declared in strict POSIX
or C99 mode.
Compatibility note: Apps using isnan(float) that are compiled after
this change won't link against an older libm.
Reported by: Florian Forster <octo@verplant.org>
bottom of the manpages and order them consistently.
GNU groff doesn't care about the ordering, and doesn't even mention
CAVEATS and SECURITY CONSIDERATIONS as common sections and where to put
them.
Found by: mdocml lint run
Reviewed by: ru
argument for fnstsw. Explicitely specify sizes for the XMM control and
status word and X87 control and status words.
Reviewed by: das
Tested by: avg
MFC after: 2 weeks
The amd64-specific bits of msun use an undocumented constraint, which is
less likely to be supported by other compilers (such as Clang). Change
the code to use a more common machine constraint.
Obtained from: /projects/clangbsd/
FPA floating-point format is identical to the VFP format,
but is always stored in big-endian.
Introduce _IEEE_WORD_ORDER to describe the byte-order of
the FP representation.
Obtained from: Juniper Networks, Inc
#pragma STDC CX_LIMITED_RANGE ON
the "ON" needs to be in caps. gcc doesn't understand this pragma
anyway and assumes it is always on in any case, but icc supports
it and cares about the case.
conj() instead of using expressions like z * I. The latter is bad for
several reasons:
1. It is implemented using arithmetic, which is unnecessary, and can
generate floating point exceptions, contrary to the requirements on
these functions.
2. gcc implements complex multiplication using a formula that breaks
down for infinities, e.g., it gives INFINITY * I == nan + inf I.
- When y/x is huge, it's faster and more accurate to return pi/2
instead of pi - pi/2.
- There's no need for 3 lines of bit fiddling to compute -z.
- Fix a comment.
at compile time regardless of the dynamic precision, and there's
no way to disable this misfeature at compile time. Hence, it's
impossible to generate the appropriate tables of constants for the
long double inverse trig functions in a straightforward way on i386;
this change hacks around the problem by encoding the underlying bits
in the table.
Note that these functions won't pass the regression test on i386,
even with the FPU set to extended precision, because the regression
test is similarly damaged by gcc. However, the tests all pass when
compiled with a modified version of gcc.
Reported by: bde
- Adjust several constants for float precision. Some thresholds
that were appropriate for double precision were never changed
when these routines were converted to float precision. This
has an impact on performance but not accuracy. (Submitted by bde.)
- Reduce the degrees of the polynomials used. A smaller degree
suffices for float precision.
- In asinf(), use double arithmetic in part of the calculation to
avoid a corner case and some complicated arithmetic involving a
division and some buggy constants. This improves performance and
accuracy.
Max error (ulps):
asinf acosf atanf
before 0.925 0.782 0.852
after 0.743 0.804 0.852
As bde points out, it's cheaper for asin*() and acos*() to use
polynomials instead of rational functions, but that's a task for
another day.
spurious optimizations. gcc doesn't support FENV_ACCESS, so when it
folds constants, it assumes that the rounding mode is always the
default and floating point exceptions never matter.
1. architecture-specific files
2. long double format-specific files
3. bsdsrc
4. src
5. man
The original order was virtually the opposite of this.
This should not cause any functional changes at this time. The
difference is only significant when one wants to override, say, a
generic foo.c with a more specialized foo.c (as opposed to foo.S).
- fma(x, y, z) returns z, not NaN, if z is infinite, x and y are finite,
x*y overflows, and x*y and z have opposite signs.
- fma(x, y, z) doesn't generate an overflow, underflow, or inexact exception
if z is NaN or infinite, as per IEEE 754R.
- If the rounding mode is set to FE_DOWNWARD, fma(1.0, 0.0, -0.0) is -0.0,
not +0.0.
This makes little difference in float precision, but in double
precision gives a speedup of about 30% on amd64 (A64 CPU) and i386
(A64). This depends on fabs[f]() being inline and efficient. The
bit fiddling (or any use of SET_HIGH_WORD(), which libm does too
much because it was best on old 32-bit machines) always causes
packing overheads and sometimes causes stalls in the packing, since
it operates on only part of a variable in the double precision case.
It apparently did cause stalls in a critical path here.
fabs(x+0.0)+fabs(y+0.0) when mixing NaNs. This improves
consistency of the result by making it harder for the compiler to reorder
the operands. (FP addition is not necessarily commutative because the
order of operands makes a difference on some machines iff the operands are
both NaNs.)