421 Commits

Author SHA1 Message Date
David Schultz
a2d171e440 Optimize this a bit better.
Submitted by:	bde (although these aren't all of his changes)
2008-01-15 23:31:24 +00:00
David Schultz
d3f9671a7d Implement rintl(), nearbyintl(), lrintl(), and llrintl().
Thanks to bde@ for feedback and testing of rintl().
2008-01-14 02:12:07 +00:00
David Schultz
73b2958b94 - Correct the range check in the double version to catch negative values
that would overflow.
- Style fixes and improved handling of NaNs suggested by bde.
2008-01-11 04:18:25 +00:00
David Schultz
45310fdb5d Grumble. DO declare logbl(), DON'T declare logl() just yet.
bde is going to commit logl() Real Soon Now.
I'm just trying to slow him down with merge conflicts.

Noticed by:	bde
2007-12-20 03:16:55 +00:00
David Schultz
58c9a67ed7 Remove the declaration of logl(). The relevant bits haven't been
committed yet, but the declaration leaked in when I added nan() and
friends.

Reported by:	pav
2007-12-20 00:06:33 +00:00
David Schultz
7cd4a83267 Since nan() is supposed to work the same as strtod("nan(...)", NULL),
my original implementation made both use the same code. Unfortunately,
this meant libm depended on a vendor header at compile time and previously-
unexposed vendor bits in libc at runtime.

Hence, I just wrote my own version of the relevant vendor routine. As it
turns out, mine has a factor of 8 fewer of lines of code, and is a bit more
readable anyway. The strtod() and *scanf() routines still use vendor code.

Reviewed by:	bde
2007-12-18 23:46:32 +00:00
David Schultz
0ba1fd2f72 Remove z_abs(). The z_*() functions were in libf77, and for some reason
someone thought it would be a good idea to copy z_abs() to libm in 1994.
However, it's never been declared or documented anywhere, and I'm
reasonably confident that nobody uses it.

Discussed with: bde, deischen, kan
2007-12-18 01:15:20 +00:00
Bruce Evans
ccef8c4fcb Oops, the previous commit was not needed -- the file was committed but
not checked out due to my checkout error.
2007-12-17 18:21:23 +00:00
Bruce Evans
a18b106ffc Translate from the i386 so that this compiles and runs.
I hope that this and the i386 version of it will not be needed, but
this is currently about 16 cycles or 36% faster than the C version,
and the i386 version is about 8 cycles or 19% faster than the C
version, due to poor optimization of the C version.
2007-12-17 18:12:06 +00:00
Bruce Evans
9ed67737f2 Don't try to build s_nanl.c before it is committed. 2007-12-17 13:20:38 +00:00
David Schultz
6821aba9e5 Add logbl(3) to libm. 2007-12-17 03:53:38 +00:00
David Schultz
3be0479b4c Document the fact that we have nan(3) now, and make some minor clarifications
in other places.
2007-12-17 01:04:43 +00:00
David Schultz
4b6b574455 Implement and document nan(), nanf(), and nanl(). This commit
adds two new directories in msun: ld80 and ld128. These are for
long double functions specific to the 80-bit long double format
used on x86-derived architectures, and the 128-bit format used on
sparc64, respectively.
2007-12-16 21:19:28 +00:00
David Schultz
3c27af2a44 1. Add csqrt{,f}(3).
2. Put carg{,f}(3) under the FBSD_1.1 namespace where it belongs
   (requested by kan@)
2007-12-15 08:39:03 +00:00
David Schultz
aaf70b2314 Implement and document csqrt(3) and csqrtf(3). 2007-12-15 08:38:44 +00:00
David Schultz
ce448a2e74 Update the standards section, and make a minor clarification about the
return value of sqrt.
2007-12-14 07:53:09 +00:00
David Schultz
80f974729f Typo in previous commit 2007-12-14 03:08:10 +00:00
David Schultz
39ebc398b6 Symbol.map additions for carg and cargf. (They're in C99, so I didn't
add a new version for them.)
2007-12-14 03:06:50 +00:00
David Schultz
9768c3fea8 s/C90/C99/ 2007-12-12 23:50:00 +00:00
David Schultz
367d55260f Add a "STANDARDS" section. 2007-12-12 23:49:40 +00:00
David Schultz
205bd64894 Implement carg(3) and cargf(3).
Rotting in an old src tree since: March 2005
2007-12-12 23:43:51 +00:00
Bruce Evans
b5e547df33 Oops, back out previous commit since it was backwards to a wrong branch. 2007-06-14 05:57:13 +00:00
Bruce Evans
d382c5ebb4 MFC: 1.11: fix the threshold for (not) using the simple Taylor approximation. 2007-06-14 05:51:00 +00:00
Bruce Evans
a8a2e00ebf Fix an aliasing bug which was finally detected by gcc-4.2. fdlibm has
hundreds of similar aliasing bugs, but all except this one seem to have
been fixed by Cygnus and/or NetBSD before the modified version of fdlibm
was imported into FreeBSD in 1994.

PR:		standards/113147
Submitted by:	Steve Kargl <sgk@troutmask.apl.washington.edu>
2007-06-11 07:48:52 +00:00
Bruce Evans
20a990117d Merge the relevant part of rev.1.14 of s_cbrt.c (a micro-optimization
involving moving the check for x == 0).  The savings in cycles are
smaller for cbrtf() than for cbrt(), and positive in all measured cases
with gcc-3.4.4, but still very machine/compiler-dependent.
2007-05-29 07:13:07 +00:00
Daniel Eischen
419ecd5dee Bump library versions in preparation for 7.0.
Ok'd by:	kan
2007-05-21 02:49:08 +00:00
Daniel Eischen
00fb440c1a Enable symbol versioning by default. Use WITHOUT_SYMVER to disable it.
Warning, after symbol versioning is enabled, going back is not easy
(use WITHOUT_SYMVER at your own risk).

Change the default thread library to libthr.

There most likely still needs to be a version bump for at least the
thread libraries.  If necessary, this will happen later.
2007-05-13 14:12:40 +00:00
Bruce Evans
9698b3b564 Don't assume that int is signed 32-bits in one place. Keep assuming
that ints have >= 31 value bits elsewhere.  s/int/int32_t/ seems to
have been done too globally for all other files in msun/src before
msun/ was imported into FreeBSD.

Minor fixes in comments.

e_lgamma_r.c:
Describe special cases in more detail:
- exception for lgamma(0) and lgamma(neg.integer)
- lgamma(-Inf) = Inf.  This is wrong but is required by C99 Annex F.  I
  hope to change this.
2007-05-02 16:54:22 +00:00
Bruce Evans
e95cc9b700 Fix tgamma() on some special args:
(1) tgamma(-Inf) returned +Inf and failed to raise any exception, but
    should always have raised an exception, and should behave like
    tgamma(negative integer).
(2) tgamma(negative integer) returned +Inf and raised divide-by-zero,
    but should return NaN and raise "invalid" on any IEEEish system.
(3) About half of the 2**52 negative intgers between -2**53 and -2**52
    were misclassified as non-integers by using floor(x + 0.5) to round
    to nearest, so tgamma(x) was wrong (+-0 instead of +Inf and now NaN)
    on these args.  The floor() expression is hard to use since rounding
    of (x + 0.5) may give x or x + 1, depending on |x| and the current
    rounding mode.  The fixed version uses ceil(x) to classify x before
    operating on x and ends up being more efficient since ceil(x) is
    needed anyway.
(4) On at least the problematic args in (3), tgamma() raised a spurious
    inexact.
(5) tgamma(large positive) raised divide-by-zero but should raise overflow.
(6) tgamma(+Inf) raised divide-by-zero but should not raise any exception.
(7) Raise inexact for tiny |x| in a way that has some chance of not being
    optimized away.

The fix for (5) and (6), and probably for (2), also prevents -O optimizing
away the exception.

PR:		112180 (2)
Standards:	Annex F in C99 (IEC 60559 binding) requires (1), (2) and (6).
2007-05-02 15:24:49 +00:00
Bruce Evans
dd936b27fc Document (in a comment) the current (slightly broken) handling of special
values in more detail, and change the style of this comment to be closer
to fdlibm and C99:
- tgamma(-Inf) was undocumented and is wrong (+Inf, should be NaN)
- tgamma(negative integer) is as intended (+Inf) but not best for IEEE-754
  (NaN)
- tgamma(-0) was documented as being wrong (+Inf) but was correct (-Inf)
- documentation of setting of exceptions (overflow, etc.) was more
  complete here than in most of libm, but was further from matching
  the actual setting than in most of libm, due to various bugs here
  (primarily, always evaluating +Inf one/zero and getting unwanted
  divide-by-zero exceptions from this).  Now the actual behaviour with
  gcc -O0 is documented.  Optimization still breaks setting of exceptions
  all over libm, so nothing can depend on this working.
- tgamma(NaN)'s exception was documented as being wrong (invalid) but was
  correct (no exception with IEEEish NaNs).

Finish (?) rev.1.5.  gamma was not renamed to tgamma in one place.

Finish (?) rev.1.6.  errno.h was not completely removed.
2007-05-02 13:49:28 +00:00
Daniel Eischen
5f864214bb Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
Warner Losh
ee7093a640 Remove California Regent's clause 3, per letter 2007-01-09 01:02:06 +00:00
David Schultz
9abb1ff616 Implement modfl(). 2007-01-07 07:54:21 +00:00
David Schultz
8185b32b5a Fix a problem relating to fesetenv() clobbering i387 register stack.
Details: As a side-effect of restoring a saved FP environment,
fesetenv() overwrites the tag word, which indicates which i387
registers are in use.  Normally this isn't a problem because
the calling convention requires the register stack to be empty
on function entry and exit.  However, fesetenv() is inlined, so we
need to tell gcc explicitly that the i387 registers get clobbered.

PR:	85101
2007-01-06 21:46:23 +00:00
David Schultz
93e0663877 Fix a cut-and-paste-o. 2007-01-06 21:23:20 +00:00
David Schultz
829d55ac9c Correctly handle NaN. 2007-01-06 21:22:57 +00:00
David Schultz
9fa229fc8d Correctly handle inf/nan. This routine is currently unused because we
seem to have assembly versions for all architectures, but it can't
hurt to fix it.
2007-01-06 21:22:38 +00:00
David Schultz
6642c3fa74 Remove modf from libm's symbol map. It's actually in libc for
hysterical raisins.
2007-01-06 21:18:17 +00:00
David Schultz
3cb636ce18 Remove an unneeded fnstcw instruction.
Noticed by:	bde
2007-01-05 07:15:26 +00:00
David Schultz
cc0d85b680 Remove a note pertaining to the Alpha. 2007-01-05 07:14:26 +00:00
Bruce Evans
fae6222bdb Moved __BEGIN_DECLS up a little so that it covers __test_sse() and C++
isn't broken,

PR:		104425
2006-10-14 20:35:56 +00:00
Ruslan Ermilov
2b46c64c9c Remove alpha left-overs. 2006-08-22 08:03:01 +00:00
Bruce Evans
d79d610d9c Fixed the threshold for using the simple Taylor approximation.
In e_log.c, there was just a off-by-1 (1 ulp) error in the comment
about the threshold.  The precision of the threshold is unimportant,
but the magic numbers in the code are easier to understand when the
threshold is described precisely.

In e_logf.c, mistranslation of the magic numbers gave an off-by-1
(1 * 16 ulps) error in the intended negative bound for the threshold
and an off-by-7 (7 * 16 ulps) error in the intended positive bound for
the threshold, and the intended bounds were not translated from the
double precision bounds so they were unnecessarily small by a factor
of about 2048.

The optimization of using the simple Taylor approximation for args
near a power of 2 is dubious since it only applies to a relatively
small proportion of args, but if it is done then doing it 2048 times
as often _may_ be more efficient.  (My benchmarks show unexplained
dependencies on the data that increase with further optimizations
in this area.)
2006-07-07 04:33:08 +00:00
Bruce Evans
fe72622ebe Fixed tanh(-0.0) on ia64 and optimizeed tanh(x) for 2**-55 <= |x| <
2**-28 as a side effect, by merging with the float precision version
of tanh() and the double precision version of sinh().

For tiny x, tanh(x) ~= x, and we used the expression x*(one+x) to
return this value (x) and set the inexact flag iff x != 0.  This
doesn't work on ia64 since gcc -O does the dubious optimization
x*(one+x) = x+x*x so as to use fma, so the sign of -0.0 was lost.

Instead, handle tiny x in the same as sinh(), although this is imperfect:
- return x directly and set the inexact flag in a less efficient way.
- increased the threshold for non-tinyness from 2**-55 to 2**-28 so that
  many more cases are optimized than are pessimized.

Updated some comments and fixed bugs in others (ranges for half-open
intervals mostly had the open end backwards, and there were nearby style
bugs).
2006-07-05 22:59:33 +00:00
Bruce Evans
3454a5a101 Removed the optimized asm versions of scalb() and scalbf(). These
functions are only for compatibility with obsolete standards.  They
shouldn't be used, so they shouldn't be optimized.  Use the generic
versions instead.

This fixes scalbf() as a side effect.  The optimized asm version left
garbage on the FP stack.  I fixed the corresponding bug in the optimized
asm scalb() and scalbn() in 1996.  NetBSD fixed it in scalb(), scalbn()
and scalbnf() in 1999 but missed fixing it in scalbf().  Then in 2005
the bug was reimplemented in FreeBSD by importing NetBSD's scalbf().

The generic versions have slightly different error handling:
- the asm versions blindly round the second parameter to a (floating
  point) integer and proceed, while the generic versions return NaN
  if this rounding changes the value.  POSIX permits both behaviours
  (these functions are XSI extensions and the behaviour for a bogus
  non-integral second parameter is unspecified).   Apart from this
  and the bug in scalbf(), the behaviour of the generic versions seems
  to be identical.  (I only exhusatively tested
  generic_scalbf(1.0F, anyfloat) == asm_scalb(1.0F, anyfloat).  This
  covers many representative corner cases involving NaNs and Infs but
  doesn't test exception flags.  The brokenness of scalbf() showed up
  as weird behaviour after testing just 7 integer cases sequentially.)
2006-07-05 20:06:42 +00:00
Bruce Evans
8eca9455de Backed out rev.1.10. It tried to implement ldexpf() as a weak reference
to scalbf(), but ldexpf() cannot be implemented in that way since the
types of the second parameter differ.  ldexpf() can be implemented as
a weak or strong reference to scalbnf() (*) but that was already done
long before rev.1.10 was committed.  The old implementation uses a
reference, so rev.1.10 had no effect on applications.  The C files for
the scalb() family are not used for amd64 or i386, so rev.1.10 had even
less effect for these arches.

(*) scalbnf() raises the radix to the given exponent, while ldexpf()
raises 2 to the given exponent.  Thus the functions are equivalent
except possibly for their error handling iff the radix is 2.  Standards
more or less require identical error handling.  Under FreeBSD, the
functions are equivalent except for more details being missing in
scalbnf()'s man page.
2006-07-05 02:16:29 +00:00
Daniel Eischen
d7eda46253 Add symbol versioning to libm. 2006-03-27 23:59:45 +00:00
Bruce Evans
fd2891004d Oops, on amd64 (and probably on all non-i386 systems), the previous
commit broke the 2**24 cases where |x| > DBL_MAX/2.  There are exponent
range problems not just for denormals (underflow) but for large values
(overflow).  Doubles have more than enough exponent range to avoid the
problems, but I forgot to convert enough terms to double, so there was
an x+x term which was sometimes evaluated in float precision.

Unfortunately, this is a pessimization with some combinations of systems
and compilers (it makes no difference on Athlon XP's, but on Athlon64's
it gives a 5% pessimization with gcc-3.4 but not with gcc-3.3).

Exlain the problem better in comments.
2006-01-05 09:18:48 +00:00
Bruce Evans
4bb9780353 Use double precision internally to optimize cbrtf(), and change the
algorithm for the second step significantly to also get a perfectly
rounded result in round-to-nearest mode.  The resulting optimization
is about 25% on Athlon64's and 30% on Athlon XP's (about 25 cycles
out of 100 on the former).

Using extra precision, we don't need to do anything special to avoid
large rounding errors in the third step (Newton's method), so we can
regroup terms to avoid a division, increase clarity, and increase
opportunities for parallelism.  Rearrangement for parallelism loses
the increase in clarity.  We end up with the same number of operations
but with a division reduced to a multiplication.

Using specifically double precision, there is enough extra precision
for the third step to give enough precision for perfect rounding to
float precision provided the previous steps are accurate to 16 bits.
(They were accurate to 12 bits, which was almost minimal for imperfect
rounding in the old version but would be more than enough for imperfect
rounding in this version (9 bits would be enough now).)  I couldn't
find any significant time optimizations from optimizing the previous
steps, so I decided to optimize for accuracy instead.  The second step
needed a division although a previous commit optimized it to use a
polynomial approximation for its main detail, and this division dominated
the time for the second step.  Use the same Newton's method for the
second step as for the third step since this is insignificantly slower
than the division plus the polynomial (now that Newton's method only
needs 1 division), significantly more accurate, and simpler.  Single
precision would be precise enough for the second step, but doesn't
have enough exponent range to handle denormals without the special
grouping of terms (as in previous versions) that requires another
division, so we use double precision for both the second and third
steps.
2006-01-05 07:57:31 +00:00
Bruce Evans
5776f433ab Extract the high and low words together. With gcc-3.4 on uniformly
distributed non-large args, this saves about 14 of 134 cycles for
Athlon64s and about 5 of 199 cycles for AthlonXPs.

Moved the check for x == 0 inside the check for subnormals.  With
gcc-3.4 on uniformly distributed non-large args, this saves another
5 cycles on Athlon64s and loses 1 cycle on AthlonXPs.

Use INSERT_WORDS() and not SET_HIGH_WORD() when converting the first
approximation from bits to double.  With gcc-3.4 on uniformly distributed
non-large args, this saves another 4 cycles on both Athlon64s and and
AthlonXPs.

Accessing doubles as 2 words may be an optimization on old CPUs, but on
current CPUs it tends to cause extra operations and pipeline stalls,
especially for writes, even when only 1 of the words needs to be accessed.

Removed an unused variable.
2005-12-20 01:21:30 +00:00