62 Commits

Author SHA1 Message Date
bde
fcad0aa7c4 Oops, the previous i386 version of e_fmodf.S and e_fmodl.S was
actually the amd64 version.
2016-09-04 15:08:14 +00:00
bde
df9121eecb Disconnect the "optimized" asm variants of cos(), sin() and tan() from
the build on i386.  Leave them in the source tree for regression tests.

The asm functions were always much less accurate (by a factor of more
than 10**18 in the worst case).  They were faster on old CPUs.  But
with each new generation of CPUs they get relatively slower.  The
double precision C version's average advantage is about a factor of 2
on Haswell.

The asm functions were already intentionally avoided in float and long
double precision on i386 and in all precisions on amd64.  Float
precision and amd64 give larger advantages to the C version.  The long
double precision C code and compilers' understanding of long double
precision are not so good, so the i387 is still slightly faster for
long double precision, except for the unimportant subcase of huge args
where the sub-optimal C code now somehow beats the i387 by about a
factor of 2.
2016-09-04 14:12:19 +00:00
bde
95d1e1376d Add asm versions of fmod(), fmodf() and fmodl() on amd64. Add asm
versions of fmodf() and fmodl() on i387.

fmod is similar to remainder, and the C versions are 3 to 9 times
slower than the asm versions on x86 for both, but we had the strange
mixture of all 6 variants of remainder in asm and only 1 of 6
variants of fmod in asm.
2016-09-04 12:22:14 +00:00
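For readers unfamiliar with the distinction drawn in 95d1e1376d, the two operations differ only in how the quotient is rounded. The following is a minimal sketch of the definitions, not the libm code: the `sk_` names are hypothetical, and forming the quotient in floating point and casting it to `long long` is only valid for small, well-behaved operands.

```c
/* fmod-style result: x - n*y, where n is x/y truncated toward zero. */
double sk_fmod(double x, double y)
{
    long long n = (long long)(x / y);   /* the cast truncates toward zero */
    return x - (double)n * y;
}

/* remainder-style result: x - n*y, where n is x/y rounded to the
 * nearest integer, with ties going to the even integer. */
double sk_remainder(double x, double y)
{
    double q = x / y;
    long long n = (long long)q;
    double frac = q - (double)n;        /* lies in (-1, 1), same sign as q */

    if (frac > 0.5 || (frac == 0.5 && (n & 1)))
        n++;
    else if (frac < -0.5 || (frac == -0.5 && (n & 1)))
        n--;
    return x - (double)n * y;
}
```

With x = 5.5 and y = 2.0, sk_fmod() gives 1.5 (the quotient truncates to 2) while sk_remainder() gives -0.5 (the quotient rounds to 3).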
kib
2ccc50ad8d Merge the 386 and amd64 versions of the fenv.h, to make cc -m32
compilations which use fenv.h work.

Reviewed by:	tjil
Sponsored by:	The FreeBSD Foundation
2013-04-21 13:31:55 +00:00
tijl
19adcfe770 Optimise i387 trigonometric functions. Replace "andw 0x400,%ax \ jnz" with
"sahf \ jp", "fprem1" with "fprem" and "fstsw %ax" with "fnstsw %ax".
2012-09-16 16:58:49 +00:00
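The first substitution in 19adcfe770 works because sahf copies %ah into the low byte of the flags register, and the x87 C2 condition bit (bit 10 of the status word) lands on the parity flag. A small model in C, with hypothetical `sk_` names and constants, shows that the two tests agree for every possible status word:

```c
#include <stdint.h>

#define SK_SW_C2 0x0400u   /* C2 condition bit in the x87 status word */
#define SK_FL_PF 0x04u     /* parity flag position in the low flags byte */

/* Old sequence: test bit 10 of %ax directly, then jnz. */
int sk_c2_via_mask(uint16_t status_word)
{
    return (status_word & SK_SW_C2) != 0;
}

/* New sequence: sahf loads %ah (the high byte of %ax) into the flags,
 * so C2 becomes PF and "jp" branches on it. */
int sk_c2_via_sahf(uint16_t status_word)
{
    uint8_t ah = (uint8_t)(status_word >> 8);
    return (ah & SK_FL_PF) != 0;
}
```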
das
818507cb90 Bugfix: feenableexcept() and fedisableexcept() should just return the
old exception mask, not mask | ~FE_ALL_EXCEPT.

MFC after:	2 weeks
2011-10-21 06:25:31 +00:00
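The return-value contract fixed in 818507cb90 can be modeled with plain integers. This is only a sketch under stated assumptions, not the FreeBSD code: `SK_FE_ALL_EXCEPT` is a hypothetical stand-in for FE_ALL_EXCEPT covering the six x87 exception bits, and the control word is assumed to follow the x87 convention that a set bit means the exception is masked.

```c
/* Hypothetical stand-in for FE_ALL_EXCEPT: the six x87 exception bits. */
#define SK_FE_ALL_EXCEPT 0x3f

/* A set control-word bit means "masked" (trap disabled), so the enabled
 * set is the complement, restricted to the exception bits. */
int sk_enabled_mask(unsigned control_word)
{
    return (int)(~control_word & SK_FE_ALL_EXCEPT);
}

/* The shape of the bug named in the commit message: bits outside
 * FE_ALL_EXCEPT leak into the returned value. */
int sk_buggy_mask(unsigned control_word)
{
    return sk_enabled_mask(control_word) | ~SK_FE_ALL_EXCEPT;
}
```

With the control word 0x037f that FINIT loads (all exceptions masked), sk_enabled_mask() returns 0, while the buggy form returns a nonzero value, so a caller comparing the old mask against 0 would be misled.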
das
1e6760c16a Use #include "fenv.h" instead of #include <fenv.h>. This makes it
more convenient to compile the math library by itself.

Requested by:	bde
2011-10-16 05:37:56 +00:00
das
d8a1d87813 Replace two lines accidentally removed in r226218. Thanks to bde
for noticing this.
2011-10-15 04:17:20 +00:00
das
a38603b0b5 Provide external definitions of all of the standardized functions in
fenv.h that are currently inlined.

The definitions are provided in fenv.c via 'extern inline'
declarations.  This assumes the compiler handles 'extern inline' as
specified in C99, which has been true under FreeBSD since 8.0.

The goal is to eventually remove the 'static' keyword from the inline
definitions in fenv.h, so that non-inlined references all wind up
pointing to the same external definition like they're supposed to.
I am deferring the second step to provide a window where
newly-compiled apps will still link against old math libraries.
(This isn't supported, but there's no need to cause undue breakage.)

Reviewed by:    stefanf, bde
2011-10-10 15:43:09 +00:00
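The C99 mechanism that a38603b0b5 relies on can be shown in a single file. This is a minimal sketch of the end state the message describes (inline definitions without 'static'), using a hypothetical function name rather than anything from fenv.h:

```c
/* Would live in the header: a C99 inline definition (hypothetical name). */
inline int sk_clamp_excepts(int excepts)
{
    return excepts & 0x3f;
}

/* Would live in exactly one .c file: this declaration turns the inline
 * definition above into the external definition for the whole program. */
extern inline int sk_clamp_excepts(int excepts);

/* Taking the address forces a non-inlined reference, which resolves to
 * that single external definition. */
typedef int (*sk_excepts_fn)(int);

sk_excepts_fn sk_clamp_address(void)
{
    return sk_clamp_excepts;
}
```

Without the 'extern inline' line, a C99 compiler that chooses not to inline a call would leave an unresolved reference at link time; with 'static inline' in the header, each translation unit would instead get its own private copy.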
kib
30039e1e2f Add section .note.GNU-stack for assembly files used by 386 and amd64.
2011-01-07 16:13:12 +00:00
dim
410484186a Use __FBSDID() instead of RCSID() in most .S files under lib/msun/i386,
and one under lib/msun/amd64.  This avoids adding the identifiers to the
.text section, and moves them to the .comment section instead.

Suggested by:	bde
Approved by:	rpaulo (mentor)
2010-10-01 20:14:36 +00:00
kib
d4516a049a Placate new binutils, by using 16-bit %ax instead of 32-bit %eax as an
argument for fnstsw. Explicitly specify sizes for the XMM control and
status word and X87 control and status words.

Reviewed by:	das
Tested by:	avg
MFC after:	2 weeks
2010-02-03 20:23:47 +00:00
attilio
4af1dcdee0 Use, in uncovered part, the END() macro in order to improve debugging.
In this specific case, Valgrind won't get confused when analyzing such
functions.

Sponsored by:	Sandvine Incorporated
Tested by:	emaste
MFC:		3 days
2009-05-25 14:37:10 +00:00
das
affd78d50b On i386, gcc truncates long double constants to double precision
at compile time regardless of the dynamic precision, and there's
no way to disable this misfeature at compile time. Hence, it's
impossible to generate the appropriate tables of constants for the
long double inverse trig functions in a straightforward way on i386;
this change hacks around the problem by encoding the underlying bits
in the table.

Note that these functions won't pass the regression test on i386,
even with the FPU set to extended precision, because the regression
test is similarly damaged by gcc. However, the tests all pass when
compiled with a modified version of gcc.

Reported by:  	bde
2008-08-02 03:56:22 +00:00
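The workaround in affd78d50b, storing a constant as its underlying bits so the compiler cannot round it, can be illustrated as follows. This sketch uses double rather than long double so that it runs on any IEEE 754 binary64 platform; the `sk_` names are hypothetical and the table in the commit itself is not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* pi rounded to binary64, written as a raw bit pattern instead of a
 * floating-point literal, so no compile-time conversion is involved. */
static const uint64_t sk_pi_bits = UINT64_C(0x400921FB54442D18);

/* Reinterpret the stored bits as a double at run time. */
double sk_pi(void)
{
    double d;
    memcpy(&d, &sk_pi_bits, sizeof d);
    return d;
}
```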
das
39170f049a Add assembly versions of remquol() and remainderl().
2008-03-30 21:21:53 +00:00
das
635be49304 Hook up sqrtl() to the build.
2008-03-02 01:48:17 +00:00
das
09521f824a MD implementations of sqrtl().
2008-03-02 01:48:08 +00:00
das
4f45aea521 Implement rintl(), nearbyintl(), lrintl(), and llrintl().
Thanks to bde@ for feedback and testing of rintl().
2008-01-14 02:12:07 +00:00
das
d717f8cf06 Add logbl(3) to libm.
2007-12-17 03:53:38 +00:00
deischen
2a7306fdc5 Use C comments since we now preprocess these files with CPP.
2007-04-29 14:05:22 +00:00
das
3d86fb6387 Fix a problem relating to fesetenv() clobbering i387 register stack.
Details: As a side-effect of restoring a saved FP environment,
fesetenv() overwrites the tag word, which indicates which i387
registers are in use.  Normally this isn't a problem because
the calling convention requires the register stack to be empty
on function entry and exit.  However, fesetenv() is inlined, so we
need to tell gcc explicitly that the i387 registers get clobbered.

PR:	85101
2007-01-06 21:46:23 +00:00
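The fix described in 3d86fb6387 amounts to adding the x87 stack registers to the clobber list of the inline asm that executes fldenv. A hedged sketch with GCC-style inline asm follows; the `sk_` names are hypothetical, the code is only active on x86 with a GCC-compatible compiler, and it falls back to a no-op elsewhere.

```c
#include <stdint.h>

/* 28-byte protected-mode x87 environment image used by fnstenv/fldenv. */
struct sk_x87_env {
    uint32_t words[7];
};

/* Saves and restores the x87 environment, returning 1 if the control
 * word is the same before and after (trivially 1 on non-x86 targets). */
int sk_env_round_trip(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    struct sk_x87_env env;
    uint16_t before, after;

    __asm__ __volatile__("fnstcw %0" : "=m" (before));
    __asm__ __volatile__("fnstenv %0" : "=m" (env));
    /* fldenv rewrites the tag word, so every x87 register is listed as
     * clobbered; the compiler then keeps no live values on the stack. */
    __asm__ __volatile__("fldenv %0" : : "m" (env)
        : "st", "st(1)", "st(2)", "st(3)",
          "st(4)", "st(5)", "st(6)", "st(7)");
    __asm__ __volatile__("fnstcw %0" : "=m" (after));
    return before == after;
#else
    return 1;
#endif
}
```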
das
aeb763b099 Remove an unneeded fnstcw instruction.
Noticed by:	bde
2007-01-05 07:15:26 +00:00
bde
d36e6277cb Moved __BEGIN_DECLS up a little so that it covers __test_sse() and C++
isn't broken.

PR:		104425
2006-10-14 20:35:56 +00:00
bde
ac26a61be9 Removed the optimized asm versions of scalb() and scalbf(). These
functions are only for compatibility with obsolete standards.  They
shouldn't be used, so they shouldn't be optimized.  Use the generic
versions instead.

This fixes scalbf() as a side effect.  The optimized asm version left
garbage on the FP stack.  I fixed the corresponding bug in the optimized
asm scalb() and scalbn() in 1996.  NetBSD fixed it in scalb(), scalbn()
and scalbnf() in 1999 but missed fixing it in scalbf().  Then in 2005
the bug was reimplemented in FreeBSD by importing NetBSD's scalbf().

The generic versions have slightly different error handling:
- the asm versions blindly round the second parameter to a (floating
  point) integer and proceed, while the generic versions return NaN
  if this rounding changes the value.  POSIX permits both behaviours
  (these functions are XSI extensions and the behaviour for a bogus
  non-integral second parameter is unspecified).   Apart from this
  and the bug in scalbf(), the behaviour of the generic versions seems
  to be identical.  (I only exhaustively tested
  generic_scalbf(1.0F, anyfloat) == asm_scalb(1.0F, anyfloat).  This
  covers many representative corner cases involving NaNs and Infs but
  doesn't test exception flags.  The brokenness of scalbf() showed up
  as weird behaviour after testing just 7 integer cases sequentially.)
2006-07-05 20:06:42 +00:00
deischen
d76f24935a Add symbol versioning to libm.
2006-03-27 23:59:45 +00:00
bde
eb7e930697 Fixed some comments added in rev.1.5.
The log message for 1.5 said that some small (one or two ulp) inaccuracies
were fixed, and a comment implied that the critical change is to switch
the rounding mode to to-nearest, with a switch of the precision to
extended at no extra cost.  Actually, the errors are very large (ucbtest
finds ones of several hundred ulps), and it is the switch of the
precision that is critical.

Another comment was wrong about NaNs being handled sloppily.
2005-10-30 12:21:02 +00:00
deischen
5d3cf26519 Prevent these functions from using stack outside of their frame.
Reported by:	Marc Olzheim <marcolz at stack dot nl>
OK'd by:	das
2005-05-06 15:44:20 +00:00
das
9c49c2a65a More optimized math functions.
2005-04-16 21:12:55 +00:00
das
da9b203aaf Implement and document remquo() and remquof().
2005-03-25 04:40:44 +00:00
das
fdf53809bb Make the fenv.h routines work for programs that use SSE for
floating-point arithmetic on i386.  Now I'm going to make excuses
for why this code is kinda scary:

- To avoid breaking the ABI with 5.3-RELEASE, we can't change
  sizeof(fenv_t).  I stuck the saved mxcsr in some discontiguous
  reserved bits in the existing structure.

- Attempting to access the mxcsr on older processors results
  in an illegal instruction exception, so support for SSE must
  be detected at runtime.  (The extra baggage is optimized away
  if either the application or libm is compiled with -msse{,2}.)

I didn't run tests to ensure that this doesn't SIGILL on older 486's
lacking the cpuid instruction or on other processors lacking SSE.
Results from running the fenv regression test on these processors
would be appreciated.  (You'll need to compile the test with
-DNO_STRICT_DFL_ENV.)  If you have an 80386, or if your processor
supports SSE but the kernel didn't enable it, then you're probably out
of luck.

Also, I un-inlined some of the functions that grew larger as a result
of this change, moving them from fenv.h to fenv.c.
2005-03-17 22:21:46 +00:00
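The first point in fdf53809bb, keeping sizeof(fenv_t) unchanged by stashing the 32-bit mxcsr in discontiguous reserved bits, can be sketched like this. The structure layout and all `sk_` names are hypothetical and much smaller than the real fenv_t; only the splitting technique is illustrated.

```c
#include <stdint.h>

/* Hypothetical environment image: two 16-bit fields that used to be
 * reserved padding now carry the halves of the 32-bit mxcsr. */
struct sk_fenv {
    uint16_t control;
    uint16_t mxcsr_hi;   /* formerly reserved */
    uint16_t status;
    uint16_t mxcsr_lo;   /* formerly reserved */
};

/* Split the value across the two non-adjacent fields. */
void sk_set_mxcsr(struct sk_fenv *env, uint32_t mxcsr)
{
    env->mxcsr_hi = (uint16_t)(mxcsr >> 16);
    env->mxcsr_lo = (uint16_t)(mxcsr & 0xffffu);
}

/* Reassemble the 32-bit value from its halves. */
uint32_t sk_get_mxcsr(const struct sk_fenv *env)
{
    return ((uint32_t)env->mxcsr_hi << 16) | env->mxcsr_lo;
}
```

Because the new fields reuse existing padding, the structure size stays the same and previously compiled programs still agree with the library on the layout.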
das
6448887f3b Replace fegetmask() and fesetmask() with feenableexcept(),
fedisableexcept(), and fegetexcept().  These two sets of routines
provide the same functionality.  I implemented the former as an
undocumented internal interface to make the regression test easier to
write.  However, fe(enable|disable|get)except() is already part of
glibc, and I would like to avoid gratuitous differences.  The only
major flaw in the glibc API is that there's no good way to report
errors on processors that don't support all the unmasked exceptions.
2005-03-16 19:03:46 +00:00
das
70073cd00d - Define the LDBL_PREC to be the number of significant bits in a long
double's mantissa.
- Add an assembly version of scalbnl.
2005-03-07 04:53:48 +00:00
das
4a2bef4123 Add scalbnl, also known as ldexpl.
2005-03-07 04:52:58 +00:00
das
e67e9ee139 Alias scalbnf as ldexpf. The two are identical in binary
floating-point formats.
2005-03-07 04:52:43 +00:00
das
0ac8896337 Remove the i387 versions of atan(), atan2(), and atan2f().
They are slower than the MI routines on modern hardware,
except for degenerate cases such as the Pentium 4.

PR:		67469
2005-02-21 16:04:23 +00:00
das
967bb5dcb0 Remove i387 versions of asin() and acos(). Although the hardware
instruction was faster on the 486, it's slower than our MI version on
modern processors.

Determined by:	bde
PR:		67469
2005-02-20 22:51:08 +00:00
das
ef7a10667b Remove the float versions of the i387 trig functions obtained from
NetBSD.  They're buggy, particularly for inputs larger in
magnitude than 2**63.

Noticed by:	bde
PR:		67469
2005-02-20 22:50:40 +00:00
das
9aed1e79d6 Move machine-dependent crud to its own makefile.
2005-02-04 14:33:39 +00:00
das
ec83c7685d Remove wrappers and other cruft intended to support SVID, mistakes in
C90, and other arcana.  Most of these features were never fully
supported or enabled by default.

Ok:	bde, stefanf
2005-02-04 14:08:32 +00:00
das
4ec986eab3 Mark all inline asms that read the floating-point control or status
registers as volatile.  Instructions that *wrote* to FP state were
already marked volatile, but apparently gcc has license to move
non-volatile asms past volatile asms.  This broke amd64's feupdateenv
at -O2 due to a WAR conflict between fnstsw and fldenv there.
2005-01-14 07:09:23 +00:00
das
20067523af Import the subset of J.T. Conklin's single-precision x86-optimized
math routines that appear to be (a) correct and (b) faster than their
MI counterparts on my Pentium 4.

Obtained from:	NetBSD
2005-01-13 18:58:25 +00:00
das
ed0817dc30 Things that are broken, unneeded, and unused since 1997 belong in the attic.
2005-01-13 15:43:22 +00:00
das
1426450140 Faster lrint() and llrint() implementations for x86.
2005-01-11 23:10:53 +00:00
stefanf
bcffee208f Completely remove s_ilogb.S as the assembler implementation gives very little
speed improvement to none at all over the MI version.

Submitted by:	bde
2004-06-20 10:42:23 +00:00
stefanf
ac3aff3300 Return the same result as the MI version for 0.0, INFINITY and NaN.
Reviewed by:	standards@
2004-06-19 09:30:00 +00:00
das
b1670fc3d8 Add an fenv.h implementation for the i386 port.
Reviewed by:	standards@
2004-06-06 10:04:17 +00:00
bde
66eafb65e6 Removed bogus 'l' suffixes in FP register to register instructions.
2000-06-06 12:12:36 +00:00
peter
76f0c923fe $Id$ -> $FreeBSD$
1999-08-28 00:22:10 +00:00
bde
354d1e72c9 Fixed wrong mnemonic `setnel' that gas happened to generate correct object
code for.

Obtained from:	a slightly different fix in NetBSD
1997-04-30 20:37:52 +00:00
bde
b964069da2 Include <machine/asm.h> instead of kernel-only <machine/asmacros.h>.
1997-03-09 14:01:11 +00:00