freebsd-skq

Author	SHA1	Message	Date
Bruce Evans	74bbe8ed42	Fixed range reduction for large multiples of pi/2 on systems with broken assignment to floats (e.g., i386 with gcc -O, but not amd64 or ia64; i386 with gcc -O0 worked accidentally). Use an unnamed volatile temporary variable to trick gcc -O into clipping extra precision on assignment. It's surprising that only 1 place needed to be changed. For tanf() on i386 with gcc -O, the bug caused errors > 1 ulp with a density of 2.3% for args larger in magnitude than 128pi/2, with a maximum error of 1.624 ulps. After this fix, exhaustive testing shows that range reduction for floats works as intended, assuming that it is within a factor of about 2^16 of working as intended for doubles. It provides >= 8 extra bits of precision for all ranges. On i386:
range                         max error in double/single ulps  extra precision
----------------------------  -------------------------------  ---------------
0 to 3pi/4                    0x000d3132 / 0.0016              9+ bits
3pi/4 to 128pi/2              0x00160445 / 0.0027              8+
128pi/2 to +Inf               0x00000030 / 0.00000009          23+
128pi/2 up, -O0 before fix    0x00000030 / 0.00000009          23+
128pi/2 up, -O1 before fix    0x10000000 / 0.5                 1
The 23+ bits of extra precision for large multiples correspond to almost perfect reduction to a pair of floats (24 extra would be perfect). After this fix, the maximum relative error (relative to the corresponding fdlibm double precision function) is < 1 ulp for all basic trig functions on all 2^32 float args on all machines tested:
       amd64   ia64    i386-O0  i386-O1
       ------  ------  -------  -------
cosf:  0.8681  0.8681  0.7927   0.5650
sinf:  0.8733  0.8610  0.7849   0.5651
tanf:  0.9708  0.9329  0.9329   0.7035	2005-10-11 07:56:05 +00:00
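A minimal C sketch of the clipping technique follows. It assumes a compiler that honours volatile stores. The intermediate value is written to a volatile float object, so it must be rounded to float before it is read back.

The helper name store_as_float() is hypothetical. It uses a named volatile temporary rather than the unnamed one the commit describes, which should have the same rounding effect. The double parameter stands in for an x87 register that holds extra precision.

```c
/* Hypothetical helper: round v to float precision by forcing it through
 * a volatile float object.  On i386 with gcc -O, a plain assignment to a
 * float variable may leave the value in an x87 register with extra
 * precision; a volatile store must write a real 32-bit float. */
static float store_as_float(double v)
{
    volatile float tmp = (float)v;  /* the store rounds to float */
    return tmp;                     /* the reload reads the rounded value */
}
```

On amd64, where float arithmetic is already done in float precision, the helper reduces to an ordinary conversion.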
Bruce Evans	59b8fc1535	Fixed range reduction near (but not very near) medium-sized multiples of pi/2 (1 line) and expanded a comment about related magic (many lines). The bug was essentially the same as for the +-pi/2 case (a mistranslated mask), but was smaller so it only significantly affected multiples starting near +-13*pi/2. At least on amd64, for cosf() on all 2^32 float args, the bug caused 128 errors of >= 1 ulp, with a maximum error of 1.2393 ulps.	2005-10-10 20:02:02 +00:00
Bruce Evans	11cba99f67	Fix numerous errors of >= 1 ulp for cosf(x) and sinf(x) (1 line) and add a comment about related magic (many lines). __kernel_cos[f]() needs a trick to reduce the error to below 1 ulp when |x| >= 0.3 for the range-reduced x. Modulo other bugs, naive code that doesn't use the trick would have an error of >= 1 ulp in about 0.00006% of cases when |x| >= 0.3 for the unreduced x, with a maximum relative error of about 1.03 ulps. Mistranslation of the trick from the double precision case resulted in errors in about 0.2% of cases, with a maximum relative error of about 1.3 ulps. The mistranslation involved not doing implicit masking of the 32-bit float word corresponding to the implicit masking of the lower 32-bit double word by clearing it. sinf() uses __kernel_cosf() for half of all cases so its errors from this bug are similar. tanf() is not affected. The error bounds in the above and in my other recent commit messages are for amd64. Extra precision for floats on i386's accidentally masks this bug, but only if k_cosf.c is compiled with -O. Although the extra precision helps here, this is accidental and depends on longstanding gcc precision bugs (not clipping extra precision on assignment...), and the gcc bugs are mostly avoided by compiling without -O. I now develop libm mainly on amd64 systems to simplify error detection and debugging.	2005-10-09 21:07:23 +00:00
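The exactness argument behind the masking can be checked in isolation. This sketch is not the k_cosf.c code.

Here q = x/4 has the low 12 stored significand bits cleared; the mask width is an assumption chosen for illustration. After masking, 1 - q needs at most 15 fraction bits, so it is exact in float for 0.3 <= x <= 0.78125. Without the mask, 1 - q is generally not exact. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: q = x/4 with the low 12 of its 24 significand bits cleared. */
static float quarter_masked(float x)
{
    float q = x * 0.25f;          /* exact: scaling by a power of 2 */
    uint32_t u;

    memcpy(&u, &q, sizeof u);
    u &= 0xfffff000u;             /* clear the low 12 stored bits */
    memcpy(&q, &u, sizeof q);
    return q;
}

/* Return 1 if 1 - q computed in float equals the exact difference. */
static int one_minus_is_exact(float q)
{
    volatile float f = 1.0f - q;  /* volatile: force rounding to float */
    return (double)f == 1.0 - (double)q;
}
```

For example, the unmasked quarter of 0.3f has nonzero bits below the float ulp of 1 - q, so its subtraction rounds.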
Bruce Evans	a0e34da09f	Oops, the last-minute optimization in rev.1.8 wasn't a good idea. The 17+17+24 bit pi/2 must only be used when subtraction of the first 2 terms in it from the arg is exact. This happens iff the arg in bits is one of the 2**17[-1] values on each side of (float)(pi/2). Revert to the algorithm in rev.1.7 and only fix its threshold for using the 3-term pi/2. Use the threshold that maximizes the number of values for which the 3-term pi/2 is used, subject to not changing the algorithm for comparing with the threshold. The 3-term pi/2 ends up being used for about half of its usable range (about 64K values on each side).	2005-10-09 04:29:08 +00:00
Bruce Evans	cd604283af	Fixed syntax error (a missing brace) in previous commit.	2005-10-08 22:55:36 +00:00
Bruce Evans	a7b8acac04	Fixed range reduction near (but not very near) +-pi/2. A bug caused a maximum error of 2.905 ulps for cosf(), but the algorithm for cosf() is good for < 1 ulp and happens to give perfect rounding (< 0.5 ulps) near +-pi/2 except for the bug. The extra relative errors for tanf() were similar (slightly larger). The bug didn't affect sinf() since sinf'(+-pi/2) is 0. For range reduction in ~[-3pi/4, -pi/4] and ~[pi/4, 3pi/4] we must subtract +-pi/2 and the only complication is that this must be done in extra precision. We have handy 17+24-bit and 17+17+24-bit approximations to pi/2. If we always used the former then we would lose up to 24 bits of accuracy due to cancelation of leading bits, but we need to keep at least 24 bits plus a guard digit or 2, and should keep as many guard bits as efficiency permits. So we used the less-precise pi/2 not very near +-pi/2 and switched to using the more-precise pi/2 very near +-pi/2. However, we got the threshold for the switch wrong by allowing 19 bits to cancel, so we ended up with only 21 or 22 bits of accuracy in some cases, which is even worse than naively subtracting pi/2 would have done. Exhaustive checking shows that allowing only 17 bits to cancel (min. accuracy ~24 bits) is sufficient to reduce the maximum error for cosf() near +-pi/2 to 0.726 ulps, but allowing only 6 bits to cancel (min. accuracy ~35 bits) happens to give perfect rounding for cosf() at little extra cost so we prefer that. We actually (in effect) allow 0 bits to cancel and always use the 17+17+24-bit pi/2 (min. accuracy ~41 bits). This is simpler and probably always more efficient too. Classifying args to avoid using this pi/2 when it is not needed takes several extra integer operations and a branch, but just using it takes only 1 FP operation. The patch also fixes misspelling of 17 as 24 in many comments.
For the double-precision version, the magic numbers include 33+53 bits for the less-precise pi/2 and (53-32-1 = 20) bits being allowed to cancel, so there are ~33-20 = 13 guard bits. This is sufficient except probably for perfect rounding. The more-precise pi/2 has 33+33+53 bits and we still waste time classifying args to avoid using it. The bug is apparently from mistranslation of the magic 32 in 53-32-1. The number of bits allowed to cancel is not critical and we use 32 for double precision because it allows efficient classification using a 32-bit comparison. For float precision, we must use an explicit mask, and there are fewer bits so there is less margin for error in their allocation. The 32 got reduced to 4 but should have been reduced almost in proportion to the reduction of mantissa bits.	2005-10-08 22:43:55 +00:00
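The 17+17+24-bit subtraction can be sketched in C as follows. For illustration only, the split of pi/2 is derived at run time from a double constant, so it is only as accurate as that double; e_rem_pio2f.c uses precomputed constants instead.

The first two subtractions are exact only when x is close enough to (float)(pi/2), as described above. The sketch does not check that condition. reduce_near_pio2() and keep17() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Keep only the top 17 of a float's 24 significand bits by clearing the
 * low 7 stored bits. */
static float keep17(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    u &= 0xffffff80u;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Compute x - pi/2 for x near +pi/2 using a 17+17+24-bit split of pi/2.
 * For illustration the split is derived from a double constant, so it is
 * only as accurate as that double. */
static float reduce_near_pio2(float x)
{
    const double pio2 = 1.57079632679489661923;
    float p1 = keep17((float)pio2);                /* first 17 bits */
    float p2 = keep17((float)(pio2 - p1));         /* next 17 bits */
    float p3 = (float)((pio2 - p1) - p2);          /* last 24 bits */

    /* x - p1 and then - p2 are exact only when x is close to pi/2;
     * only the final subtraction of p3 is expected to round. */
    return ((x - p1) - p2) - p3;
}
```

Using the 3-term constant unconditionally costs one extra float subtraction. Avoiding it would require the integer classification the commit describes.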
Bruce Evans	0b42281ee9	Fixed aliasing bugs in TRUNC() by using the fdlibm macros for access to doubles as bits. fdlibm-1.1 had similar aliasing bugs, but these were fixed by NetBSD or Cygnus before a modified version of fdlibm was imported in 1994. TRUNC() is only used by tgamma() and some implementation-detail functions. The aliasing bugs were detected by compiling with gcc -O2 but don't seem to have broken tgamma() on i386's or amd64's. They broke my modified version of tgamma(). Moved the definition of TRUNC() to mathimpl.h so that it can be fixed in one place, although the general version is even slower than necessary because it has to operate on pointers to volatiles to handle its arg sometimes being volatile. Inefficiency of the fdlibm macros slows down libm generally, and tgamma() is a relatively unimportant part of libm. The macros act as if on 32-bit words in memory, so they are hard to optimize to direct actions on 64-bit double registers for (non-i386) machines where this is possible. The optimization is too hard for gcc on amd64's, and declaring variables as volatile makes it impossible.	2005-09-19 11:28:19 +00:00
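The aliasing-safe way of editing a double's bits can be sketched in C. Writing through an (int *) cast of a double's address violates C's aliasing rules, which is why an optimizer may reorder or drop such a store. The sketch instead copies the bits with memcpy(), which is well defined.

It does not use the fdlibm word-access macros. It assumes a 64-bit IEEE double, and the number of low bits it clears (27) is an assumption for illustration. trunc_low_bits() is a hypothetical name, not the real TRUNC().

```c
#include <stdint.h>
#include <string.h>

/* Clear the low 27 bits of a double's 64-bit representation.  The bits are
 * copied with memcpy(), which is well defined, instead of writing through
 * an (int *) cast of &x, which breaks C's aliasing rules. */
static double trunc_low_bits(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);
    bits &= ~(uint64_t)0x7ffffff;   /* 0x7ffffff = low 27 bits (assumed width) */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```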
David Schultz	26bd283f2a	Add a missing ldexpf() alias for amd64. Noticed by: bz@, tjr@	2005-09-12 20:54:00 +00:00
Ken Smith	a84020c2b9	Bump the shared library version number of all libraries that have not been bumped since RELENG_5. Reviewed by: ru Approved by: re (not needed for commit check but in principle...)	2005-07-22 17:19:05 +00:00
Ruslan Ermilov	01293bdb90	Markup nit. Approved by: re (blanket)	2005-06-16 21:56:03 +00:00
Ruslan Ermilov	70db9cd000	Fixed compile warning. Approved by: re (blanket)	2005-06-16 21:55:45 +00:00
Ruslan Ermilov	f789cb8293	Assorted markup fixes. Approved by: re	2005-06-15 19:04:04 +00:00
Daniel Eischen	7f8fa2cf47	Prevent these functions from using stack outside of their frame. Reported by: Marc Olzheim <marcolz at stack dot nl> OK'd by: das	2005-05-06 15:44:20 +00:00
Stefan Farfeleder	66116c07a7	Revert the last change, the conversion from long double to double can raise unwanted underflow exceptions. Pointed out by: das	2005-04-28 19:45:55 +00:00
Stefan Farfeleder	8f58ab910f	Use double additions to raise the inexact exception to work around problems with long double addition on sparc64.	2005-04-22 09:57:55 +00:00
Stefan Farfeleder	9eb30792de	Fix raising the inexact exception (FE_INEXACT) if the result differs from the argument. Noticed by: das	2005-04-22 08:30:33 +00:00
Andrey A. Chernov	db7354df52	Fix truncl.3 MLINKS	2005-04-17 19:57:52 +00:00
David Schultz	a4ca7ca8ac	More optimized math functions.	2005-04-16 21:12:55 +00:00
David Schultz	2f2ee27de4	Implement truncl() based on floorl().	2005-04-16 21:12:47 +00:00
David Schultz	07f3bc5b9c	Add roundl(), lroundl(), and llroundl().	2005-04-08 01:24:08 +00:00
David Schultz	4bb190a74b	These files should include s_lround.c instead of s_lrint.c. This only matters for efficiency, not for correctness.	2005-04-08 00:52:27 +00:00
David Schultz	fc87986708	Fix a (coincidentally harmless) bug.	2005-04-08 00:52:16 +00:00
David Schultz	46691dfbe7	Fix a long-standing bug in k_rem_pio2(), which led to large errors when tanf() was called with big arguments close to multiples of pi/2. Reported by: ucbtest via bde	2005-04-05 23:27:47 +00:00
David Schultz	d06a0070af	Build exp2(), exp2f(), and related documentation.	2005-04-05 02:57:39 +00:00
David Schultz	90232fdf16	Document exp2() and exp2f(), and make other minor tweaks and updates.	2005-04-05 02:57:28 +00:00
David Schultz	f8d6ede6b5	Implement exp2() and exp2f().	2005-04-05 02:57:15 +00:00
David Schultz	3b9141ee91	Implement and document remquo() and remquof().	2005-03-25 04:40:44 +00:00
David Schultz	2c2435825a	Fix the double rounding problem with subnormals, and remove the XXX comments, which no longer apply.	2005-03-18 02:27:59 +00:00
David Schultz	21122bea01	Add missing prototypes for fma() and fmaf(), and remove an inaccurate comment.	2005-03-18 01:47:42 +00:00
David Schultz	9233b45ad9	Make the fenv.h routines work for programs that use SSE for floating-point arithmetic on i386. Now I'm going to make excuses for why this code is kinda scary: - To avoid breaking the ABI with 5.3-RELEASE, we can't change sizeof(fenv_t). I stuck the saved mxcsr in some discontiguous reserved bits in the existing structure. - Attempting to access the mxcsr on older processors results in an illegal instruction exception, so support for SSE must be detected at runtime. (The extra baggage is optimized away if either the application or libm is compiled with -msse{,2}.) I didn't run tests to ensure that this doesn't SIGILL on older 486's lacking the cpuid instruction or on other processors lacking SSE. Results from running the fenv regression test on these processors would be appreciated. (You'll need to compile the test with -DNO_STRICT_DFL_ENV.) If you have an 80386, or if your processor supports SSE but the kernel didn't enable it, then you're probably out of luck. Also, I un-inlined some of the functions that grew larger as a result of this change, moving them from fenv.h to fenv.c.	2005-03-17 22:21:46 +00:00
David Schultz	56ad27535a	Spell 'fedisableexcept' correctly.	2005-03-16 22:34:14 +00:00
David Schultz	2e5fb44003	Document feenableexcept(), fedisableexcept(), and fegetexcept().	2005-03-16 19:04:28 +00:00
David Schultz	10b01832c3	Replace fegetmask() and fesetmask() with feenableexcept(), fedisableexcept(), and fegetexcept(). These two sets of routines provide the same functionality. I implemented the former as an undocumented internal interface to make the regression test easier to write. However, fe(enable|disable|get)except() is already part of glibc, and I would like to avoid gratuitous differences. The only major flaw in the glibc API is that there's no good way to report errors on processors that don't support all the unmasked exceptions.	2005-03-16 19:03:46 +00:00
David Schultz	3d266bde6d	Replace strong references with weak references. There's no particularly good reason to do this, except that __strong_reference does type checking, whereas __weak_reference does not. On Alpha, the compiler won't accept a 'long double' parameter in place of a 'double' parameter even though the two types are identical.	2005-03-07 21:27:37 +00:00
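Assuming GCC or Clang on an ELF target, the aliasing idea can be sketched with the compiler's alias attribute rather than FreeBSD's __weak_reference() macro. The alias is declared with the same double signature as its target, which sidesteps the type mismatch mentioned above. my_frexp() and my_frexpl() are hypothetical names.

```c
#include <math.h>

/* Hypothetical double-precision routine. */
double my_frexp(double x, int *e)
{
    return frexp(x, e);
}

/* my_frexpl becomes a second (weak) name for my_frexp, so no extra code
 * is emitted.  The alias target must be defined in this translation unit. */
double my_frexpl(double x, int *e) __attribute__((weak, alias("my_frexp")));
```

Because the alias is weak, another object file may still supply a strong my_frexpl() that overrides it.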
Stefan Farfeleder	3ddc6e9440	Remove an obsolete sentence from a comment.	2005-03-07 20:28:26 +00:00
David Schultz	c8642491d5	- If z is 0, one of x or y is 0, and the other is infinite, raise an invalid exception and return an NaN. - If a long double has 113 bits of precision, implement fma in terms of simple long double arithmetic instead of complicated double arithmetic. - If a long double is the same as a double, alias fma as fmal.	2005-03-07 05:02:09 +00:00
David Schultz	388bf3b630	Document scalbnl and scalblnl.	2005-03-07 05:00:44 +00:00
David Schultz	6af2c5a60c	Document nextafterl and nexttoward{,f,l}.	2005-03-07 05:00:29 +00:00
David Schultz	15a53f77fd	Add nexttoward to the list of implemented functions, and explicitly list the four that are still missing.	2005-03-07 04:59:53 +00:00
David Schultz	66d672d8cb	Document fmal.	2005-03-07 04:59:43 +00:00
David Schultz	94e03502dc	Remove ldexp and ldexpf. The former is in libc, and the latter is identical to scalbnf, which is now aliased as ldexpf. Note that the old implementations made the mistake of setting errno and were the only libm routines to do so.	2005-03-07 04:59:30 +00:00
David Schultz	aeb5e711f3	- Remove s_ldexpf.c (now aliased to scalbn.)
- Add nexttoward{,f,l} and nextafterl. On all platforms, nexttowardl is an alias for nextafterl.
- Add fmal.
- Add man pages for new routines: fmal, nextafterl, nexttoward{,f,l}, scalb{,l}nl.
Note that on platforms where long double is the same as double, we generally just alias the double versions of the routines, since doing so avoids extra work on the source code level and redundant code in the binary. In particular:
             ldbl53           ldbl64/113
fmal         s_fma.c          s_fmal.c
ldexpl       s_scalbn.c       s_scalbnl.c
nextafterl   s_nextafter.c    s_nextafterl.c
nexttoward   s_nextafter.c    s_nexttoward.c
nexttowardf  s_nexttowardf.c  s_nexttowardf.c
nexttowardl  s_nextafter.c    s_nextafterl.c
scalbnl      s_scalbn.c       s_scalbnl.c	2005-03-07 04:59:11 +00:00
David Schultz	228ad57d05	- Define FP_FAST_FMA for sparc64, since fma() is now implemented using sparc64's 128-bit long doubles. - Define FP_FAST_FMAL for ia64. - Prototypes for fmal, frexpl, ldexpl, nextafterl, nexttoward{,f,l}, scalblnl, and scalbnl.	2005-03-07 04:58:43 +00:00
David Schultz	beed720c37	Alias scalbn as ldexpl and scalbnl on platforms where long double is the same as double.	2005-03-07 04:58:03 +00:00
David Schultz	7b6a19039d	- Implement scalblnl. - In scalbln and scalblnf, check the bounds of the second argument. This is probably unnecessary, but strictly speaking, we should report an error if someone tries to compute scalbln(x, INT_MAX + 1ll).	2005-03-07 04:57:50 +00:00
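A sketch of one way to do the bounds check: clamp the long exponent into int range and defer to scalbn(). Clamping does not change the result, because any exponent beyond INT_MAX (or below INT_MIN) already overflows (or underflows) every finite nonzero double. scalbln_sketch() is a hypothetical name, not the committed code.

```c
#include <limits.h>
#include <math.h>

/* Compute x * 2^n for a long n by clamping n into int range and calling
 * scalbn().  Exponents beyond the int range overflow or underflow every
 * finite nonzero x anyway, so clamping leaves the result unchanged. */
static double scalbln_sketch(double x, long n)
{
    int in;

    if (n > INT_MAX)
        in = INT_MAX;
    else if (n < INT_MIN)
        in = INT_MIN;
    else
        in = (int)n;
    return scalbn(x, in);
}
```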
David Schultz	caacab9b5f	Implement nexttowardf. This is used on both platforms with 11-bit exponents and platforms with 15-bit exponents for long doubles.	2005-03-07 04:57:38 +00:00
David Schultz	ef94de735a	Implement nexttoward and nextafterl; the latter is also known as nexttowardl. These are not needed on machines where long doubles look like IEEE-754 doubles, so the implementation only supports the usual long double formats with 15-bit exponents. Anything bizarre, such as machines where floating-point and integer data have different endianness, will cause problems. This is the case with big endian ia64 according to libc/ia64/_fpmath.h. Please contact me if you managed to get a machine running this way.	2005-03-07 04:56:46 +00:00
David Schultz	a506506a1c	- Try harder to trick gcc into not optimizing away statements that are intended to raise underflow and inexact exceptions. - On systems where long double is the same as double, nextafter should be aliased as nexttoward, nexttowardl, and nextafterl.	2005-03-07 04:55:58 +00:00
David Schultz	e0fe8e4440	Implement frexpl.	2005-03-07 04:54:51 +00:00
David Schultz	f8a40fca14	Alias frexp as frexpl on platforms where a long double is the same as a double.	2005-03-07 04:54:39 +00:00

307 Commits