freebsd-dev

Author	SHA1	Message	Date
Bruce Evans	be396b71c1	2 long double constants were missing L suffixes. This helped break tanl() on !(amd64 || i386). It gave slightly worse than double precision in some cases. tanl() now passes tests of 2^24 values on ia64.	2008-02-18 15:39:52 +00:00
Bruce Evans	19a9e1bb1c	Fix a typo which broke k_tanl.c on !(amd64 || i386).	2008-02-18 14:09:41 +00:00
David Schultz	de336b0c5e	Add kernel functions for 80-bit long doubles. Many thanks to Steve and Bruce for putting lots of effort into these; getting them right isn't easy, and they went through many iterations. Submitted by: Steve Kargl <sgk@apl.washington.edu> with revisions from bde	2008-02-17 07:32:14 +00:00
Bruce Evans	f01bfe5c6d	Fix exp2*(x) on signaling NaNs by returning x+x as usual. This has the side effect of confusing gcc-4.2.1's optimizer into more often doing the right thing. When it does the wrong thing here, it seems to be mainly making too many copies of x with dependency chains. This effect is tiny on amd64, but in some cases on i386 it is enormous. E.g., on i386 (A64) with -O1, the current version of exp2() should take about 50 cycles, but took 83 cycles before this change and 66 cycles after this change. exp2f() with -O1 only speeded up from 51 to 47 cycles. (exp2f() should take about 40 cycles, on an Athlon in either i386 or amd64 mode, and now takes 42 on amd64). exp2l() with -O1 slowed down from 155 cycles to 123 for some args; this is unimportant since the i386 exp2l() is a fake; the wrong thing for it seems to involve branch misprediction.	2008-02-13 10:44:44 +00:00
Bruce Evans	a373e66b85	Use a better method of scaling by 2**k. Instead of adding to the exponent bits of the reduced result, construct 2**k (hopefully in parallel with the construction of the reduced result) and multiply by it. This tends to be much faster if the construction of 2**k is actually in parallel, and might be faster even with no parallelism since adjustment of the exponent requires a read-modify-write at an unfortunate time for pipelines. In some cases involving exp2 on amd64 (A64), this change saves about 40 cycles or 30%. I think it is inherently only about 12 cycles faster in these cases and the rest of the speedup is from partly-accidentally avoiding compiler pessimizations (the construction of 2**k is now manually scheduled for good results, and -O2 doesn't always mess this up). In most cases on amd64 (A64) and i386 (A64) the speedup is about 20 cycles. The worst case that I found is expf on ia64 where this change is a pessimization of about 10 cycles or 5%. The manual scheduling for plain exp[f] is harder and not as tuned. The change to ld128/s_exp2l.c has not been tested.	2008-02-07 03:17:05 +00:00
David Schultz	968b39e3b9	Implement exp2l(). There is one version for machines with 80-bit long doubles (i386, amd64, ia64) and one for machines with 128-bit long doubles (sparc64). Other platforms use the double version. I've only done runtime testing on i386. Thanks to bde@ for helpful discussions and bugfixes.	2008-01-18 21:42:46 +00:00
David Schultz	7cd4a83267	Since nan() is supposed to work the same as strtod("nan(...)", NULL), my original implementation made both use the same code. Unfortunately, this meant libm depended on a vendor header at compile time and previously-unexposed vendor bits in libc at runtime. Hence, I just wrote my own version of the relevant vendor routine. As it turns out, mine has a factor of 8 fewer lines of code, and is a bit more readable anyway. The strtod() and *scanf() routines still use vendor code. Reviewed by: bde	2007-12-18 23:46:32 +00:00
David Schultz	4b6b574455	Implement and document nan(), nanf(), and nanl(). This commit adds two new directories in msun: ld80 and ld128. These are for long double functions specific to the 80-bit long double format used on x86-derived architectures, and the 128-bit format used on sparc64, respectively.	2007-12-16 21:19:28 +00:00
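
The scaling change described in a373e66b85 can be illustrated with a minimal sketch. This is not the msun code: the function names are hypothetical, the sketch assumes an IEEE-754 binary64 double, and it only handles k for which both 2**k and the result are normal numbers. The first routine adjusts the exponent bits of the reduced result r; the second builds 2**k separately and multiplies by it.

```c
#include <stdint.h>
#include <string.h>

/*
 * Exponent-adjustment style: add k to the exponent field of r.
 * This is a read-modify-write on r's bits, so it cannot start until
 * r has been computed. Assumes the result stays a normal number.
 */
static double
scale_by_exponent_add(double r, int k)
{
	uint64_t bits;

	memcpy(&bits, &r, sizeof(bits));
	bits += (uint64_t)(int64_t)k << 52;	/* wraps correctly for k < 0 */
	memcpy(&r, &bits, sizeof(r));
	return (r);
}

/*
 * Build 2**k directly from its bit pattern (hypothetical helper).
 * Valid for -1022 <= k <= 1023. It does not depend on r, so it can
 * be computed while r is still being evaluated.
 */
static double
two_to_the_k(int k)
{
	uint64_t bits = (uint64_t)(0x3ff + k) << 52;
	double twopk;

	memcpy(&twopk, &bits, sizeof(twopk));
	return (twopk);
}

/* Multiplication style: one floating-point multiply by 2**k. */
static double
scale_by_multiply(double r, int k)
{
	return (r * two_to_the_k(k));
}
```

Both routines return the same value in the supported range, for example 12.0 for r = 1.5 and k = 3. Whether the multiplication form is actually faster depends on the CPU and on how the compiler schedules the two independent computations, as the commit message itself notes.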

58 Commits
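
The missing L suffixes fixed in be396b71c1 are easy to reproduce in isolation. The sketch below is not taken from k_tanl.c; the constant and the names are made up for illustration. It assumes a platform where long double is wider than double and where floating constants are not evaluated in extended precision (FLT_EVAL_METHOD == 0). Under those assumptions, an unsuffixed constant is rounded to double before it is stored, so the extra bits of the long double are lost.

```c
#include <float.h>

/* Unsuffixed: rounded to double first, then widened to long double. */
static const long double third_no_suffix =
    0.333333333333333333333333333333333333;

/* Suffixed: rounded once, directly to long double precision. */
static const long double third_with_suffix =
    0.333333333333333333333333333333333333L;

/* Returns nonzero when the missing suffix changed the stored value. */
static int
suffix_matters(void)
{
	return (third_no_suffix != third_with_suffix);
}
```

On platforms where long double has the same format as double, the two constants compare equal and suffix_matters() returns 0.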