freebsd-dev/lib
Bruce Evans 16638b5585 Optimized by eliminating the special case for 0.67434 <= |x| < pi/4.
A single polynomial approximation for tan(x) works in infinite precision
up to |x| < pi/2, but in finite precision, to restrict the accumulated
roundoff error to < 1 ulp, |x| must be restricted to less than about
sqrt(0.5/((1.5+1.5)/3)) ~= 0.707.  We restricted it a bit more to
give a safety margin including some slop for optimizations.  Now that
we use double precision for the calculations, the accumulated roundoff
error is in double-precision ulps so it can easily be made almost 2**29
times smaller than a single-precision ulp.  Near x = pi/4 its maximum
is about 0.5+(1.5+1.5)*x**2/3 ~= 1.117 double-precision ulps.

The minimax polynomial needs to be different to work for the larger
interval.  I didn't increase its degree; the old degree is just large
enough to keep the final error less than 1 ulp and increasing the
degree would be a pessimization.  The maximum error is now ~0.80
ulps instead of ~0.53 ulps.

The speedup from this optimization for uniformly distributed args in
[-2pi, 2pi] is 28-43% on athlons, depending on how badly gcc selected
and scheduled the instructions in the old version.  The old version
has some int-to-float conversions that are apparently difficult to schedule
well, but gcc-3.3 somehow did everything ~10 cycles or ~10% faster than
gcc-3.4, with the difference especially large on AXPs.  On A64s, the
problem seems to be related to documented penalties for moving single
precision data to undead xmm registers.  With this version, the speed
in cycles is almost independent of the athlon and gcc version despite
the large differences in instruction selection to use the FPU on AXPs
and SSE on A64s.
2005-11-24 02:04:26 +00:00
..
bind Finish the removal of threads support in ../config.mk,v 1.15. 2005-11-07 15:22:35 +00:00
csu
libalias
libarchive -mdoc sweep. 2005-11-17 13:00:00 +00:00
libatm
libautofs
libbegemot
libbluetooth
libbsnmp
libbz2
libc Fix prototype. 2005-11-23 20:34:37 +00:00
libc_r
libcalendar
libcam
libcom_err
libcompat
libcrypt
libdevinfo
libdevstat
libdisk
libedit -mdoc sweep. 2005-11-17 13:00:00 +00:00
libexpat
libfetch
libform Add missing shared library interdependencies. 2005-11-10 18:07:07 +00:00
libftpio
libgeom
libgpib
libio
libipsec
libipx
libkiconv
libkse o Include <sys/time.h> 2005-11-19 04:47:06 +00:00
libkvm
libmagic Add missing shared library interdependencies. 2005-11-10 18:07:07 +00:00
libmd -mdoc sweep. 2005-11-17 13:00:00 +00:00
libmemstat Tidy up markup and fix two bugs. 2005-11-21 17:18:34 +00:00
libmenu Add missing shared library interdependencies. 2005-11-10 18:07:07 +00:00
libmilter
libmp Add missing shared library interdependencies. 2005-11-10 18:07:07 +00:00
libncp Add missing shared library interdependencies. 2005-11-10 18:07:07 +00:00
libncurses
libnetgraph Recognize all current standard node types. 2005-10-25 20:58:30 +00:00
libngatm
libopie
libpam
libpanel Add missing shared library interdependencies. 2005-11-10 18:07:07 +00:00
libpcap
libpmc -mdoc sweep. 2005-11-17 13:00:00 +00:00
libpthread o Include <sys/time.h> 2005-11-19 04:47:06 +00:00
libradius Add missing shared library interdependencies. 2005-11-10 18:07:07 +00:00
librpcsvc
libsbuf
libsdp
libsm
libsmb Add missing shared library interdependencies. 2005-11-10 18:07:07 +00:00
libsmdb
libsmutil
libstand
libtacplus
libtelnet
libthr Fix name compatible problem with POSIX standard. the sigval_ptr and 2005-11-04 09:41:00 +00:00
libthread_db
libufs
libugidfw
libusbhid
libutil Fix markup, grammar and spelling. 2005-11-18 14:21:28 +00:00
libvgl
libwrap
liby
libypclnt
libz
msun Optimized by eliminating the special case for 0.67434 <= |x| < pi/4. 2005-11-24 02:04:26 +00:00
ncurses Add missing shared library interdependencies. 2005-11-10 18:07:07 +00:00
Makefile Disconnect libc_r from buildworld, it is still kept in the tree to 2005-10-27 03:09:20 +00:00
Makefile.inc