a373e66b85
exponent bits of the reduced result, construct 2**k (hopefully in parallel with the construction of the reduced result) and multiply by it. This tends to be much faster if the construction of 2**k is actually in parallel, and might be faster even with no parallelism since adjustment of the exponent requires a read-modify-write at an unfortunate time for pipelines. In some cases involving exp2* on amd64 (A64), this change saves about 40 cycles or 30%. I think it is inherently only about 12 cycles faster in these cases and the rest of the speedup is from partly-accidentally avoiding compiler pessimizations (the construction of 2**k is now manually scheduled for good results, and -O2 doesn't always mess this up). In most cases on amd64 (A64) and i386 (A64) the speedup is about 20 cycles. The worst case that I found is expf on ia64 where this change is a pessimization of about 10 cycles or 5%. The manual scheduling for plain exp[f] is harder and not as tuned. This change to ld128/s_exp2l.c has not been tested. |
||
---|---|---|
.. | ||
bind | ||
csu | ||
libalias | ||
libarchive | ||
libatm | ||
libautofs | ||
libbegemot | ||
libbluetooth | ||
libbsm | ||
libbsnmp | ||
libbz2 | ||
libc | ||
libc_r | ||
libcalendar | ||
libcam | ||
libcom_err | ||
libcompat | ||
libcrypt | ||
libdevinfo | ||
libdevstat | ||
libdisk | ||
libedit | ||
libelf | ||
libexpat | ||
libfetch | ||
libftpio | ||
libgeom | ||
libgpib | ||
libgssapi | ||
libipsec | ||
libipx | ||
libkiconv | ||
libkse | ||
libkvm | ||
libmagic | ||
libmd | ||
libmemstat | ||
libmilter | ||
libmp | ||
libncp | ||
libnetgraph | ||
libngatm | ||
libopie | ||
libpam | ||
libpcap | ||
libpmc | ||
libradius | ||
librpcsvc | ||
librt | ||
libsbuf | ||
libsdp | ||
libsm | ||
libsmb | ||
libsmdb | ||
libsmutil | ||
libstand | ||
libtacplus | ||
libtelnet | ||
libthr | ||
libthread_db | ||
libufs | ||
libugidfw | ||
libusbhid | ||
libutil | ||
libvgl | ||
libwrap | ||
liby | ||
libypclnt | ||
libz | ||
msun | ||
ncurses | ||
Makefile | ||
Makefile.inc |
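
The two scaling methods contrasted in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from msun: the function names `make_twopk`, `scale_by_exponent_add` and `scale_by_multiply` are hypothetical, and the sketch assumes IEEE 754 binary64 doubles with k small enough that both 2**k and the scaled result stay in the normal range.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build 2**k directly from its bit pattern.
 * Assumes -1022 <= k <= 1023 so the biased exponent is a normal one.
 * This does not depend on the reduced result, so it can be computed
 * while the reduced result is still being evaluated.
 */
static double
make_twopk(int k)
{
	uint64_t bits = (uint64_t)(0x3ff + k) << 52;
	double twopk;

	memcpy(&twopk, &bits, sizeof(twopk));
	return (twopk);
}

/*
 * Exponent-adjustment method: add k to the exponent field of the
 * reduced result r. This is a read-modify-write of r, so it cannot
 * start until r is available.
 */
static double
scale_by_exponent_add(double r, int k)
{
	uint64_t bits;

	memcpy(&bits, &r, sizeof(bits));
	bits += (uint64_t)(int64_t)k << 52;	/* wraps correctly for k < 0 */
	memcpy(&r, &bits, sizeof(r));
	return (r);
}

/*
 * Multiplication method: construct 2**k independently of r and
 * finish with a single multiplication.
 */
static double
scale_by_multiply(double r, int k)
{
	return (r * make_twopk(k));
}
```

Under the stated assumptions both functions give the same result; they differ in which operations sit on the critical path after the reduced result becomes available.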