Bruce Evans 5776f433ab
Extract the high and low words together.  With gcc-3.4 on uniformly
distributed non-large args, this saves about 14 of 134 cycles for
Athlon64s and about 5 of 199 cycles for AthlonXPs.
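
As an illustration of extracting both words in one access, here is a
minimal sketch.  It assumes IEEE 754 doubles and uses a hypothetical
extract_words() helper built on memcpy(); it is a stand-in for the idea,
not the macro the library itself uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read the double once as 64 bits, then split it
 * into the high and low 32-bit words in registers.
 */
static void
extract_words(double x, uint32_t *hi, uint32_t *lo)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof(bits));    /* single 64-bit read */
    *hi = (uint32_t)(bits >> 32);       /* sign, exponent, top of mantissa */
    *lo = (uint32_t)bits;               /* low 32 mantissa bits */
}

/* Usage: 1.0 is 0x3ff0000000000000 in IEEE 754 binary64. */
static void
check_extract_words(void)
{
    uint32_t hi, lo;

    extract_words(1.0, &hi, &lo);
    assert(hi == 0x3ff00000u);
    assert(lo == 0u);
}
```

Both halves come from the same load, so no second memory access is
needed when the low word is wanted later.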

Moved the check for x == 0 inside the check for subnormals.  With
gcc-3.4 on uniformly distributed non-large args, this saves another
5 cycles on Athlon64s and loses 1 cycle on AthlonXPs.
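
The nesting can be sketched as follows.  This is only an illustration
under the assumption of IEEE 754 doubles; classify() and the enum are
hypothetical names, and the real function does more than classify its
argument.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum kind { KIND_ZERO, KIND_SUBNORMAL, KIND_OTHER };

/*
 * Hypothetical sketch: the test for zero sits inside the test for a
 * zero exponent field, so normal arguments pay for one comparison only.
 */
static enum kind
classify(double x)
{
    uint64_t bits;
    uint32_t hx, lx;

    memcpy(&bits, &x, sizeof(bits));
    hx = (uint32_t)(bits >> 32) & 0x7fffffffu;  /* drop the sign bit */
    lx = (uint32_t)bits;

    if (hx < 0x00100000u) {         /* exponent field is 0 */
        if ((hx | lx) == 0)
            return (KIND_ZERO);     /* +0.0 or -0.0 */
        return (KIND_SUBNORMAL);
    }
    return (KIND_OTHER);            /* normal, Inf or NaN */
}

/* Usage: 1e-310 is below the smallest normal double (about 2.2e-308). */
static void
check_classify(void)
{
    assert(classify(0.0) == KIND_ZERO);
    assert(classify(-0.0) == KIND_ZERO);
    assert(classify(1e-310) == KIND_SUBNORMAL);
    assert(classify(1.0) == KIND_OTHER);
}
```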

Use INSERT_WORDS() and not SET_HIGH_WORD() when converting the first
approximation from bits to double.  With gcc-3.4 on uniformly distributed
non-large args, this saves another 4 cycles on both Athlon64s and
AthlonXPs.
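
The difference between the two ways of writing can be sketched like
this.  The helpers insert_words() and set_high_word() are hypothetical
memcpy()-based stand-ins that assume IEEE 754 doubles; they model the
semantics of the two approaches, not the library's actual macros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the whole double with one 64-bit write. */
static double
insert_words(uint32_t hi, uint32_t lo)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double x;

    memcpy(&x, &bits, sizeof(x));
    return (x);
}

/*
 * Hypothetical helper: replace only the high word.  The old low word
 * must be preserved, so the existing value has to be read first.
 */
static void
set_high_word(double *x, uint32_t hi)
{
    uint64_t bits;

    memcpy(&bits, x, sizeof(bits));
    bits = ((uint64_t)hi << 32) | (bits & 0xffffffffu);
    memcpy(x, &bits, sizeof(bits));
}

/* Usage: 2.0 is 0x4000000000000000 and 1.5 is 0x3ff8000000000000. */
static void
check_insert_words(void)
{
    double t = 0.0;

    assert(insert_words(0x40000000u, 0) == 2.0);
    assert(insert_words(0x3ff80000u, 0) == 1.5);

    set_high_word(&t, 0x40000000u);     /* low word of 0.0 is already 0 */
    assert(t == 2.0);
}
```

When the low word is known to be zero, as it would be for a value built
from its high bits alone, the full write gives the same result without
depending on the previous contents of the destination.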

Accessing doubles as 2 words may be an optimization on old CPUs, but on
current CPUs it tends to cause extra operations and pipeline stalls,
especially for writes, even when only 1 of the words needs to be accessed.

Removed an unused variable.
2005-12-20 01:21:30 +00:00