distributed non-large args, this saves about 14 of 134 cycles for
Athlon64s and about 5 of 199 cycles for AthlonXPs.
Moved the check for x == 0 inside the check for subnormals. With
gcc-3.4 on uniformly distributed non-large args, this saves another
5 cycles on Athlon64s and loses 1 cycle on AthlonXPs.
Use INSERT_WORDS() and not SET_HIGH_WORD() when converting the first
approximation from bits to double. With gcc-3.4 on uniformly distributed
non-large args, this saves another 4 cycles on both Athlon64s and and
AthlonXPs.
Accessing doubles as 2 words may be an optimization on old CPUs, but on
current CPUs it tends to cause extra operations and pipeline stalls,
especially for writes, even when only 1 of the words needs to be accessed.
Removed an unused variable.
The purpose of this change is consistency (not performance improvement:)),
as it was hard to tell if fdrop() is MPSAFE or not when I saw it sometimes
under the Giant and sometimes without it.
Glanced at by: ssouhlal, kan
special handling when zero. This caused no PFSYNC_ACT_DEL message and thus
disfunction of pfflowd and state synchronisation in general.
Discovered by: thompsa
Good catch by: thompsa
MFC after: 7 days
We loaded PAE kernels at 2MB and non-PAE kernels at 4MB to skip
the first PSE page (PSE pages are 2MB on PAE), not at 8MB as it
was incorrectly specified in the previous commit.
Submitted by: jhb
to light by the PR. Specifically, convert these three scripts
into good rc.d citizens, making sure that their functionality
is preserved, but the rc.d framework rules are not broken.
Add support for cleanvar as a regular rc.d script in the
default rc.conf, and document this in the man page.
Add a descriptive comment to rc.conf that regarding the
three emulation/compatibility services provided by abi
so users will not be confused by these services not having
their own startup scripts.
PR: conf/84574
Submitted by: Alexander Botero-Lowry
provide enough room for decompression (up to 2.5MB is necessary). This
should be safe to do since we load i386 kernels after 8MB mark now, so
that 16MB is the minimum amount of RAM necessary to even boot FreeBSD.
This makes bzip2-support practically useable.
memory directly available to loader(8) and friends was limited to 640K on i386.
Those times have passed long time ago and now loader(8) can directly access
up to 4GB of RAM at least theoretically. At the same time, there are several
places where it's assumed that malloc() will only allocate memory within
first megabyte.
Remove that assumption by allocating appropriate bounce buffers for BIOS
calls on stack where necessary.
This allows using memory above first megabyte for heap if necessary.
function approximation for the second step. The polynomial has degree
2 for cbrtf() and 4 for cbrt(). These degrees are minimal for the final
accuracy to be essentially the same as before (slightly smaller).
Adjust the rounding between steps 2 and 3 to match. Unfortunately,
for cbrt(), this breaks the claimed accuracy slightly although incorrect
rounding doesn't. Claim less accuracy since its not worth pessimizing
the polynomial or relying on exhaustive testing to get insignificantly
more accuracy.
This saves about 30 cycles on Athlons (mainly by avoiding 2 divisions)
so it gives an overall optimization in the 10-25% range (a larger
percentage for float precision, especially in 32-bit mode, since other
overheads are more dominant for double precision, surprisingly more
in 32-bit mode).
1. The ELF-64 typedefs are now standardized, so that the libelf port
(devel/libelf) does not need to compensate for not having the
Elf64_Xword and Elf64_Sxword types.
2. ELF Symbol versioning support has been added. This also affects
the libelf port (though configure should detect this correctly).
- in preparing for the third approximation, actually make t larger in
magnitude than cbrt(x). After chopping, t must be incremented by 2
ulps to make it larger, not 1 ulp since chopping can reduce it by
almost 1 ulp and it might already be up to half a different-sized-ulp
smaller than cbrt(x). I have not found any cases where this is
essential, but the think-time error bound depends on it. The relative
smallness of the different-sized-ulp limited the bug. If there are
cases where this is essential, then the final error bound would be
5/6+epsilon instead of of 4/6+epsilon ulps (still < 1).
- in preparing for the third approximation, round more carefully (but
still sloppily to avoid branches) so that the claimed error bound of
0.667 ulps is satisfied in all cases tested for cbrt() and remains
satisfied in all cases for cbrtf(). There isn't enough spare precision
for very sloppy rounding to work:
- in cbrt(), even with the inadequate increment, the actual error was
0.6685 in some cases, and correcting the increment increased this
a little. The fix uses sloppy rounding to 25 bits instead of very
sloppy rounding to 21 bits, and starts using uint64_t instead of 2
words for bit manipulation so that rounding more bits is not much
costly.
- in cbrtf(), the 0.667 bound was already satisfied even with the
inadequate increment, but change the code to almost match cbrt()
anyway. There is not enough spare precision in the Newton
approximation to double the inadequate increment without exceeding
the 0.667 bound, and no spare precision to avoid this problem as
in cbrt(). The fix is to round using an increment of 2 smaller-ulps
before chopping so that an increment of 1 ulp is enough. In cbrt(),
we essentially do the same, but move the chop point so that the
increment of 1 is not needed.
Fixed comments to match code:
- in cbrt(), the second approximation is good to 25 bits, not quite 26 bits.
- in cbrt(), don't claim that the second approximation may be implemented
in single precision. Single precision cannot handle the full exponent
range without minor but pessimal changes to renormalize, and although
single precision is enough, 25 bit precision is now claimed and used.
Added comments about some of the magic for the error bound 4/6+epsilon.
I still don't understand why it is 4/6+ and not 6/6+ ulps.
Indent comments at the right of code more consistently.
we can cache its value in the softc. Eliminates one PCI register
write per call to bge_start().
A 1.8% speedup for UDP_RR test on my old box.
Obtained from: NetBSD(jonathan) via delphij
to be compatible with symbol versioning support as implemented by
GNU libc and documented by http://people.redhat.com/~drepper/symbol-versioning
and LSB 3.0.
Implement dlvsym() function to allow lookups for a specific version of
a given symbol.
case if memory allocation failed.
- Remove fourth argument from VLAN_INPUT_TAG(), that was used
incorrectly in almost all drivers. Indicate failure with
mbuf value of NULL.
In collaboration with: yongari, ru, sam
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.
MFC after: 2 weeks
o Remove the unused and non-standard SHT_NUM, PT_COUNT and DT_COUNT.
o Add the STV_DEFAULT, STV_INTERNAL, STV_HIDDEN and STV_PROTECTED
symbol visibility constants.
o Add the ELF32_ST_VISIBILITY and ELF64_ST_VISIBILITY macros to
get the symbol visibility from the st_other field.
o Add the ELFOSABI_AIX, ELFOSABI_OPENVMS and ELFOSABI_NSK constants.
o Add the ET_LOOS, ET_HIOS, ET_LOPROC and ET_HIPROC constants.
o Further flesh out the list of machine types. Note that EM_ALPHA
remains non-standard. The standard value for EM_ALPHA is given
by EM_ALPHA_STD (which is a non-standard name :-)
o Add the SHN_LOOS, SHN_HIOS and SHN_XINDEX constants.
o Add the SHT_INIT_ARRAY, SHT_FINI_ARRAY, SHT_PREINIT_ARRAY, SHT_GROUP
and SHT_SYMTAB_SHNDX constants.
o Add the SHF_MERGE, SHF_STRINGS, SHF_INFO_LINK, SHF_LINK_ORDER,
SHF_OS_NONCONFORMING, SHF_GROUP and SHF_MASKOS constants.
o Add the PF_MASKOS and PF_MASKPROC constants.
o Add the STB_LOOS andf STB_HIOS constants.
o Add the STT_COMMON, STT_LOOS and STT_HIOS constants.
MFC after: 1 week
32-bit entity. Also, don't cast the resulting symbol type value to
a datatype smaller than the st_info field type as a quick way to
mask off the upper bits as it may cause inconsistent behaviour when
the macro is used (without explicit casting) on varargs functions.
MFC after: 1 week