blog posting [1].
- Use word-sized test for unaligned pointer before working
the hard way.
Memory page boundary is always integral multiple of a word
alignment boundary. Therefore, if we can access memory
referenced by pointer p, then (p & ~word mask) must be also
accessible.
- Better utilization of multi-issue processor's ability of
concurrency.
The previous implementation utilized a formular that must be
executed sequentially. However, the ~, & and - operations can
actually be caculated at the same time when the operand were
different and unrelated.
The original Hacker's Delight formular also offered consistent
performance regardless whether the input would contain
characters with their highest-bit set, as it catches real
nul characters only.
These two optimizations has shown further improvements over the
previous implementation on microbenchmarks on i386 and amd64 CPU
including Pentium 4, Core Duo 2 and i7.
[1] http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2010/03/08#strlen_1
MFC after: 1 month
wcscasecmp(), and wcsncasecmp().
- Make some previously non-standard extensions visible
if POSIX_VISIBLE >= 200809.
- Use restrict qualifiers in stpcpy().
- Declare off_t and size_t in stdio.h.
- Bump __FreeBSD_version in case the new symbols (particularly
getline()) cause issues with ports.
Reviewed by: standards@
reducing branches and doing word-sized operation.
The idea is taken from J.T. Conklin's x86_64 optimized version of strlen(3)
for NetBSD, and reimplemented in C by me.
Discussed on: -arch@
Copyright attribution is kept the same as in original NetBSD source.
Submitted by: Florian Smeets <flo kasimir com>
Obtained from: NetBSD
MFC after: 2 weeks
on long long arguments.
Reviewed by: bde (previous version, that included asm implementation
for all ffs and fls functions on i386 and amd64)
MFC after: 2 weeks
It is the binary equivalent to strstr(3).
void *memmem(const void *big, size_t big_len,
const void *little, size_t little_len);
Submitted by: Pascal Gloor <pascal.gloor at spale.com>
MFC after: 3 days
implementations inspired by the ones in DragonFly. Unlike the
DragonFly versions, these have a small data cache footprint, and my
tests show that they're never slower than the old code except when the
charset or the span is 0 or 1 characters. This implementation is
generally faster than DragonFly until either the charset or the span
gets in the ballpark of 32 to 64 characters.