freebsd-dev/sys/libkern
Bruce Evans 864c28cf81 Use inline asm instead of unportable intrinsics for the SSE4 crc32
optimization.

This fixes building with gcc-4.2.1 (it doesn't support SSE4).
gas-2.17.50 [FreeBSD] supports SSE4 instructions, so this doesn't
need to use .byte directives.
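
As a rough illustration of the approach described above (not the code from
this commit), the sketch below wraps the SSE4.2 crc32b instruction in GNU
inline asm and checks it against a portable bitwise CRC32C step. The names
crc32c_sw_u8, crc32c_hw_u8, crc32c_buf and HAVE_CRC32_ASM are hypothetical,
and it assumes an amd64 target built with gcc or clang; on anything else it
falls back to the software path.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC32C step (reflected Castagnoli polynomial). */
static uint32_t
crc32c_sw_u8(uint32_t crc, uint8_t b)
{
	int i;

	crc ^= b;
	for (i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return (crc);
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define	HAVE_CRC32_ASM	1
/*
 * Same step using the SSE4.2 crc32b instruction (AT&T operand order).
 * "+r" keeps the running CRC in a register; "rm" lets the compiler
 * pick a register or memory operand for the input byte.
 */
static inline uint32_t
crc32c_hw_u8(uint32_t crc, uint8_t b)
{
	__asm__("crc32b %1, %0" : "+r" (crc) : "rm" (b));
	return (crc);
}
#endif

/* Hypothetical driver: CRC32C of a buffer with the usual pre/post inversion. */
static uint32_t
crc32c_buf(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;

#ifdef HAVE_CRC32_ASM
	/* Only use the instruction if the running CPU reports SSE4.2. */
	if (__builtin_cpu_supports("sse4.2")) {
		for (i = 0; i < len; i++)
			crc = crc32c_hw_u8(crc, p[i]);
		return (crc ^ 0xFFFFFFFFu);
	}
#endif
	for (i = 0; i < len; i++)
		crc = crc32c_sw_u8(crc, p[i]);
	return (crc ^ 0xFFFFFFFFu);
}
```

Because the asm statement names the instruction directly, nothing from
<nmmintrin.h> is needed and the compiler does not have to be told to enable
SSE4.2 code generation; only the assembler has to know the mnemonic.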

This also fixes the kernel build depending on host user headers.

Fix user includes (don't depend on namespace pollution in <nmmintrin.h>,
which is no longer included).

The intrinsics had no advantages except to sometimes avoid compiler
pessimizations.  clang understands them a bit better than inline asm,
and generates better looking code which also runs better for cem, but
for me it just runs at the same speed or slower by doing excessive
unrolling in all the wrong places.  gcc-4.2.1 also doesn't understand
what it is doing with unrolling, but with -O3 somehow it does more
unrolling that helps.

Reduce 1 of the compiler pessimizations: copying a variable, which
already satisfies an "rm" constraint in a good way by being in memory
and not used again, to different memory and accessing it there.  Force
copying it to a register instead.

Try to optimize the inner loops significantly, so as to run at full
speed on smaller inputs.  The algorithm is already very MD
(machine-dependent), and was tuned for the throughput of 3 crc32
instructions per cycle found on at least Sandy Bridge through Haswell.
Now it is even more tuned for this, so it depends more on the compiler
not rearranging or unrolling things too much.  The main inner loop
should have no difficulty running at full speed on these CPUs unless
the compiler unrolls it too much.  However, the main inner loop wasn't
even used for buffers smaller than 24K.  Now it is used for buffers
larger than 384 bytes.  Now it is not so long, and the main outer loop
is used more.  The new optimization is to try to arrange that the outer
loop runs in parallel with the next inner loop except for the final
iteration; then reduce the loop sizes significantly to take advantage
of this.
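
The shape of such an inner loop can be sketched as follows. This is a
simplified illustration under stated assumptions, not the loop from this
commit: it advances three independent CRC states over three adjacent
equal-sized blocks so that no update waits on the result of another lane.
It uses a portable bitwise step where the real code would issue crc32
instructions, and it omits the outer-loop step that combines the three
lane results into one CRC. The names crc32c_step and crc32c_3way are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC32C step; stands in for one crc32 instruction. */
static uint32_t
crc32c_step(uint32_t crc, uint8_t b)
{
	int i;

	crc ^= b;
	for (i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return (crc);
}

/*
 * Advance three independent CRC states over three adjacent blocks of
 * blk bytes each.  The three updates in the loop body have no data
 * dependency on each other, so a CPU can overlap them.
 */
static void
crc32c_3way(uint32_t crc[3], const uint8_t *buf, size_t blk)
{
	uint32_t c0 = crc[0], c1 = crc[1], c2 = crc[2];
	size_t i;

	for (i = 0; i < blk; i++) {
		c0 = crc32c_step(c0, buf[i]);
		c1 = crc32c_step(c1, buf[blk + i]);
		c2 = crc32c_step(c2, buf[2 * blk + i]);
	}
	crc[0] = c0;
	crc[1] = c1;
	crc[2] = c2;
}
```

Keeping the lane states in three local variables, rather than updating
the array in place, is meant to make it easy for the compiler to hold them
in registers for the whole loop.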

Approved by:	cem
Not tested in production by:	bde
2017-03-26 10:31:48 +00:00
arm
x86 Use inline asm instead of unportable intrinsics for the SSE4 crc32 2017-03-26 10:31:48 +00:00
arc4random.c Discard first 3072 bytes of RC4 keystream, this is a bandaid 2017-03-14 06:00:44 +00:00
ashldi3.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
ashrdi3.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
asprintf.c
bcd.c Use time_t for intermediate values to avoid overflow in clock_ts_to_ct 2017-01-24 18:05:29 +00:00
bcmp.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
bsearch.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
cmpdi2.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
crc32.c calculate_crc32c: Add SSE4.2 implementation on x86 2017-01-31 03:26:32 +00:00
divdi3.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
explicit_bzero.c
ffs.c
ffsl.c
ffsll.c
fls.c
flsl.c
flsll.c
fnmatch.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
iconv_converter_if.m
iconv_ucs.c libkern: Remove obsolete 'register' keyword 2017-01-12 17:02:29 +00:00
iconv_xlat16.c sys: Replace zero with NULL for pointers. 2017-02-22 02:35:59 +00:00
iconv_xlat.c
iconv.c
inet_aton.c
inet_ntoa.c Remove inet_ntoa() from the kernel 2017-02-16 20:50:01 +00:00
inet_ntop.c
inet_pton.c
jenkins_hash.c
lshrdi3.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
mcount.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
memcchr.c
memchr.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
memcmp.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
memmem.c libkern: Remove obsolete 'register' keyword 2017-01-12 17:02:29 +00:00
memmove.c
memset.c
moddi3.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
murmur3_32.c
qdivrem.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
qsort_r.c
qsort.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
quad.h Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
random.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
scanc.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strcasecmp.c
strcat.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strchr.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strcmp.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strcpy.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strcspn.c
strdup.c
strlcat.c
strlcpy.c
strlen.c
strncat.c
strncmp.c
strncpy.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strndup.c
strnlen.c
strrchr.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strsep.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strspn.c
strstr.c strstr.c was inadvertently blasted with a copy of isa_nmi.c. Revert 2017-03-01 02:07:51 +00:00
strtol.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strtoq.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strtoul.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strtouq.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
strvalid.c
timingsafe_bcmp.c
ucmpdi2.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
udivdi3.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
umoddi3.c Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
zlib.c