freebsd-skq/sys
bde 5a798b035b Use inline asm instead of unportable intrinsics for the SSE4 crc32
optimization.

This fixes building with gcc-4.2.1 (it doesn't support SSE4).
gas-2.17.50 [FreeBSD] supports SSE4 instructions, so this doesn't
need to use .byte directives.

This fixes depending on host user headers in the kernel.

Fix user includes (don't depend on namespace pollution in <nmmintrin.h>
that is not included now).

The intrinsics had no advantages except to sometimes avoid compiler
pessimizations.  clang understands them a bit better than inline asm,
and generates better looking code which also runs better for cem, but
for me it just runs at the same speed or slower by doing excessive
unrolling in all the wrong places.  gcc-4.2.1 also doesn't understand
what it is doing with unrolling, but with -O3 somehow it does more
unrolling that helps.

Reduce 1 of the compiler pessimizations (copying a variable which
already satisfies an "rm" constraint in a good way by being in memory
and not used again, to different memory and accessing it there.  Force
copying it to a register instead).
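
For illustration only, here is a minimal sketch of what replacing the
SSE4 intrinsics with inline asm can look like, with the source operand
forced into a register ("r" constraint) as described above.  The names
crc32c_example(), crc32c_hw_u64(), crc32c_hw_u8() and crc32c_sw_u8() are
hypothetical and are not taken from the commit; the runtime check with
__builtin_cpu_supports() and the bitwise software fallback are
assumptions added so that the sketch also runs on CPUs without SSE4.2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software fallback: one byte of CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_sw_u8(uint32_t crc, uint8_t v)
{
	crc ^= v;
	for (int i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return (crc);
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define	EXAMPLE_HAVE_HW_CRC32	1

/* crc32q via inline asm; "r" forces the source operand into a register. */
static inline uint64_t
crc32c_hw_u64(uint64_t crc, uint64_t v)
{
	__asm("crc32q %1, %0" : "+r" (crc) : "r" (v));
	return (crc);
}

/* crc32b via inline asm, used for the tail bytes. */
static inline uint32_t
crc32c_hw_u8(uint32_t crc, uint8_t v)
{
	__asm("crc32b %1, %0" : "+r" (crc) : "r" (v));
	return (crc);
}
#endif

/* Hypothetical wrapper: CRC-32C of a buffer with the usual inversions. */
uint32_t
crc32c_example(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

#ifdef EXAMPLE_HAVE_HW_CRC32
	if (__builtin_cpu_supports("sse4.2")) {
		uint64_t c = crc, v;

		for (; len >= 8; p += 8, len -= 8) {
			memcpy(&v, p, 8);	/* unaligned-safe 8-byte load */
			c = crc32c_hw_u64(c, v);
		}
		crc = (uint32_t)c;
		while (len-- > 0)
			crc = crc32c_hw_u8(crc, *p++);
		return (~crc);
	}
#endif
	while (len-- > 0)
		crc = crc32c_sw_u8(crc, *p++);
	return (~crc);
}

/* Self-check against the standard CRC-32C check value for "123456789". */
void
crc32c_example_selfcheck(void)
{
	assert(crc32c_example("123456789", 9) == 0xE3069283u);
}
```

Because the asm statements name only the instruction and its operands,
they need an assembler that knows crc32 but no compiler support for the
intrinsics and no <nmmintrin.h>.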

Try to optimize the inner loops significantly, so as to run at full
speed on smaller inputs.  The algorithm is already very MD, and was
tuned for the throughput of 3 crc32 instructions per cycle found on
at least Sandybridge through Haswell.  Now it is even more tuned for
this, so depends more on the compiler not rearranging or unrolling
things too much.  The main inner loop should have no difficulty
running at full speed on these CPUs unless the compiler unrolls it too
much.  However, the main inner loop wasn't even used for buffers smaller
than 24K.  Now it is used for buffers larger than 384 bytes.  Now it
is not so long, and the main outer loop is used more.  The new
optimization is to try to arrange that the outer loop runs in parallel
with the next inner loop except for the final iteration; then reduce
the loop sizes significantly to take advantage of this.
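
As a rough, portable sketch of the idea behind an inner loop that keeps
3 independent crc32 dependency chains in flight, the following splits a
buffer into 3 equal streams, advances them in one loop, and then
combines the partial results.  It is not the committed code: the names
crc32c_3way(), crc32c_ref(), step() and shift() are hypothetical, a
bitwise software step stands in for the crc32 instruction, and the
combine step feeds zero bytes one at a time, which is an assumption
chosen for clarity and is far too slow for real use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte of raw CRC-32C state update (reflected polynomial 0x82F63B78). */
static uint32_t
step(uint32_t crc, uint8_t v)
{
	crc ^= v;
	for (int i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return (crc);
}

/* Advance a raw state across n zero bytes (slow but simple combine helper). */
static uint32_t
shift(uint32_t crc, size_t n)
{
	while (n-- > 0)
		crc = step(crc, 0);
	return (crc);
}

/* Single-stream reference used to check the 3-stream version. */
uint32_t
crc32c_ref(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len-- > 0)
		crc = step(crc, *p++);
	return (~crc);
}

/* Three interleaved streams, combined afterwards. */
uint32_t
crc32c_3way(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i, n = len / 3;
	uint32_t c0 = 0xFFFFFFFFu, c1 = 0, c2 = 0;

	/* Inner loop: three chains with no data dependency on each other. */
	for (i = 0; i < n; i++) {
		c0 = step(c0, p[i]);
		c1 = step(c1, p[n + i]);
		c2 = step(c2, p[2 * n + i]);
	}

	/*
	 * The update is linear over GF(2), so the state after A||B equals
	 * the state after A shifted over |B| zero bytes, xor the state of
	 * B computed from an initial state of 0.
	 */
	c0 = shift(c0, n) ^ c1;
	c0 = shift(c0, n) ^ c2;

	/* Up to 2 leftover bytes are handled serially. */
	for (i = 3 * n; i < len; i++)
		c0 = step(c0, p[i]);
	return (~c0);
}

/* Self-check: standard check value, plus a length not divisible by 3. */
void
crc32c_3way_selfcheck(void)
{
	assert(crc32c_3way("123456789", 9) == 0xE3069283u);
	assert(crc32c_3way("hello, world", 11) == crc32c_ref("hello, world", 11));
}
```

In a tuned implementation the combine would use precomputed tables or
similar rather than zero feeding, and the stream length would be chosen
so that the combine for one block can overlap with the inner loop for
the next.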

Approved by:	cem
Not tested in production by:	bde
2017-03-26 10:31:48 +00:00
..
amd64 specific end of interrupt implementation for AMD Local APIC 2017-03-25 18:45:09 +00:00
arm Preserve VFP state across signal delivery. 2017-03-26 08:36:56 +00:00
arm64 Add 'device iic' to bring in userland I2C driver. 2017-03-24 22:33:03 +00:00
boot The original author abused Nd (one-line description, used by makewhatis) 2017-03-23 08:34:30 +00:00
bsm
cam Remove "UNMAPPED" messages printed on da periph attach. 2017-03-23 10:50:45 +00:00
cddl MFV r315290, r315291: 7303 dynamic metaslab selection 2017-03-24 09:37:00 +00:00
compat Implement Linux mincore() system call. 2017-03-25 15:47:29 +00:00
conf Use inline asm instead of unportable intrinsics for the SSE4 crc32 2017-03-26 10:31:48 +00:00
contrib Copy needed include files from EDK2. This is a minimal set gleaned 2017-03-08 02:47:59 +00:00
crypto Remove pc98 support completely. 2017-01-28 02:22:15 +00:00
ddb Fix right shifts on arches with db_expr_t larger than u_int (LP64 arches 2017-03-18 07:01:18 +00:00
dev iwn: deduplicate code in iwn_tx_data() and iwn_tx_data_raw(). 2017-03-26 09:41:08 +00:00
fs remove procfs ctl interface 2017-03-05 03:05:24 +00:00
gdb
geom After r315112 I broke the tests with eli, instead to pass 0, I should pass 2017-03-13 13:56:01 +00:00
gnu Update our device tree files to a Linux 4.10 2017-03-07 13:56:49 +00:00
i386 specific end of interrupt implementation for AMD Local APIC 2017-03-25 18:45:09 +00:00
isa Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
kern dtrace sched:::preempt should fire only when there is preemption 2017-03-25 19:08:51 +00:00
kgssapi
libkern Use inline asm instead of unportable intrinsics for the SSE4 crc32 2017-03-26 10:31:48 +00:00
mips [mips/broadcom]: Early boot NVRAM support 2017-03-23 19:29:12 +00:00
modules Add a module to build imx5 dtb files. 2017-03-19 19:10:23 +00:00
net Correct handling of ALTQ with epair(4) interfaces but presenting that ALTQ(9) is supported. 2017-03-24 00:55:16 +00:00
net80211 net80211: fix possible panic when wlan(4) interface is destroyed. 2017-03-24 22:29:51 +00:00
netgraph mppc - Finish plugging NETGRAPH_MPPC_COMPRESSION. 2017-01-20 00:02:11 +00:00
netinet Fix reference count leak with L2 caching. 2017-03-25 15:06:28 +00:00
netinet6 Fix reference count leak with L2 caching. 2017-03-25 15:06:28 +00:00
netipsec Introduce the concept of IPsec security policies scope. 2017-03-07 00:13:53 +00:00
netnatm
netpfil pf: Fix possible shutdown race 2017-03-22 21:18:18 +00:00
netsmb
nfs Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
nfsclient Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
nfsserver Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
nlm
ofed Add full VNET support to the inet_get_local_port_range() function in 2017-03-22 15:46:31 +00:00
opencrypto
powerpc Don't bother checking core version 2017-03-24 01:52:10 +00:00
riscv Implement atomic_fcmpset_*() for RISC-V. 2017-02-05 00:32:12 +00:00
rpc Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
security Audit 'fd' and 'cmd' arguments to fcntl(2), and when generating BSM, 2016-11-22 00:41:24 +00:00
sparc64 Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
sys move thread switch tracing from mi_switch to sched_switch 2017-03-23 08:57:04 +00:00
teken Fix bright colors for syscons, and make them work for the first time 2017-03-18 11:13:54 +00:00
tests
tools [fdt] Make DTBs generated by make_dtb.sh overlay-ready 2017-03-10 22:45:07 +00:00
ufs Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
vm Two changes to vm_fault_populate(): 2017-03-19 19:52:47 +00:00
x86 Provide less laborious way to enable busdma DMAR to only short list of devices. 2017-03-26 00:40:35 +00:00
xdr
xen xenstore: fix suspension when using the xenstore device 2017-03-07 09:17:48 +00:00
Makefile Remove pc98 support completely. 2017-01-28 02:22:15 +00:00