Commit Graph

1 Commits

Author SHA1 Message Date
Conrad Meyer
6be2ff7d3e calculate_crc32c: Add SSE4.2 implementation on x86
Derived from an implementation by Mark Adler.

The fast loop performs three simultaneous CRCs over subsets of the data
before composing them.  This takes advantage of certain properties of
the CRC32 implementation in Intel hardware.  (The CRC instruction takes 1
cycle but has 2-3 cycles of latency.)

The CRC32 instruction does not manipulate FPU state.

i386 does not have the crc32q instruction, so avoid it there.  Otherwise
the implementation is identical to amd64.

Add basic userland tests to verify correctness on a variety of inputs.

PR:		216467
Reported by:	Ben RUBSON <ben.rubson at gmail.com>
Reviewed by:	kib@, markj@ (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D9342
2017-01-31 03:26:32 +00:00