Some x86 class CPUs have accelerated intrinsics for SHA1 and SHA256.
Provide this functionality on CPUs that support it.
This implements CRYPTO_SHA1, CRYPTO_SHA1_HMAC, and CRYPTO_SHA2_256_HMAC.
Correctness: The cryptotest.py suite in tests/sys/opencrypto has been
enhanced to verify SHA1 and SHA256 HMAC using standard NIST test vectors.
The test passes on this driver. Additionally, jhb's cryptocheck tool has
been used to compare various random inputs against OpenSSL. This test also
passes.
Rough performance averages on AMD Ryzen 1950X (4kB buffer):
aesni: SHA1: ~8300 Mb/s SHA256: ~8000 Mb/s
cryptosoft: ~1800 Mb/s SHA256: ~1800 Mb/s
So ~4.4-4.6x speedup depending on algorithm choice. This is consistent with
the results the Linux folks saw for 4kB buffers.
The driver borrows SHA update code from sys/crypto sha1 and sha256. The
intrinsic step function comes from Intel under a 3-clause BSDL.[0] The
intel_sha_extensions_sha<foo>_intrinsic.c files were renamed and lightly
modified (added const, resolved a warning or two; included the sha_sse
header to declare the functions).
[0]: https://software.intel.com/en-us/articles/intel-sha-extensions-implementations
Reviewed by: jhb
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12452