freebsd-skq

d/freebsd-skq

Commit Graph

Author

SHA1

Message

Date

John-Mark Gurney

038ffd3e43

Make it so that from/to can be misaligned, as it can happen (the geli
regression manages to do it)...  We use a packed struct to coerce
gcc/clang into producing unaligned loads (there is no packed pointer
attribute, otherwise this would be easier)...
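
The packed-struct trick described above can be sketched roughly as follows. This is a minimal illustration, not the committed code: the names `unaligned_m128i`, `load_unaligned`, and `store_unaligned` are hypothetical, and the sketch assumes gcc or clang on x86 with SSE2 available.

```c
#include <emmintrin.h>	/* SSE2: __m128i */
#include <stdint.h>

/*
 * Hypothetical wrapper type.  Packing the struct drops its alignment
 * requirement to 1, so the compiler can no longer assume that the
 * __m128i member sits on a 16-byte boundary and has to emit an
 * unaligned load/store (movdqu) instead of an aligned one (movdqa).
 */
struct unaligned_m128i {
	__m128i v;
} __attribute__((__packed__));

/* Read 16 bytes from an arbitrarily aligned address. */
static __m128i
load_unaligned(const uint8_t *p)
{
	const struct unaligned_m128i *u = (const struct unaligned_m128i *)p;

	return (u->v);
}

/* Write 16 bytes to an arbitrarily aligned address. */
static void
store_unaligned(uint8_t *p, __m128i v)
{
	struct unaligned_m128i *u = (struct unaligned_m128i *)p;

	u->v = v;
}
```

A struct holding several `__m128i` members could be wrapped the same way, which is where this approach pays off over per-block calls.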

Use _storeu_ and _loadu_ where using the structure is overkill...
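
For a single block, the unaligned load/store intrinsics do the same job without a wrapper type. A small sketch, assuming SSE2; the function `xor_block` is a hypothetical example and does not come from the commit:

```c
#include <emmintrin.h>	/* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
#include <stdint.h>

/*
 * XOR one 16-byte block from src with mask and write it to dst.
 * None of the three pointers needs to be 16-byte aligned, because
 * the *u* intrinsics compile to unaligned moves (movdqu).
 */
static void
xor_block(uint8_t *dst, const uint8_t *src, const uint8_t *mask)
{
	__m128i a = _mm_loadu_si128((const __m128i *)src);
	__m128i m = _mm_loadu_si128((const __m128i *)mask);

	_mm_storeu_si128((__m128i *)dst, _mm_xor_si128(a, m));
}
```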

Be better at using types properly...  Since we allocate our own key
schedule and make sure it's aligned, use the __m128i type in various
arguments to functions...

clang ignores __aligned on prototypes and gcc errors on them, so leave
them in comments to document that these function arguments are required
to be aligned...

From reading the diff of the disassembly output, about all that changes
is movdqa -> movdqu...

Noticed by:	symbolics at gmx.com
MFC after:	3 days

2013-11-06 19:14:49 +00:00

John-Mark Gurney

ff6c7bf5ca

Use the fact that the AES-NI instructions can be pipelined to improve
performance... Use SSE2 instructions for calculating the XTS tweak
factor...  Let the compiler do more work and handle register allocation
by using intrinsics, now only the key schedule is in assembly...
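
As an illustration of the tweak calculation, the step from one XTS tweak to the next is a multiplication by x in GF(2^128): a 128-bit left shift by one, with 0x87 XORed into the low byte when a bit is shifted out. The sketch below shows a byte-wise reference next to one way to express the same step with SSE2 intrinsics. The function names are hypothetical, and the SSE2 version is an assumption about how such a routine can be written, not a copy of the committed code.

```c
#include <emmintrin.h>	/* SSE2 */
#include <stdint.h>

/* Low byte of the reduction polynomial x^128 + x^7 + x^2 + x + 1. */
#define	XTS_ALPHA	0x87

/* Reference version: shift the little-endian 128-bit tweak left by 1. */
static void
xts_next_tweak_scalar(uint8_t t[16])
{
	unsigned carry = 0, next;
	int i;

	for (i = 0; i < 16; i++) {
		next = t[i] >> 7;
		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = next;
	}
	if (carry)
		t[0] ^= XTS_ALPHA;
}

/* SSE2 version: no branches, no trips through general registers. */
static __m128i
xts_next_tweak_sse2(__m128i t)
{
	/* Word 0 receives the reduction constant, words 1..3 a carry bit. */
	const __m128i mask = _mm_set_epi32(1, 1, 1, XTS_ALPHA);
	__m128i carry;

	/* Rotate the words up by one so word i sees word i-1 (mod 4). */
	carry = _mm_shuffle_epi32(t, 0x93);
	/* Smear each word's top bit across the whole word. */
	carry = _mm_srai_epi32(carry, 31);
	carry = _mm_and_si128(carry, mask);

	/* Shift every 32-bit word left by 1 and fold the carries back in. */
	return (_mm_xor_si128(_mm_slli_epi32(t, 1), carry));
}
```

Keeping the tweak in an SSE register means it can be XORed straight into the data blocks before and after the AES rounds.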

Replace .byte hard-coded instructions with the proper instructions now
that both clang and gcc support them...

On my machine, with the code pulled into userland, I saw performance go
from ~150MB/sec to 2GB/sec in XTS mode.  GELI on GNOP saw a more modest
increase of about 3x due to other system overhead (geom and
opencrypto)...
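
One of the changes behind these numbers is pipelining. The idea can be sketched as follows: a single block forms a chain of dependent rounds, while several independent blocks advanced one round at a time give the CPU work to overlap. This is a rough illustration only. The names `aes_encrypt_one` and `aes_encrypt_four` are hypothetical, the width of four blocks is an arbitrary choice for the example, and the key schedule is assumed to be already expanded into an aligned `__m128i` array (`rounds` is 10 for AES-128).

```c
#include <emmintrin.h>	/* SSE2: __m128i, _mm_xor_si128 */
#include <wmmintrin.h>	/* AES-NI: _mm_aesenc_si128 and friends */

/* Let gcc/clang use AES-NI in these functions without a global -maes. */
#define	AES_TARGET	__attribute__((target("aes")))

/* One block: every aesenc has to wait for the previous one. */
AES_TARGET static __m128i
aes_encrypt_one(const __m128i *ks, int rounds, __m128i b)
{
	int i;

	b = _mm_xor_si128(b, ks[0]);
	for (i = 1; i < rounds; i++)
		b = _mm_aesenc_si128(b, ks[i]);
	return (_mm_aesenclast_si128(b, ks[rounds]));
}

/*
 * Four blocks: the four aesenc instructions issued per round do not
 * depend on each other, so they can be in flight at the same time.
 */
AES_TARGET static void
aes_encrypt_four(const __m128i *ks, int rounds, __m128i b[4])
{
	__m128i b0, b1, b2, b3;
	int i;

	b0 = _mm_xor_si128(b[0], ks[0]);
	b1 = _mm_xor_si128(b[1], ks[0]);
	b2 = _mm_xor_si128(b[2], ks[0]);
	b3 = _mm_xor_si128(b[3], ks[0]);
	for (i = 1; i < rounds; i++) {
		b0 = _mm_aesenc_si128(b0, ks[i]);
		b1 = _mm_aesenc_si128(b1, ks[i]);
		b2 = _mm_aesenc_si128(b2, ks[i]);
		b3 = _mm_aesenc_si128(b3, ks[i]);
	}
	b[0] = _mm_aesenclast_si128(b0, ks[rounds]);
	b[1] = _mm_aesenclast_si128(b1, ks[rounds]);
	b[2] = _mm_aesenclast_si128(b2, ks[rounds]);
	b[3] = _mm_aesenclast_si128(b3, ks[rounds]);
}
```

Both functions produce the same ciphertext for the same inputs; only the instruction scheduling differs. This works for modes such as XTS where blocks can be processed independently of each other.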

These changes allow almost full disk I/O rate with geli...

Reviewed by:	-current, -security
Thanks to:	Mike Hamburg for the XTS tweak algorithm

2013-09-03 18:31:23 +00:00

2 Commits