- Operate on uint64_t types when doing XORing, etc. instead of uint8_t.
- Don't bzero() temporary block for every AES block. Do it once for entire
data block.
- AES-NI is available only on little endian architectures. Simplify code
that takes block number from IV.
Benchmarks:
Memory-backed md(4) device, software AES-XTS, 4kB sector:
# dd if=/dev/md0.eli bs=1m
59.61MB/s
Memory-backed md(4) device, old AES-NI AES-XTS, 4kB sector:
# dd if=/dev/md0.eli bs=1m
97.29MB/s
Memory-backed md(4) device, new AES-NI AES-XTS, 4kB sector:
# dd if=/dev/md0.eli bs=1m
221.26MB/s
127% performance improvement between old and new code.
Harddisk, raw speed:
# dd if=/dev/ada0 bs=1m
137.63MB/s
Harddisk, software AES-XTS, 4kB sector:
# dd if=/dev/ada0.eli bs=1m
47.83MB/s (34% of raw disk speed)
Harddisk, old AES-NI AES-XTS, 4kB sector:
# dd if=/dev/ada0.eli bs=1m
68.33MB/s (49% of raw disk speed)
Harddisk, new AES-NI AES-XTS, 4kB sector:
# dd if=/dev/ada0.eli bs=1m
108.35MB/s (78% of raw disk speed)
58% performance improvement between old and new code.
As a side-note, GELI with AES-NI using AES-CBC can achive native disk speed.
MFC after: 3 days