with macro based around memcmp(). The latter is expected to be some
8 times faster on a modern 64-bit architectures.
In practice, throughput of doing conv=sparse from /dev/zero to /dev/null
went up some 5-fold here from 1.9GB/sec to 9.7GB/sec with this change
(bs=128k).
MFC after: 2 weeks