ac1eb54956
Instead of finding the exact size to fit in we can just shift the target by -8 + tail. Doing a blind write to a previously rep stosq'ed area comes with a penalty so do it conditionally. Sample win on EPYC when zeroing a 257 sized buffer (tail = 1) aligned to 16 bytes: before: 44782846 ops/s after: 46118614 ops/s Idea stolen from NetBSD. Sponsored by: The FreeBSD Foundation