amd64: fix up memset added in r333324

The trick of expanding the passed pattern to a full word by multiplication
was missing. As a side effect, non-zero patterns would be laid down
incorrectly.

This stems from the use of rep stosq, which stores a full 8-byte word per
iteration, while the passed argument is a single byte.

I initially repurposed memcpy into memset without taking this into account.
All non-bzero testing was performed with a variant utilizing ERMS, i.e.
using only stosb, which happens not to run into the problem at all. So, my
bad twice.

Thanks to Oliver Pinter for noting the problem and providing a testcase.
Mateusz Guzik 2018-05-07 20:54:42 +00:00
parent cfb13e0a97
commit bed34b0b04

@@ -239,7 +239,8 @@ ENTRY(memset)
 	PUSH_FRAME_POINTER
 	movq %rdi,%r9
 	movq %rdx,%rcx
-	movq %rsi,%rax
+	movabs $0x0101010101010101,%rax
+	imulq %rsi,%rax
 	shrq $3,%rcx
 	rep
 	stosq