Allow __builtin_memset instead of bzero for small buffers of known size

In particular, this eliminates function calls and the related register
save/restore when only a few writes would suffice.
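
A minimal userland sketch of the same dispatch, assuming GCC or Clang
(statement expressions and __builtin_constant_p are compiler extensions).
The names small_zero, slow_zero, clear_small and clear_any are hypothetical
and exist only for illustration; they are not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical out-of-line fallback standing in for the real bzero(). */
static void
slow_zero(void *buf, size_t len)
{
	memset(buf, 0, len);
}

/*
 * Hypothetical macro mirroring the shape of the one in the diff.
 * When len is a compile-time constant no larger than 64, the compiler
 * sees a fixed-size __builtin_memset and is free to emit plain stores;
 * otherwise the out-of-line function is called.
 */
#define small_zero(buf, len) ({					\
	if (__builtin_constant_p(len) && (len) <= 64)		\
		__builtin_memset((buf), 0, (len));		\
	else							\
		slow_zero((buf), (len));			\
})

struct small {
	char bytes[32];
};

/* sizeof(*s) is a constant <= 64, so the builtin branch is selected. */
void
clear_small(struct small *s)
{
	small_zero(s, sizeof(*s));
}

/* len is a run-time value here, so the fallback may be called. */
void
clear_any(void *buf, size_t len)
{
	small_zero(buf, len);
}
```

Whether the builtin is actually lowered to a handful of stores depends on
the compiler, target and optimization level; the sketch only shows how the
constant-size case is separated from the general one.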

An example speedup can be seen in an fstat microbenchmark on AMD Ryzen CPUs,
where the throughput went up by ~4.5%.

Thanks to cem@ for benchmarking and reviewing the patch.

MFC after:	1 week
Mateusz Guzik 2017-09-08 20:09:14 +00:00
parent 3c700e2e4c
commit 2e2baf2ec3

@@ -258,6 +258,12 @@ void hexdump(const void *ptr, int length, const char *hdr, int flags);
#define ovbcopy(f, t, l) bcopy((f), (t), (l))
void bcopy(const void * _Nonnull from, void * _Nonnull to, size_t len);
void bzero(void * _Nonnull buf, size_t len);
#define bzero(buf, len) ({				\
	if (__builtin_constant_p(len) && (len) <= 64)	\
		__builtin_memset((buf), 0, (len));	\
	else						\
		bzero((buf), (len));			\
})
void explicit_bzero(void * _Nonnull, size_t);
void *memcpy(void * _Nonnull to, const void * _Nonnull from, size_t len);