freebsd-dev/sys/amd64
Mateusz Guzik 97bb9a0818 amd64: make memset less slow with mov
rep stos has a high startup time even on modern microarchitectures like
Skylake. Intel optimization manuals discuss how for small sizes it is
beneficial to go for streaming stores. Since those cannot be used
without extra penalty in the kernel, I investigated the performance
impact of just regular movs.

The patch below implements a very simple scheme: a 32-byte loop followed
by filling in the remainder of at most 31 bytes. It has a 256-byte
cutoff past which it falls back to rep stos. It provides a significant
win over the current primitive on several machines I tested (both Intel
and AMD). A 64-byte loop did not provide any benefit, even for sizes
that are multiples of 64.
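
For illustration only, the scheme described above can be sketched in
portable C. This is not the committed code, which is amd64 assembly. The
name sketch_memset, the SKETCH_REP_STOS_CUTOFF macro, the choice to treat
the cutoff as "strictly larger than 256", the 16/8/4/2/1 split of the
tail, and the use of libc memset as a stand-in for the rep stos path are
all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; the commit message only says "256 breaking point". */
#define SKETCH_REP_STOS_CUTOFF 256

/*
 * Userland sketch of the described scheme. Fixed-size memcpy() calls
 * model the unaligned 8/4/2-byte mov stores of the assembly version.
 */
static void *
sketch_memset(void *dst, int c, size_t len)
{
	unsigned char *p = dst;
	/* Replicate the fill byte into all eight bytes of a 64-bit word. */
	uint64_t pattern = 0x0101010101010101ULL * (unsigned char)c;

	assert(dst != NULL || len == 0);

	/* Large sizes: stand-in for the rep stos fallback (assumption). */
	if (len > SKETCH_REP_STOS_CUTOFF)
		return (memset(dst, c, len));

	/* Main loop: 32 bytes per iteration as four 8-byte stores. */
	while (len >= 32) {
		memcpy(p, &pattern, 8);
		memcpy(p + 8, &pattern, 8);
		memcpy(p + 16, &pattern, 8);
		memcpy(p + 24, &pattern, 8);
		p += 32;
		len -= 32;
	}

	/* Remainder of at most 31 bytes, peeled off by powers of two. */
	if (len & 16) {
		memcpy(p, &pattern, 8);
		memcpy(p + 8, &pattern, 8);
		p += 16;
	}
	if (len & 8) {
		memcpy(p, &pattern, 8);
		p += 8;
	}
	if (len & 4) {
		memcpy(p, &pattern, 4);
		p += 4;
	}
	if (len & 2) {
		memcpy(p, &pattern, 2);
		p += 2;
	}
	if (len & 1)
		*p = (unsigned char)c;

	return (dst);
}
```

A call such as sketch_memset(buf, 0, 100) would take three trips through
the 32-byte loop and then one 4-byte store for the remainder. Whether a
compiler turns this into the same instruction sequence as the assembly
version is not guaranteed, so the sketch says nothing about performance.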

See the review for benchmark data.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17398
2018-10-05 19:25:09 +00:00
acpica Rename assym.s to assym.inc 2018-03-20 17:58:51 +00:00
amd64 amd64: make memset less slow with mov 2018-10-05 19:25:09 +00:00
cloudabi32 Use TO_PTR() to convert integers to pointers. 2017-11-26 14:45:56 +00:00
cloudabi64 Use TO_PTR() to convert integers to pointers. 2017-11-26 14:45:56 +00:00
conf amd64: enable options NUMA in GENERIC and MINIMAL 2018-09-11 23:54:31 +00:00
ia32 Use SMAP on amd64. 2018-07-29 20:47:00 +00:00
include Handle a guest executing a vm instruction by trapping and raising an 2018-09-27 11:16:19 +00:00
linux Futex support functions in linux.ko and linux32.ko on amd64 should be 2018-08-07 18:29:10 +00:00
linux32 Futex support functions in linux.ko and linux32.ko on amd64 should be 2018-08-07 18:29:10 +00:00
pci sys/amd64: further adoption of SPDX licensing ID tags. 2017-11-27 15:03:07 +00:00
sgx Rename assym.s to assym.inc 2018-03-20 17:58:51 +00:00
vmm Handle a guest executing a vm instruction by trapping and raising an 2018-09-27 11:16:19 +00:00
Makefile Bring the tags and links entries for amd64 up to date. 2015-10-27 22:59:24 +00:00