freebsd-skq/sys/i386
bde eb85aca715 On amd64, declare sse2_pagezero() and start using it again, but only
for zeroing pages in idle, where nontemporal writes are clearly best.
This is almost a no-op since zeroing in idle does nothing good
and is off by default.  Fix the END() statement forgotten in the previous
commit.

Align the loop in sse2_pagezero().  Since it writes to main memory,
the loop doesn't have to be very carefully written to keep up.
Unrolling it was considered useless or harmful and was not done on
i386, but that was too careless.

Timing for i386: the loop was not unrolled at all, and moved only 4
bytes/iteration.  So on a 2GHz CPU, it needed to run at 2 cycles/
iteration to keep up with a memory speed of just 4GB/sec.  But when
it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/
iteration so it gave a maximum speed of 2.67GB/sec and couldn't even
keep up with PC3200 memory.  Fix the alignment so that it keeps up with
4GB/sec memory, and unroll once to get nearer to 8GB/sec.  Further
unrolling might be useless or harmful since it would prevent the loop
fitting in 16 bytes.  My test system with an old CPU and old DDR1 only
needed 5+ GB/sec.  My test system with a new CPU and DDR3 doesn't need
any changes to keep up with ~16GB/sec.
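
The i386 figures above follow from simple arithmetic: bandwidth is clock rate times bytes per iteration, divided by cycles per iteration. A minimal sketch of that calculation follows. The function name is hypothetical and not part of the commit, and the 3.2 GB/sec value is the nominal peak rate of PC3200 memory.

```python
def store_bandwidth_gb_per_sec(cpu_ghz, bytes_per_iteration, cycles_per_iteration):
    """Peak bandwidth (GB/sec) of a store loop, ignoring all memory stalls.

    Hypothetical helper for illustration only; it models the loop as
    retiring one iteration every `cycles_per_iteration` cycles.
    """
    iterations_per_ns = cpu_ghz / cycles_per_iteration
    return iterations_per_ns * bytes_per_iteration


PC3200_GB_PER_SEC = 3.2  # nominal peak rate of PC3200 (DDR-400) memory

# Cases taken from the commit message, all on a 2GHz CPU.
aligned = store_bandwidth_gb_per_sec(2.0, 4, 2)     # 4 bytes, 2 cycles
misaligned = store_bandwidth_gb_per_sec(2.0, 4, 3)  # 4 bytes, 3 cycles
unrolled = store_bandwidth_gb_per_sec(2.0, 8, 2)    # unrolled once, if still 2 cycles

print(f"aligned:    {aligned:.2f} GB/sec")
print(f"misaligned: {misaligned:.2f} GB/sec")
print(f"unrolled:   {unrolled:.2f} GB/sec")
print("misaligned loop keeps up with PC3200:", misaligned >= PC3200_GB_PER_SEC)
```

The unrolled case assumes the loop still runs at 2 cycles/iteration, which is why the message says only "nearer to" 8GB/sec.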

Timing for amd64: with 8-byte accesses and newer faster CPUs it is
easy to reach 16GB/sec but not so easy to go much faster.  The
alignment doesn't matter much if the CPU is not very old.  The loop
was already unrolled 4 times, but needs 32 bytes and uses a fancy
method that doesn't work for 2-way unrolling in 16 bytes.  Just
align it to 32 bytes.
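
The alignment reasoning can be checked with a small sketch: a loop body is penalized on old CPUs when its bytes straddle an alignment boundary, and rounding its start address up to that boundary avoids this as long as the body fits in one block. The helper names, the example address, and the 12-byte and 28-byte loop sizes below are assumptions chosen for illustration, not values taken from the commit.

```python
def crosses_boundary(start, length, boundary):
    """True if bytes [start, start + length) span more than one
    boundary-sized block. `boundary` must be a power of two."""
    if length <= 0:
        return False
    return start // boundary != (start + length - 1) // boundary


def align_up(addr, boundary):
    """Round addr up to the next multiple of `boundary` (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


# Hypothetical 12-byte loop body placed at an unaligned address.
loop_start = 0x1008
print(crosses_boundary(loop_start, 12, 16))                # straddles a 16-byte boundary
print(crosses_boundary(align_up(loop_start, 16), 12, 16))  # fits once aligned

# Hypothetical 28-byte loop body: too big for 16 bytes, fits in 32.
print(crosses_boundary(align_up(loop_start, 16), 28, 16))
print(crosses_boundary(align_up(loop_start, 32), 28, 32))
```

A body larger than the boundary always crosses it regardless of alignment, which matches the concern that further unrolling would stop the i386 loop fitting in 16 bytes.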
2016-08-29 13:07:21 +00:00
..
acpica If x86 CPU implementation of the MWAIT instruction reasonably 2015-05-09 12:28:48 +00:00
bios Make it explicit that D_MEM cdevsw d_flag is to signify that the 2016-05-01 17:46:56 +00:00
cloudabi32 Convert pointers obtained from the threadattr_t structure with TO_PTR(). 2016-08-24 10:13:18 +00:00
conf Make CloudABI work on i386. 2016-08-22 17:37:31 +00:00
i386 On amd64, declare sse2_pagezero() and start using it again, but only 2016-08-29 13:07:21 +00:00
ibcs2 Don't create pointless backups of generated files in "make sysent". 2016-07-28 21:29:04 +00:00
include Fix build for !SMP kernels after the Xen MSIX workaround. 2016-08-22 21:23:17 +00:00
isa Remove the spic(4) driver for the Sony Vaio Jogdial. 2016-08-19 23:39:08 +00:00
linux Don't create pointless backups of generated files in "make sysent". 2016-07-28 21:29:04 +00:00
pci As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to 2016-02-22 09:02:20 +00:00
svr4 sys: use our roundup2/rounddown2() macros when param.h is available. 2016-04-21 19:57:40 +00:00
xbox After r261980, the local ptr variable in xbox_init() is no longer used, 2014-02-16 22:48:36 +00:00
Makefile