freebsd-skq/sys/amd64/include
Bruce Evans 1a5735873e On amd64, declare sse2_pagezero() and start using it again, but only
for zeroing pages in idle where nontemporal writes are clearly best.
This is almost a no-op since zeroing in idle works does nothing good
and is off by default.  Fix END() statement forgotten in previous
commit.

Align the loop in sse2_pagezero().  Since it writes to main memory,
the loop doesn't have to be very carefully written to keep up.
Unrolling it was considered useless or harmful and was not done on
i386, but that was too careless.

Timing for i386: the loop was not unrolled at all, and moved only 4
bytes/iteration.  So on a 2GHz CPU, it needed to run at 2 cycles/
iteration to keep up with a memory speed of just 4GB/sec.  But when
it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/
iteration so it gave a maximum speed of 2.67GB/sec and couldn't even
keep up with PC3200 memory.  Fix the alignment so that it keep up with
4GB/sec memory, and unroll once to get nearer to 8GB/sec.  Further
unrolling might be useless or harmful since it would prevent the loop
fitting in 16-bytes.  My test system with an old CPU and old DDR1 only
needed 5+ GB/sec.  My test system with a new CPU and DDR3 doesn't need
any changes to keep up ~16GB/sec.

Timing for amd64: with 8-byte accesses and newer faster CPUs it is
easy to reach 16GB/sec but not so easy to go much faster.  The
alignment doesn't matter much if the CPU is not very old.  The loop
was already unrolled 4 times, but needs 32 bytes and uses a fancy
method that doesn't work for 2-way unrolling in 16 bytes.  Just
align it to 32-bytes.
2016-08-29 13:07:21 +00:00
..
pc Add more UEFI/e820 memory types from latest specifications. 2016-07-24 09:15:11 +00:00
xen x86/xen: Consolidate xen-os.h in a single place 2015-10-21 10:04:35 +00:00
_align.h
_bus.h
_inttypes.h
_limits.h
_stdint.h
_types.h
acpica_machdep.h
apm_bios.h
asm.h Revert r274772: it is not valid on MIPS 2014-11-25 03:50:31 +00:00
asmacros.h Extend earlier addition of stack frames to most of support.S. This makes 2014-11-13 22:11:44 +00:00
atomic.h atomic: Add testandclear on i386/amd64 2016-05-16 07:19:33 +00:00
bus_dma.h
bus.h
clock.h xen: implement an early timer for Xen PVH 2014-03-11 10:20:42 +00:00
counter.h Replace a number of conflations of mp_ncpus and mp_maxid with either 2016-07-06 14:09:49 +00:00
cpu.h amd64/i386: introduce APIC hooks for different APIC implementations. 2014-06-16 08:43:03 +00:00
cpufunc.h For amd64 non-PCID machines, and for i386 machines with support for 2015-12-03 11:14:14 +00:00
cputypes.h Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c. 2015-12-23 21:41:42 +00:00
db_machdep.h
dump.h Factor out duplicated code from dumpsys() on each architecture into generic 2015-01-07 01:01:39 +00:00
elf.h
endian.h
exec.h
fdt.h
float.h
floatingpoint.h
fpu.h Create a separate structure for per-CPU state saved across suspend and 2014-09-06 15:23:28 +00:00
frame.h
gdb_machdep.h
ieeefp.h
in_cksum.h Rationalize BSD license on sys/*/include/in_cksum.h 2015-08-05 19:05:12 +00:00
intr_machdep.h Fix build for !SMP kernels after the Xen MSIX workaround. 2016-08-22 21:23:17 +00:00
iodev.h
kdb.h
limits.h
md_var.h On amd64, declare sse2_pagezero() and start using it again, but only 2016-08-29 13:07:21 +00:00
memdev.h
metadata.h Move amd64 metadata.h to x86 and share with i386 2016-01-07 19:47:26 +00:00
minidump.h
mp_watchdog.h
nexusvar.h
npx.h
ofw_machdep.h
param.h Enable DEVICE_NUMA with up to 8 domains by default on amd64. 2016-04-12 21:23:44 +00:00
pcb.h Export various helper variables describing the layout and size of 2015-11-12 22:00:59 +00:00
pci_cfgreg.h
pcpu.h Rewrite amd64 PCID implementation to follow an algorithm described in 2015-05-09 19:11:01 +00:00
pmap.h Add locking annotations to amd64 struct md_page members. 2016-05-10 09:58:51 +00:00
pmc_mdep.h Use single instance of the identical INKERNEL() and PMC_IN_KERNEL() 2015-07-02 14:37:21 +00:00
ppireg.h
proc.h Eliminate pvh_global_lock from the amd64 pmap. 2016-05-14 23:35:11 +00:00
profile.h
psl.h
ptrace.h
pvclock.h Generalized parts of the XEN timer code into a generic pvclock 2015-02-04 08:26:43 +00:00
reg.h
reloc.h
resource.h Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge 2014-02-12 04:30:37 +00:00
runq.h
segments.h Hide struct pcb definition by #ifdef __amd64__ braces. If cc -m32 2013-11-26 19:38:42 +00:00
setjmp.h
sf_buf.h Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c 2014-08-05 09:44:10 +00:00
sigframe.h
signal.h
smp.h Merge common parts of i386 and amd64 md_var.h and smp.h into 2015-12-07 17:41:20 +00:00
specialreg.h
stack.h Merge stack(9) implementations for i386 and amd64 under x86/. 2015-09-11 03:24:07 +00:00
stdarg.h
sysarch.h
timerreg.h
trap.h
tss.h
ucontext.h
varargs.h
vdso.h
vm.h Reassign copyright statements on several files from Advanced 2015-04-23 14:22:20 +00:00
vmm_dev.h Restructure memory allocation in bhyve to support "devmem". 2015-06-18 06:00:17 +00:00
vmm_instruction_emul.h Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup(). 2015-05-06 16:25:20 +00:00
vmm.h sys/amd64: Small spelling fixes. 2016-05-03 22:13:04 +00:00
vmparam.h Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages. 2015-06-08 04:59:32 +00:00