83c001d3c2
Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags. Reduce contention during TLB invalidation operations by using a per-CPU completion flag, rather than a single atomically-updated variable. On a Westmere system (2 sockets x 4 cores x 1 thread) running a basic file server workload, dtrace measurements show that smp_tlb_shootdown is about 50% faster with this patch; observations with VTune show that the percentage of time spent in invlrng_single_page on an interrupt (actually doing invalidation, rather than synchronization) increases from 31% with the old mechanism to 71% with the new one. Submitted by: Anton Rang <rang at acm.org>; Reviewed by: cem (earlier version); Sponsored by: Dell EMC Isilon; Differential Revision: https://reviews.freebsd.org/D8041 |
||
---|---|---|
xen | ||
_align.h | ||
_inttypes.h | ||
_limits.h | ||
_stdint.h | ||
_types.h | ||
acpica_machdep.h | ||
apicreg.h | ||
apicvar.h | ||
apm_bios.h | ||
bus.h | ||
busdma_impl.h | ||
cputypes.h | ||
dump.h | ||
elf.h | ||
endian.h | ||
fdt.h | ||
float.h | ||
fpu.h | ||
frame.h | ||
init.h | ||
legacyvar.h | ||
mca.h | ||
metadata.h | ||
mptable.h | ||
ofw_machdep.h | ||
pci_cfgreg.h | ||
psl.h | ||
ptrace.h | ||
pvclock.h | ||
reg.h | ||
segments.h | ||
setjmp.h | ||
sigframe.h | ||
signal.h | ||
specialreg.h | ||
stack.h | ||
stdarg.h | ||
sysarch.h | ||
trap.h | ||
ucontext.h | ||
vdso.h | ||
vmware.h | ||
x86_smp.h | ||
x86_var.h |
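
The idea in the commit message above (each CPU acknowledges a TLB shootdown by writing its own completion flag, instead of every CPU updating one shared atomic counter) can be sketched in user space. This is a minimal simulation under stated assumptions, not the FreeBSD kernel code: threads stand in for CPUs, a spin on `request_gen` stands in for receiving the IPI, and the names `NCPU`, `CACHE_LINE`, `done_flag`, `request_gen`, `remote_cpu` and `shootdown` are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPU       4   /* hypothetical number of simulated CPUs */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* One completion flag per simulated CPU, aligned so that two flags never
 * share a cache line (each CPU only ever writes its own line). */
struct done_flag {
    _Alignas(CACHE_LINE) atomic_uint gen;
};

static struct done_flag done[NCPU];
static atomic_uint request_gen;  /* generation published by the initiator */

/* Simulated remote CPU: wait for a request, do the (omitted) invalidation,
 * then acknowledge by storing the generation into its own flag only. */
static void *remote_cpu(void *arg)
{
    unsigned cpu = (unsigned)(uintptr_t)arg;
    unsigned want;

    while ((want = atomic_load_explicit(&request_gen,
        memory_order_acquire)) == 0)
        sched_yield();  /* stand-in for waiting on the IPI */

    /* ... the actual TLB invalidation would happen here ... */

    atomic_store_explicit(&done[cpu].gen, want, memory_order_release);
    return NULL;
}

/* Initiator: publish a nonzero generation, then poll each per-CPU flag.
 * Returns the number of simulated CPUs that acknowledged. */
static unsigned shootdown(unsigned gen)
{
    pthread_t threads[NCPU];
    unsigned acked = 0;

    assert(gen != 0);  /* 0 is reserved to mean "no request pending" */
    atomic_store_explicit(&request_gen, 0, memory_order_relaxed);

    for (unsigned cpu = 0; cpu < NCPU; cpu++)
        pthread_create(&threads[cpu], NULL, remote_cpu,
            (void *)(uintptr_t)cpu);

    atomic_store_explicit(&request_gen, gen, memory_order_release);

    /* The initiator reads NCPU separate lines; no line is written by
     * more than one CPU, so the acknowledgements do not contend. */
    for (unsigned cpu = 0; cpu < NCPU; cpu++) {
        while (atomic_load_explicit(&done[cpu].gen,
            memory_order_acquire) != gen)
            sched_yield();
        acked++;
    }

    for (unsigned cpu = 0; cpu < NCPU; cpu++)
        pthread_join(threads[cpu], NULL);
    return acked;
}
```

With a single shared counter, every acknowledging CPU performs an atomic read-modify-write on the same cache line, so the line bounces between cores. In this sketch each acknowledgement is a plain store to a line owned by one CPU, which is the contention the commit message describes removing.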