kib 559623d89a Re-apply r306516 (by cem):
Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags

Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.
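
As a rough illustration of the idea (not the code from this change), the
sketch below models per-CPU completion flags in userspace C11. All names
(tlb_done_slot, tlb_ipi_handler, tlb_shootdown, MAXCPU, CACHE_LINE) are
hypothetical, the 64-byte cache line is an assumption, and IPI delivery is
modeled as a direct function call, so the wait loop never actually spins here.
Each target writes only its own padded slot, and the initiator polls the slots
one by one instead of having every CPU atomically update one shared variable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAXCPU     8    /* hypothetical CPU count for the sketch */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* One completion slot per CPU, aligned so no two slots share a cache line. */
struct tlb_done_slot {
	_Alignas(CACHE_LINE) _Atomic uint32_t done_gen;
};
static_assert(sizeof(struct tlb_done_slot) == CACHE_LINE,
    "each slot should occupy exactly one cache line");

static struct tlb_done_slot tlb_done[MAXCPU];
static uint32_t tlb_generation;	/* written only by the initiator */

/* Target side: stand-in for the work done in the invalidation interrupt. */
void
tlb_ipi_handler(int cpu, uint32_t gen)
{
	/* ... the actual TLB invalidation would happen here ... */

	/* Publish completion in this CPU's own slot; no shared counter. */
	atomic_store_explicit(&tlb_done[cpu].done_gen, gen,
	    memory_order_release);
}

/* Initiator side: returns the number of CPUs that acknowledged. */
int
tlb_shootdown(int initiator, int ncpu)
{
	uint32_t gen = ++tlb_generation;
	int acked = 0;

	/* Stand-in for sending an IPI to every other CPU. */
	for (int cpu = 0; cpu < ncpu; cpu++) {
		if (cpu != initiator)
			tlb_ipi_handler(cpu, gen);
	}

	/* Wait for each target's private flag to reach this generation. */
	for (int cpu = 0; cpu < ncpu; cpu++) {
		if (cpu == initiator)
			continue;
		while (atomic_load_explicit(&tlb_done[cpu].done_gen,
		    memory_order_acquire) != gen)
			;	/* a kernel would pause/relax the CPU here */
		acked++;
	}
	return (acked);
}
```

In this model, calling tlb_shootdown(0, 4) has CPU 0 wait on three separate
flags, each written by exactly one target, which is the property that avoids
contention on a single atomically-updated location.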

On a Westmere system (2 sockets x 4 cores x 1 thread), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one.  (Running a basic file
server workload.)

Submitted by:	Anton Rang <rang at acm.org>
Reviewed by:	cem (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8041
2016-10-04 17:01:24 +00:00