this change is to improve concurrency:
- Drop global state stored in the shadow overflow page table (and all other
global state)
- Remove all global locks
- Use per-PTE lock bits to allow parallel page insertion
- Reconstruct state when requested for evicted PTEs instead of buffering
it during overflow
This drops total wall time for make buildworld on a 32-thread POWER8 system
by a factor of two and system time by a factor of three, providing performance
20% better than similarly clocked Core i7 Xeons per-core. Performance on
smaller SMP systems, where PMAP lock contention was not as much of an issue,
is nearly unchanged.
Tested on: POWER8, POWER5+, G5 UP, G5 SMP (64-bit and 32-bit kernels)
Merged from: user/nwhitehorn/ppc64-pmap-rework
Looked over by: jhibbits, andreast
MFC after: 3 months
Relnotes: yes
Sponsored by: FreeBSD Foundation