Commit Graph

3818 Commits

Author SHA1 Message Date
Mark Johnston
64b3893010 Initialize marker pages in vm_page_domain_init().
They were previously initialized by the corresponding page daemon
threads, but for vmd_inacthead this may be too late if
vm_page_deactivate_noreuse() is called during boot.

Reported and tested by:	cperciva
Reviewed by:	alc, kib
MFC after:	1 week
2018-04-19 14:09:44 +00:00
Mark Johnston
9de8fcfddf Ensure that m and skip_m belong to the same object.
Pages allocated from a given reservation may belong to different
objects. It is therefore possible for vm_page_ps_test() to be called
with the base page's object unlocked. Check for this case before
asserting that the object lock is held.

Reported by:	jhb
Reviewed by:	kib
MFC after:	1 week
2018-04-17 18:49:17 +00:00
Konstantin Belousov
e55d32b7b3 Handle Skylake-X errata SKZ63.
SKZ63 Processor May Hang When Executing Code In an HLE Transaction
Region

Problem: Under certain conditions, if the processor acquires an HLE
(Hardware Lock Elision) lock via the XACQUIRE instruction in the Host
Physical Address range between 40000000H and 403FFFFFH, it may hang
with an internal timeout error (MCACOD 0400H) logged into
IA32_MCi_STATUS.

Move the pages from the range into the blacklist.  Add a tunable to
not waste 4M if local DoS is not the issue.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15001
2018-04-07 17:06:13 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Mark Johnston
c098768e4d Ensure the background laundering threshold is positive after a scan.
The division added in r331732 meant that we wouldn't attempt a
background laundering until at least v_free_target - v_free_min clean
pages had been freed by the page daemon since the last laundering. If
the inactive queue is depleted but not completely empty (e.g., because
it contains busy pages), it can thus take a long time to meet this
threshold. Restore the pre-r331732 behaviour of using a non-zero
background laundering threshold if at least one inactive queue scan has
elapsed since the last attempt at background laundering.

Submitted by:	tijl (original version)
2018-04-02 15:07:41 +00:00
Gleb Smirnoff
b92b26ad08 Use UMA_SLAB_SPACE macro. No functional change here. 2018-04-02 05:15:25 +00:00
Gleb Smirnoff
96a10340ce In uma_startup_count() handle special case when zone will fit into
single slab, but with alignment adjustment it won't. Again, when
there is only one item in a slab alignment can be ignored. See
previous revision of this file for more info.

PR:		227116
2018-04-02 05:14:31 +00:00
Gleb Smirnoff
1ca6ed4589 Handle a special case when a slab can fit only one allocation,
and zone has a large alignment. With alignment taken into
account uk_rsize will be greater than space in a slab. However,
since we have only one item per slab, it is always naturally
aligned.

Code that will panic before this change with 4k page:

	z = uma_zcreate("test", 3984, NULL, NULL, NULL, NULL, 31, 0);
	uma_zalloc(z, M_WAITOK);

A practical scenario to hit the panic is a machine with 56 CPUs
and 2 NUMA domains, which yields in zone size of 3984.

PR:		227116
MFC after:	2 weeks
2018-04-02 05:11:59 +00:00
Jeff Roberson
c33e3a642b Add a uma cache of free pages in the DEFAULT freepool. This gives us
per-cpu alloc and free of pages.  The cache is filled with as few trips
to the phys allocator as possible by the use of a new
vm_phys_alloc_npages() function which allocates as many as N pages.

This code was originally by markj with the import function rewritten by
me.

Reviewed by:	markj, kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14905
2018-04-01 04:50:05 +00:00
Jeff Roberson
e8bb2dc7c9 Add the flag ZONE_NOBUCKETCACHE. This flag instructions UMA not to keep
a cache of fully populated buckets.  This will be used in a follow-on
commit.

The flag idea was originally from markj.

Reviewed by:	markj, kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2018-04-01 04:47:05 +00:00
Konstantin Belousov
19ea042eb8 Make vm_map_max/min/pmap KBI stable.
There are out of tree consumers of vm_map_min() and vm_map_max(), and
I believe there are consumers of vm_map_pmap(), although the later is
arguably less in the need of KBI-stable interface. For the consumers
benefit, make modules using this KPI not depended on the struct vm_map
layout.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14902
2018-03-30 10:55:31 +00:00
Mark Johnston
6068486258 Fix the background laundering mechanism after r329882.
Rather than using the number of inactive queue scans as a metric for
how many clean pages are being freed by the page daemon, have the
page daemon keep a running counter of the number of pages it has freed,
and have the laundry thread use that when computing the background
laundering threshold.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D14884
2018-03-29 14:27:40 +00:00
Jeff Roberson
e5818a53db Implement several enhancements to NUMA policies.
Add a new "interleave" allocation policy which stripes pages across
domains with a stride or width keeping contiguity within a multi-page
region.

Move the kernel to the dedicated numbered cpuset #2 making it possible
to assign kernel threads and memory policy separately from user.  This
also eliminates the need for the complicated interrupt binding code.

Add a sysctl API for viewing and manipulating domainsets.  Refactor some
of the cpuset_t manipulation code using the generic bitset type so that
it can be used for both.  This probably belongs in a dedicated subr file.

Attempt to improve the include situation.

Reviewed by:	kib
Discussed with:	jhb (cpuset parts)
Tested by:	pho (before review feedback)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14839
2018-03-29 02:54:50 +00:00
Jeff Roberson
146bf2c66d Move vm_ndomains to vm.h where it can be used with a single header include
rather than requiring a half-dozen.  Many non-vm files may want to know
the number of valid domains.

Sponsored by:	Netflix, Dell/EMC Isilon
2018-03-27 03:27:02 +00:00
Konstantin Belousov
8ec533d336 Allow to specify for vm_fault_quick_hold_pages() that nofault mode
should be honored.

We must not sleep or acquire any MI VM locks if TDP_NOFAULTING is
specified.  On the other hand, there were some callers in the tree
which set TDP_NOFAULTING for larger scope than needed, I fixed the
code which I wrote, but I suspect that linuxkpi and out of tree drm
drivers might abuse this still.

So only enable the mode for vm_fault_quick_hold_pages() where
vm_fault_hold() is not called when specifically asked by user.  I
decided to use vm_prot_t flag to not change KPI.  Since number of
flags in vm_prot_t is limited, I reused the same flag which was
already consumed for vm_map_lookup().

Reported and tested by:	pho (as part of the larger patch)
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14825
2018-03-26 16:31:12 +00:00
Konstantin Belousov
ed9e8bc468 Account the size of the vslock-ed memory by the thread.
Assert that all such memory is unwired on return to usermode.

The count of the wired memory will be used to detect the copyout mode.

Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-03-24 13:51:27 +00:00
Konstantin Belousov
63b5d112b6 For vm_zone_stats() sysctl handler, do not drain sbuf calling
copyout(9) while owning zone lock.

Despite old value sysctl buffer is wired, spurious faults might still
occur.

Note that we still own the uma_rwlock there, but this lock does not
participate in sensitive lock orders.

Reported and tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-03-24 13:48:53 +00:00
Jeff Roberson
2d3f4181de Fix two compliation problems on non-amd64 architectures. 2018-03-23 18:24:02 +00:00
Mark Johnston
4046851367 Correct a couple of assertion messages in vm_page_reclaim_run().
MFC after:	3 days
2018-03-23 14:38:56 +00:00
Cy Schubert
72346b2232 Fix build on i386 without INVARIANTS following r331369.
--- vm_reserv.o ---
In file included from /opt/src/svn-current/sys/vm/vm_reserv.c:48:
In file included from /opt/src/svn-current/sys/sys/counter.h:37:
./machine/counter.h:174:3: error: implicit declaration of function
'critical_enter' is invalid in C99 [-Werror,-Wimplicit-function-declarat
ion]
                critical_enter();

Reviewed by:	jeff@
2018-03-23 03:22:30 +00:00
Jeff Roberson
5c930c894d Lock reservations with a dedicated lock in each reservation. Protect the
vmd_free_count with atomics.

This allows us to allocate and free from reservations without the free lock
except where a superpage is allocated from the physical layer, which is
roughly 1/512 of the operations on amd64.

Use the counter api to eliminate cache conention on counters.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14707
2018-03-22 19:21:11 +00:00
Jeff Roberson
9a4b4cd3bc Start witness much earlier in boot so that we can shrink the pend list and
make it more immune to further change.

Reviewed by:	markj, imp (Part of D14707)
Sponsored by:	Netflix, Dell/EMC Isilon
2018-03-22 19:11:43 +00:00
Jeff Roberson
cdfeced8ff Use read_mostly and alignment tags to eliminate or limit false sharing.
Reviewed by:	markj (Part of D14707)
Sponsored by:	Netflix, Dell/EMC Isilon
2018-03-22 19:06:50 +00:00
Konstantin Belousov
79e9552ebb Check for wrap-around in vm_phys_alloc_seg_contig().
It is possible to provide insane values for size in contigmalloc(9)
request, which usually not reaches the phys allocator due to failing
KVA allocation.  But with the forthcoming 4/4 i386, where 32bit
architecture has almost 4G KVA, contigmalloc(1G) is not unreasonable
outright and KVA might be available sometimes.

Then, the calculation of pa_end could wrap around, depending on the
physical address, and the checks in vm_phys_alloc_seg_contig() would
pass while the iteration in the loop after the 'done' label goes out
of the vm_page_array bounds.

Fix it by detecting the wrap.

Reported and tested by:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14767
2018-03-20 16:17:55 +00:00
Mark Johnston
c6a70eaea8 Avoid dequeuing the fault page during a soft fault.
Such pages are re-enqueued at the end of the fault handler, preserving
LRU. Rather than performing two separate operations per fault, simply
requeue the page at the end of the fault (or bump its activation count
if it resides in PQ_ACTIVE, avoiding the page queue lock entirely).
This elides some page lock and page queue lock operations in common
cases, e.g., CoW faults.

Note that we must still dequeue the source page for "optimized" CoW
faults since the page may not remain enqueued while it is moved to
another object.

Reviewed by:	alc, kib
Tested by:	pho
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14625
2018-03-18 16:49:30 +00:00
Mark Johnston
0eb50f9cd2 Have vm_page_{deactivate,launder}() requeue already-queued pages.
In many cases the page is not enqueued so the change will have no
effect. However, the change is needed to support an optimization in
the fault handler and in some cases (sendfile, the buffer cache) it
was being emulated by the caller anyway.

Reviewed by:	alc
Tested by:	pho
MFC after:	2 weeks
X-Differential Revision: https://reviews.freebsd.org/D14625
2018-03-18 16:40:56 +00:00
Mark Johnston
434862acb1 Have vm_page_replace() assert that the new page is not enqueued.
The new page does not belong to a VM object, but the page daemon does
not expect to encounter such pages.

Reviewed by:	alc, kib
Tested by:	pho
MFC after:	1 week
X-Differential Revision: https://reviews.freebsd.org/D14625
2018-03-18 16:35:40 +00:00
Conrad Meyer
5d3b36666b Fix GCC build: Remove redundant pagedaemon_wakeup declaration
Introduced in r331018.

Reported by:	kevans
Sponsored by:	Dell EMC Isilon
2018-03-16 07:05:09 +00:00
Jeff Roberson
30fbfdda6c Eliminate pageout wakeup races. Take another step towards lockless
vmd_free_count manipulation.  Reduce the scope of the free lock by
using a pageout lock to synchronize sleep and wakeup.  Only trigger
the pageout daemon on transitions between states.  Drive all wakeup
operations directly as side-effects from freeing memory rather than
requiring an additional function call.

Reviewed by:	markj, kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14612
2018-03-15 19:23:07 +00:00
Konstantin Belousov
741e1c9196 Revert the chunk from r330410 in vm_page_reclaim_run().
There, the pages freed might be managed but the page's lock is not
owned.  For KPI correctness, the page lock is requried around the call
to vm_page_free_prep(), which is asserted.  Reclaim loop already did
the work which could be done by vm_page_free_prep(), so the lock is
not needed and the only consequence of not owning it is the assert
trigger.

Instead of adding the locking to satisfy the assert, revert to the
code that calls vm_page_free_phys() directly.

Reported by:	pho
Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-03-13 18:27:23 +00:00
Jeff Roberson
f4af595964 Don't assert that the domain free lock is held until we're certain that
there is a valid reservation.  This can trip erroneously when memory
falls within a domain but doesn't have the reservation initialized because
it does not meet size or alignment requirements.

Reported by:	pho, mjg
Sponsored by:	Netflix, Dell/EMC Isilon
2018-03-07 22:04:27 +00:00
Konstantin Belousov
2a8e8f7892 Remove redundant test from r330410.
If the input slist is non-empty, counter cannot be zero after freeing.

Noted by:	mjg
MFC after:	2 weeks
2018-03-04 21:15:31 +00:00
Konstantin Belousov
8c8ee2ee1c Unify bulk free operations in several pmaps.
Submitted by:	Yoshihiro Ota
Reviewed by:	markj
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D13485
2018-03-04 20:53:20 +00:00
Mark Johnston
3b8cf4acf0 Give the 0th domain's page daemon thread a consistent name.
Page daemon threads for other domains show up in ps(1) output as
"pagedaemon/domN", so let that be the case for domain 0 as well.

Submitted by:	Kevin Bowling <kevin.bowling@kev009.com>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14518
2018-02-27 16:51:09 +00:00
Mark Johnston
59d3150b58 Restore the pre-r329882 inactive page shortage computation.
With r329882, in the absence of a free page shortage we would only take
len(PQ_INACTIVE)+len(PQ_LAUNDRY) into account when deciding whether to
aggressively scan PQ_ACTIVE. Previously we would also include the
number of free pages in this computation, ensuring that we wouldn't scan
PQ_ACTIVE with plenty of free memory available. The change in behaviour
was most noticeable immediately after booting, when PQ_INACTIVE and
PQ_LAUNDRY are nearly empty.

Reviewed by:	jeff
2018-02-24 20:47:22 +00:00
Konstantin Belousov
cd84455f91 Hide all vm/vm_pageout.h content under #ifdef _KERNEL.
There are no parts useful for usermode applications in
vm/vm_pageout.h.  Even for the specific applications like fstat and
lsof.

In my opinion, this protection is redundant and instead userspace
should not include the header at all.  Since there are apparently
broken third party codebases, give them a bit of slack by providing
transitional period.

Reported by:	julian
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-02-24 10:26:26 +00:00
Mark Johnston
5f70fb1425 Correct some comments after r328954.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D14486
2018-02-23 23:27:53 +00:00
Mark Johnston
9140bff7ed Remove a bogus assertion from vm_page_launder().
After r328977, a wired page m may have m->queue != PQ_NONE.

Reviewed by:	kib
X-MFC with:	r328977
Differential Revision:	https://reviews.freebsd.org/D14485
2018-02-23 23:25:22 +00:00
Jeff Roberson
5f8cd1c0bf Add a generic Proportional Integral Derivative (PID) controller algorithm and
use it to regulate page daemon output.

This provides much smoother and more responsive page daemon output, anticipating
demand and avoiding pageout stalls by increasing the number of pages to match
the workload.  This is a reimplementation of work done by myself and mlaier at
Isilon.

Reviewed by:	bsdimp
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14402
2018-02-23 22:51:51 +00:00
Konstantin Belousov
2c0f13aa59 vm_wait() rework.
Make vm_wait() take the vm_object argument which specifies the domain
set to wait for the min condition pass.  If there is no object
associated with the wait, use curthread' policy domainset.  The
mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait for passing min condition.

Eliminate pagedaemon_wait().  vm_domain_clear() handles the same
operations.

Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.

Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.

Scetched and reviewed by:	jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation, Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D14384
2018-02-20 10:13:13 +00:00
Mark Johnston
3f060b60b1 Use the conventional name for an array of pages.
No functional change intended.

Discussed with:	kib
MFC after:	3 days
2018-02-16 15:38:22 +00:00
Konstantin Belousov
ada27a3bb8 Cleanup unused page argument for vm_reserv_break().
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14364
2018-02-14 00:34:02 +00:00
Konstantin Belousov
d929ad7f91 Ensure memory consistency on COW.
From the submitter description:
The process is forked transitioning a map entry to COW
Thread A writes to a page on the map entry, faults, updates the pmap to
  writable at a new phys addr, and starts TLB invalidations...
Thread B acquires a lock, writes to a location on the new phys addr, and
  releases the lock
Thread C acquires the lock, reads from the location on the old phys addr...
Thread A ...continues the TLB invalidations which are completed
Thread C ...reads from the location on the new phys addr, and releases
  the lock

In this example Thread B and C [lock, use and unlock] properly and
neither own the lock at the same time.  Thread A was writing somewhere
else on the page and so never had/needed the lock. Thread C sees a
location that is only ever read|modified under a lock change beneath
it while it is the lock owner.

To fix this, perform the two-stage update of the copied PTE.  First,
the PTE is updated with the address of the new physical page with
copied content, but in read-only mode.  The pmap locking and the page
busy state during PTE update and TLB invalidation IPIs ensure that any
writer to the page cannot upgrade the PTE to the writable state until
all CPUs updated their TLB to not cache old mapping.  Then, after the
busy state of the page is lifted, the faults for write can proceed and
do not violate the consistency of the reads.

The change is done in vm_fault because most architectures do need IPIs
to invalidate remote TLBs.  More, I think that hardware guarantees of
atomicity of the remote TLB invalidation are not enough to prevent the
inconsistent reads of non-atomic reads, like multi-word accesses
protected by a lock.  So instead of modifying each pmap invalidation
code, I did it there.

Discovered and analyzed by: Elliott.Rabe@dell.com
Reviewed by:	markj
PR:	225584 (appeared to have the same cause)
Tested by:	Elliott.Rabe@dell.com, emaste, Mike Tancsa <mike@sentex.net>, truckman
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14347
2018-02-14 00:31:45 +00:00
Konstantin Belousov
607970bc8e Do not call pmap_enter() with invalid protection mode.
If the map entry elookup was performed due to the mapping changes, we
need to ensure that there is still some access permission bit
requested which is compatible with the current vm_map_entry mode.  If
not, restart the handler from scratch instead of trying to save the
current progress.

Also adjust fault_type to not include cleared permission bits.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14347
2018-02-14 00:25:18 +00:00
Konstantin Belousov
c4be9169c0 Do not leak rv->psind in some specific situations.
Suppose that we have an object with a mapped superpage, and that all
pages in the superpages are held (by some driver).  Additionally,
suppose that the object is terminated, e.g. because the only process
mapping it is exiting.  Then the reservation is broken, but the pages
cannot be freed until later, when they are unheld.  In this situation,
the reservation code cannot clean psind, since no pages are freed, and
the page is freed and then reused with invalid psind.

Clean psind on vm_reserv_break() to avoid the situation.

Reported and tested by:	Slava Shwartsman
Reviewed by:	markj
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14335
2018-02-13 15:36:28 +00:00
Jeff Roberson
e958ad4cf3 Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc().  Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by:	markj
Discussed with:	kib, glebius
Tested by:	pho (earlier version)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14273
2018-02-12 22:53:00 +00:00
Gleb Smirnoff
f7d3578564 Fix boot_pages exhaustion on machines with many domains and cores, where
size of UMA zone allocation is greater than page size. In this case zone
of zones can not use UMA_MD_SMALL_ALLOC, and we  need to postpone switch
off of this zone from startup_alloc() until full launch of VM.

o Always supply number of VM zones to uma_startup_count(). On machines
  with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over
  a page. In the latter case account VM zones for number of allocations
  from the zone of zones.
o Rewrite startup_alloc() so that it will immediately switch off from
  itself any zone that is already capable of running real alloc.
  In worst case scenario we may leak a single page here. See comment
  in uma_startup_count().
o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some
  extra SYSINITs, e.g. vm_page_init() may sneak in before.
o While here, remove uma_boot_pages_mtx. With recent changes to boot
  pages calculation, we are guaranteed to use all of the boot_pages
  in the early single threaded stage.

Reported & tested by:	mav
2018-02-09 04:45:39 +00:00
Gleb Smirnoff
5073a08328 Fix three miscalculations in amount of boot pages:
o Most of startup zones have struct uma_slab embedded into the slab,
  so provide macro UMA_SLAB_SPACE and use it instead of UMA_SLAB_SIZE,
  when calculating how many pages would certain kind of allocations
  require. Some zones are offpage, so we might have a positive inaccuracy.
o The keg for the zone of zones is allocated "dynamically", so we
  need +1 when calculating amount of pages for kegs. [1]
o The zones of zones and zones of kegs have arbitrary alignment of 32,
  and this also needs to be accounted for. [2]

While here, spread more comments and improve diagnostic messages.

Reported by:	pho [1], jtl [2]
2018-02-07 18:32:51 +00:00
Mark Johnston
1d3a1bcfac Dequeue wired pages lazily.
Previously, wiring a page would cause it to be removed from its page
queue. In the common case, unwiring causes it to be enqueued at the tail
of that page queue. This change modifies vm_page_wire() to not dequeue
the page, thus avoiding the highly contended page queue locks. Instead,
vm_page_unwire() takes care of requeuing the page as a single operation,
and the page daemon dequeues wired pages as they are encountered during
a queue scan to avoid needlessly revisiting them later. For pages in
PQ_ACTIVE we do even better, since a requeue is unnecessary.

The change improves scalability for some common workloads. For instance,
threads wiring pages into the buffer cache no longer need to modify
global page queues, and unwiring is usually done by the bufspace thread,
so concurrency is not as much of an issue. As another example, many
sysctl handlers wire the output buffer to avoid faults on copyout, and
since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid
modifying the page queue in this case.

The change also adds a block comment describing some properties of
struct vm_page's reference counters, and the busy lock.

Reviewed by:	jeff
Discussed with:	alc, kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D11943
2018-02-07 16:57:10 +00:00
Gleb Smirnoff
d2be4a1e4f Use correct arithmetic to calculate how many pages we need for kegs
and hashes.  There is no functional change with current sizes.
2018-02-06 22:13:40 +00:00