Commit Graph

791 Commits

Author SHA1 Message Date
Mark Johnston
0401989282 vm: Round up npages and alignment for contig reclamation
When searching for runs to reclaim, we need to ensure that the entire
run will be added to the buddy allocator as a single unit.  Otherwise,
it will not be visible to vm_phys_alloc_contig() as it is currently
implemented.  This is a problem for allocation requests that are not a
power of 2 in size, as with 9KB jumbo mbuf clusters.

Reported by:	alc
Reviewed by:	alc
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28924
2021-03-02 10:21:02 -05:00
Max Laier
14b5a3c7d5 vm pqbatch: move unmanaged page assert under pagequeue lock
This KASSERT is overzealous because of the following race condition:
 1) A managed page which is currently in PQ_LAUNDRY is freed.
    vm_page_free_prep calls vm_page_dequeue_deferred()

    The page state is:
       PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED

 2) The laundry worker comes around and pick up the page and calls
    vm_pageout_defer(m, PQ_LAUNDRY, true) to check if page is still in the
    queue.  We do a vm_page_astate_load and get
       PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED
    as per above.

 3) The laundry worker is pre-empted and another thread allocates our page
    from the free pool.  For example vm_page_alloc_domain_after calls
    vm_page_dequeue() and sets VPO_UNMANAGED because we are allocating for
    an OBJT_UNMANAGED object.

    The page state is:
       PQ_NONE, 0 - VPO_UNMANAGED

 4) The laundry worker resumes, and processes vm_pageout_defer based on the
    stale astate which leads to a call to vm_page_pqbatch_submit, which will
    trip on the KASSERT.

Submitted by:	mlaier
Reviewed by:	markj, rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28563
2021-02-24 15:56:16 -08:00
Mark Johnston
5c18744ea9 vm: Honour the "noreuse" flag to vm_page_unwire_managed()
This flag indicates that the page should be enqueued near the head of
the inactive queue, skipping the LRU queue.  It is used when unwiring
pages from the buffer cache following direct I/O or after I/O when
POSIX_FADV_NOREUSE or _DONTNEED advice was specified, or when
sendfile(SF_NOCACHE) completes.  For the direct I/O and sendfile cases
we only enqueue the page if we decide not to free it, typically because
it's mapped.

Pass "noreuse" through to vm_page_release_toq() so that we actually
honour the desired LRU policy for these scenarios.

Reported by:	bdrewery
Reviewed by:	alc, kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28555
2021-02-10 11:10:27 -05:00
Mark Johnston
81846def34 vm: Fix some bugs in the page busying code
In vm_page_busy_acquire(), load the object pointer using
atomic_load_ptr() as we do elsewhere.  Per the comment, the object
identity must be consistent across sleeps.

In vm_page_grab_sleep(), pass the correct pindex to
_vm_page_busy_sleep().  The pindex is used to re-check the page's
identity before going to sleep.  In particular, vm_page_grab_sleep() is
used in unlocked grab, so the object lock is not necessarily held when
verifying the page's identity, and the pindex may change if the page is
moved, or freed and re-allocated.  I believe this can result in spurious
VM_PAGER_FAILs from vm_page_grab_valid_unlocked() or early termination
of vm_page_grab_pages_unlocked().

In vm_page_grab_pages(), pass the correct pindex to
vm_page_grab_sleep().  Otherwise I believe vm_page_grab_pages() will
effectively spin when attempting to busy a busy page after the first
index in the range.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27607
2020-12-27 17:01:44 -05:00
Mark Johnston
1fea4b25c9 Wrap a long line in vm_pqbatch_process_page() 2020-11-19 15:41:42 +00:00
Mark Johnston
9e3e737608 Micro-optimize vm_page_pqbatch_submit()
Avoid calling vm_page_domain() twice.

Discussed with:	alc (in D27207)
2020-11-19 15:40:58 +00:00
Mark Johnston
431fb8abd7 vm_phys: Try to clean up NUMA KPIs
It can useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures.  We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.

Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain().  Make the latter an inline function.

Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers.  Include it from vm_page.h so that
vm_page_domain() can be defined there.

Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.

Reviewed by:	alc
Reviewed by:	dougm, kib (earlier versions)
Differential Revision:	https://reviews.freebsd.org/D27207
2020-11-19 03:59:21 +00:00
Konstantin Belousov
6f3b523c9a Avoid dump_avail[] redefinition.
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h.  This fixes default gcc build for mips.

Reviewed by:	alc, scottph
Tested by:	kevans (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26741
2020-10-14 22:51:40 +00:00
Bryan Drewery
c2c6fb90e0 Use unlocked page lookup for inmem() to avoid object lock contention
Reviewed By:	kib, markj
Submitted by:	mlaier
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D26653
2020-10-09 23:49:42 +00:00
D Scott Phillips
00e6614750 Sparsify the vm_page_dump bitmap
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).

Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.

Reviewed by:	jhb
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26131
2020-09-21 22:21:59 +00:00
D Scott Phillips
ab041f713a Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26129
2020-09-21 22:20:37 +00:00
Konstantin Belousov
89d2fb14d5 Add interruptible variant of vm_wait(9), vm_wait_intr(9).
Also add msleep flags argument to vm_wait_doms(9).

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-08 23:28:09 +00:00
Mark Johnston
a2d704d19f Avoid unnecessary object locking in vm_page_grab_pages_unlocked().
We were needlessly acquiring the object lock to call
vm_page_grab_pages() even when all of the requested pages were looked up
locklessly.  Fix that, stop testing for count == 0 in
vm_page_grab_pages(), and add assertions to help catch this kind of
mistake.

Reported by:	cem
Reviewed by:	alc, cem, dougm, jeff
Differential Revision:	https://reviews.freebsd.org/D26304
2020-09-02 19:59:25 +00:00
Mark Johnston
411096d034 Permit vm_page_wire() to be called on pages not belonging to an object.
For such pages ref_count is effectively a consumer-managed field, but
there is no harm in calling vm_page_wire() on them.
vm_page_unwire_noq() handles them as well.  Relax the vm_page_wire()
assertions to permit this case which is triggered by some out-of-tree
code. [1]

Also guard a conditional assertion with INVARIANTS.  Otherwise the
conditions are evaluated even though the result is unused. [2]

Reported by:	bz, cem [1], kib [2]
Reviewed by:	dougm, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26173
2020-08-25 13:45:06 +00:00
Conrad Meyer
b7883452d4 Back out unrelated change
Reported by:	kib, markj
X-MFC-With:	r364129
2020-08-12 00:21:30 +00:00
Conrad Meyer
0292c54bdb Add support for multithreading the inactive queue pageout within a domain.
In very high throughput workloads, the inactive scan can become overwhelmed
as you have many cores producing pages and a single core freeing.  Since
Mark's introduction of batched pagequeue operations, we can now run multiple
inactive threads working on independent batches.

To avoid confusing the pid and other control algorithms, I (Jeff) do this in
a mpi-like fan out and collect model that is driven from the primary page
daemon.  It decides whether the shortfall can be overcome with a single
thread and if not dispatches multiple threads and waits for their results.

The heuristic is based on timing the pageout activity and averaging a
pages-per-second variable which is exponentially decayed. This is visible in
sysctl and may be interesting for other purposes.

I (Jeff) have verified that this does indeed double our paging throughput
when used with two threads. With four we tend to run into other contention
problems.  For now I would like to commit this infrastructure with only a
single thread enabled.

The number of worker threads per domain can be controlled with the
'vm.pageout_threads_per_domain' tunable.

Submitted by:	jeff (earlier version)
Discussed with:	markj
Tested by:	pho
Sponsored by:	probably Netflix (based on contemporary commits)
Differential Revision:	https://reviews.freebsd.org/D21629
2020-08-11 20:37:45 +00:00
Mark Johnston
efec381dd1 Remove most lingering references to the page lock in comments.
Finish updating comments to reflect new locking protocols introduced
over the past year.  In particular, vm_page_lock is now effectively
unused.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25868
2020-08-04 14:59:43 +00:00
Mark Johnston
958d8f527c Remove the volatile qualifier from busy_lock.
Use atomic(9) to load the lock state.  Some places were doing this
already, so it was inconsistent.  In initialization code, the lock state
is still initialized with plain stores.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25861
2020-07-29 19:38:49 +00:00
Mark Johnston
782ebde52e vm_page_free_invalid(): Relax the xbusy assertion.
vm_page_assert_xbusied() asserts that the busying thread is the current
thread.  For some uses of vm_page_free_invalid() (e.g., error handling
in vnode_pager_generic_getpages_done()), this condition might not hold.

Reported by:	Jenkins via trasz
Reviewed by:	chs, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25828
2020-07-27 14:25:10 +00:00
Chuck Silvers
4dfa06e114 Add a new function vm_page_free_invalid() for freeing invalid pages
that might be wired.  If the page is wired then it cannot be freed now,
but the thread that eventually unwires it will free it at that point.

Reviewed by:	markj, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D25430
2020-07-17 23:09:36 +00:00
Scott Long
ffc568ba8b Revert r362998, r326999 while a better compatibility strategy is devised. 2020-07-09 22:38:36 +00:00
Scott Long
b302c2e5c9 Migrate the feature of excluding RAM pages to use "excludelist"
as its nomenclature.

MFC after:	1 week
2020-07-07 20:33:11 +00:00
Konstantin Belousov
ee06cffcd2 vm_page_free_prep(): correct description of the required page and object state.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25482
2020-06-27 02:31:39 +00:00
Mark Johnston
a9ea09e548 Re-check for wirings after busying the page in vm_page_release_locked().
A concurrent unlocked lookup can wire the page after
vm_page_release_locked() releases the last wiring, in which case
vm_page_release_locked() must not free the page.  Once the xbusy lock is
acquired, that, the object lock and the fact that the page is unmapped
ensure that the wire count cannot increase, so re-check for new wirings
after the page is xbusied.

Update the comment above vm_page_wired() to reflect the new
synchronization rules.

Reported by:	glebius
Reviewed by:	alc, jeff, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24592
2020-04-28 13:51:41 +00:00
Mark Johnston
70e68b19a4 Handle trashed queue pointers in vm_page_acquire_unlocked().
vm_page_acquire_unlocked() relies on type-stability of vm_page
structures and assumes that the listq linkage pointers always point to a
vm_page or are NULL.  QUEUE_MACRO_DEBUG_TRASH breaks that assumption, so
add an explicit check for a trashed queue pointer before dereferencing.

Reported and tested by:	pho
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24472
2020-04-20 14:45:17 +00:00
Bryan Drewery
adc0388117 Remove dead code leftover from r331018.
Sponsored by:	Dell EMC
2020-03-31 01:12:53 +00:00
Konstantin Belousov
a7c55b3e1b ddb show pginfo: print pages reference value in hex.
It is more useful this way after the VPRC_ flags were introduced.

Sponsored by:	The FreeBSD Foundation
2020-03-28 12:21:52 +00:00
Jeff Roberson
d1105e9441 Check for busy or wired in vm_page_relookup(). Some callers will only keep
a page wired and expect it to still be present.

Reported by:	delphij@FreeBSD.org
Reviewed by:	kib
2020-03-11 22:25:45 +00:00
Mark Johnston
d869a17e62 Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things.
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23978
2020-03-06 19:10:00 +00:00
Mark Johnston
1ed42f6fdd Avoid doubly wiring a newly allocated page in vm_page_grab_valid().
This fixes a regression from r358363.

Reported by:	manu, jbeich
Tested by:	jbeich
2020-03-01 22:09:11 +00:00
Jeff Roberson
6be21eb778 Provide a lock free alternative to resolve bogus pages. This is not likely
to be much of a perf win, just a nice code simplification.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D23866
2020-02-28 21:42:48 +00:00
Jeff Roberson
3f39f80ab3 Support the NOCREAT flag for grab_valid_unlocked.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23865
2020-02-28 20:32:35 +00:00
Jeff Roberson
c49be4f1c6 Add unlocked grab* function variants that use lockless radix code to
lookup pages.  These variants will fall back to their locked counterparts
if the page is not present.

Discussed with:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23449
2020-02-27 02:37:27 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Ryan Libby
eaa17d4291 sys/vm: quiet -Wwrite-strings
Discussed with:	kib
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23796
2020-02-23 03:32:04 +00:00
Jeff Roberson
6c5f36ff30 Eliminate some unnecessary uses of UMA_ZONE_VM. Only zones involved in
virtual address or physical page allocation need to be marked with this
flag.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23712
2020-02-19 08:17:27 +00:00
Jeff Roberson
f212367b42 Refactor _vm_page_busy_sleep to reduce the delta between the various
sleep routines and introduce a variant that supports lockless sleep.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23612
2020-02-17 01:08:00 +00:00
Mateusz Guzik
23ed568caa vm: remove no longer needed atomic_load_ptr casts 2020-02-14 23:16:29 +00:00
Mark Johnston
06ef60525f Fix handling of WAITFAIL in vm_page_grab() and vm_page_grab_pages().
After sleeping through a memory shortage, we must return NULL rather
than retry.

Discussed with:	jeff
Reported by:	pho
Sponsored by:	The FreeBSD Foundation
2020-02-13 23:18:35 +00:00
Jeff Roberson
ee9e43f8dd Add an explicit busy state for free pages. This improves behavior with
potential bugs that access freed pages as well as providing a path
towards lockless page lookup.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23444
2020-02-04 20:33:01 +00:00
Mark Johnston
f0a273c00f Remove a couple of lingering usages of the page lock.
Update vm_page_scan_contig() and vm_page_reclaim_run() to stop using
vm_page_change_lock().  It has no use after r356157.  Remove
vm_page_change_lock() now that it has no users.

Remove an unncessary check for wirings in vm_page_scan_contig(), which
was previously checking twice.  The check is racy until
vm_page_reclaim_run() ensures that the page is unmapped, so one check is
sufficient.

Reviewed by:	jeff, kib (previous versions)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D23279
2020-02-01 18:23:51 +00:00
Jeff Roberson
d6e13f3b4d Don't hold the object lock while calling getpages.
The vnode pager does not want the object lock held.  Moving this out allows
further object lock scope reduction in callers.  While here add some missing
paging in progress calls and an assert.  The object handle is now protected
explicitly with pip.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23033
2020-01-19 23:47:32 +00:00
Jeff Roberson
a81c400e75 Simplify VM and UMA startup by eliminating boot pages. Instead use careful
ordering to allocate early pages in the same way boot pages were but only
as needed.  After the KVA allocator has started up we allocate the KVA that
we consumed during boot.  This also makes the boot pages freeable since they
have vm_page structures allocated with the rest of memory.

Parts of this patch were written and tested by markj.

Reviewed by:	glebius, markj
Differential Revision:	https://reviews.freebsd.org/D23102
2020-01-16 05:01:21 +00:00
Gleb Smirnoff
9328cbc047 Always multiple vm.pgcache_zone_max to number of CPUs, and rename it
respectively.  The tunable controls how big is the size of per-cpu
vm page cache.  Previously the value was split for all CPUs in system,
so configuring same value on machines with different count of CPUs
yielded in different cache size available to a particular CPU.

Reviewed by:	markj
Obtained from:	Netflix
2020-01-10 19:32:08 +00:00
Jeff Roberson
79c9f9429a Fix uma boot pages calculations on NUMA machines that also don't have
MD_UMA_SMALL_ALLOC.  This is unusual but not impossible.  Fix the alignemnt
of zones while here.  This was already correct because uz_cpu strongly
aligned the zone structure but the specified alignment did not match
reality and involved redundant defines.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D23046
2020-01-06 02:51:19 +00:00
Mark Johnston
758b2c02bb Restore a vm_page_wired() check in vm_page_mvqueue() after r356156.
We now set PGA_DEQUEUE on a managed page when it is wired after
allocation, and vm_page_mvqueue() ignores pages with this flag set,
ensuring that they do not end up in the page queues.  However, this is
not sufficient for managed fictitious pages or pages managed by the
TTM.  In particular, the TTM makes use of the plinks.q queue linkage
fields for its own purposes.

PR:	242961
Reported and tested by:	Greg V <greg@unrelenting.technology>
2019-12-29 20:01:03 +00:00
Mark Johnston
9b888dd9bd Clear queue op flags in vm_page_mvqueue().
This fixes a regression in r356155, introduced at the last minute.  In
particular, we must clear PGA_REQUEUE_HEAD before inserting into any
queue besides PQ_INACTIVE since that operation is implemented only for
PQ_INACTIVE.

Reported by:	pho, Jenkins via lwhsu
2019-12-29 15:39:43 +00:00
Mark Johnston
727150ff03 Remove some unused functions.
The previous series of patches orphaned some vm_page functions, so
remove them.

Reviewed by:	dougm, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22886
2019-12-28 19:04:29 +00:00
Mark Johnston
9f5632e6c8 Remove page locking for queue operations.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page.  It is also no longer needed in
the page queue scans.  This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22885
2019-12-28 19:04:00 +00:00
Mark Johnston
b7f30bff2f Generalize lazy dequeue logic for wired pages.
Some recent work aims to remove the use of the page lock for
synchronizing updates to page queue state.  This change adds a mechanism
to preserve the existing behaviour of lazily dequeuing wired pages,
which was previously synchronized using the page lock.

Handle this by setting PGA_DEQUEUE when a managed page's wire count
transitions from 0 to 1.  When the page daemon encounters a page with a
flag in PGA_QUEUE_OP_MASK set, it creates a batch queue entry for that
page, but in so doing it does not modify the page itself and thus racing
with a concurrent free of the page is harmless.  The flag is advisory;
the page daemon still checks for wirings after acquiring the object and
page xbusy locks.

vm_page_unwire_managed() now clears PGA_DEQUEUE on a 1->0 transition.
It must do this before dropping the reference to avoid a use-after-free
but also handles races with concurrent wirings to ensure that
PGA_DEQUEUE is not left unset on a wired page.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22882
2019-12-28 19:03:46 +00:00