Commit Graph

4783 Commits

Author SHA1 Message Date
Mark Johnston
fff19e0ed2 vm_object: Remove redundant OBJ_SWAP checks
With the removal of OBJT_DEFAULT, OBJ_ANON implies OBJ_SWAP.

Note, this means that vm_object_split() is more expensive than it used
to be, as it holds busy locks until the end of the range is reached,
even if the object has no swap blocks allocated.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35789
2022-07-17 07:09:48 -04:00
Mark Johnston
0cb2610ee2 vm: Remove handling for OBJT_DEFAULT objects
Now that OBJT_DEFAULT objects can't be instantiated, we can simplify
checks of the form object->type == OBJT_DEFAULT || (object->flags &
OBJ_SWAP) != 0.  No functional change intended.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35788
2022-07-17 07:09:48 -04:00
Mark Johnston
fffc1c594a vm_object: Release object swap charge in the swap pager destructor
With the removal of OBJT_DEFAULT, we can simply handle this in
swap_pager_dealloc().  No functional change intended.

Suggested by:	alc
Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35787
2022-07-17 07:09:48 -04:00
Mark Johnston
cb6757c0a6 swap_pager: Remove handling for objects with OBJ_SWAP clear
With the removal of OBJT_DEFAULT, we can assume that pager operations
provide an object with OBJ_SWAP set.  Also, we do not need to convert
objects from type OBJT_DEFAULT.  Thus, remove checks for OBJ_SWAP and
remove code which modifies the object type.  In some places, replace the
check for OBJ_SWAP with a check for whether any swap blocks are
assigned.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35786
2022-07-17 07:09:48 -04:00
Mark Johnston
5d32157d4e vm_object: Modify vm_object_allocate_anon() to return OBJT_SWAP objects
With this change, OBJT_DEFAULT objects are no longer allocated.
Instead, anonymous objects are always of type OBJT_SWAP and always have
OBJ_SWAP set.

Modify the page fault handler to check the swap block radix tree in
places where it checked for objects of type OBJT_DEFAULT.  In
particular, there's no need to invoke getpages for an OBJT_SWAP object
with no swap blocks assigned.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35785
2022-07-17 07:09:48 -04:00
Mark Johnston
eee9aab9cb vm_mmap: Remove obsolete code and comments from vm_mmap()
In preparation for removing OBJT_DEFAULT, eliminate some stale/unhelpful
comments from vm_mmap(), and remove an unused case.  In particular, the
remaining callers of vm_mmap() in the tree do not specify OBJT_DEFAULT.

It's much more common to use vm_map_find() to map an object into user
memory, so rather than adjusting vm_mmap() to handle OBJT_SWAP objects,
let's further discourage its use and simply remove OBJT_DEFAULT
handling.

Reviewed by:	dougm, alc, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35778
2022-07-13 09:39:26 -04:00
Mark Johnston
31508912d8 uma: Apply a missed piece of review feedback from D35738
Fixes:	93cd28ea82 ("uma: Use a taskqueue to execute uma_timeout()")
2022-07-13 09:30:00 -04:00
Mark Johnston
70b2996120 vm_map: Simplify a call to vm_object_allocate_anon()
vm_object_allocate_anon() automatically sets "charge" to 0 if no cred
reference is provided, so the caller doesn't need any conditional logic.

No functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35781
2022-07-12 09:10:15 -04:00
Mark Johnston
e1979b45b6 vm_object: Assert that overcommit charge is released in the object dtor
Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35780
2022-07-12 09:10:15 -04:00
Mark Johnston
93cd28ea82 uma: Use a taskqueue to execute uma_timeout()
uma_timeout() has several responsibilities; it visits every UMA zone and
as of recently will drain underutilized caches, so is rather expensive
(>1ms in some cases).  Currently it is executed by softclock threads
and so will preempt most other CPU activity.  None of this work requires
a high scheduling priority, though, so defer it to a taskqueue so as to
avoid stalling higher-priority work.

Reviewed by:	rlibby, alc, mav, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35738
2022-07-11 15:58:43 -04:00
Mark Johnston
b57be759d0 vm_fault: Fix some nits in vm_fault_copy_entry()
- Correct the description (vm_fault_copy_entry() does not create a
  shadow object).
- Move some initialization and assertions out of the scope of the object
  locks, when doing so makes sense.
- Merge a pair of conditional blocks.
- Use __unused when appropriate.

No functional change intended.

Reviewed by:	alc
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-07-11 15:58:42 -04:00
Mark Johnston
e123264e4d vm: Fix racy checks for swap objects
Commit 4b8365d752 introduced the ability to dynamically register
VM object types, for use by tmpfs, which creates swap-backed objects.
As a part of this, checks for such objects changed from

  object->type == OBJT_DEFAULT || object->type == OBJT_SWAP

to

  object->type == OBJT_DEFAULT || (object->flags & OBJ_SWAP) != 0

In particular, objects of type OBJT_DEFAULT do not have OBJ_SWAP set;
the swap pager sets this flag when converting from OBJT_DEFAULT to
OBJT_SWAP.

A few of these checks are done without the object lock held.  It turns
out that this can result in false negatives since the swap pager
converts objects like so:

  object->type = OBJT_SWAP;
  object->flags |= OBJ_SWAP;

Fix the problem by adding explicit tests for OBJT_SWAP objects in
unlocked checks.

PR:		258932
Fixes:		4b8365d752 ("Add OBJT_SWAP_TMPFS pager")
Reported by:	bdrewery
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35470
2022-06-20 12:48:14 -04:00
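The race described in e123264e4d can be modeled in user space. What follows is only a sketch: the enum values, the flag value, the struct layout and the helper names (is_swap_locked_style, is_swap_unlocked_safe, mid_conversion_state) are hypothetical stand-ins, not the kernel's definitions. It reproduces the intermediate state a lockless reader can observe, after the type store but before the flag store.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's object types and flags. */
enum obj_type { OBJT_DEFAULT, OBJT_SWAP, OBJT_VNODE };
#define OBJ_SWAP 0x0001u

struct obj {
	enum obj_type type;
	unsigned int flags;
};

/* Check that is only reliable with the object lock held. */
static bool
is_swap_locked_style(const struct obj *o)
{
	return (o->type == OBJT_DEFAULT || (o->flags & OBJ_SWAP) != 0);
}

/* Unlocked variant: also test explicitly for OBJT_SWAP. */
static bool
is_swap_unlocked_safe(const struct obj *o)
{
	return (o->type == OBJT_DEFAULT || o->type == OBJT_SWAP ||
	    (o->flags & OBJ_SWAP) != 0);
}

/*
 * Build the intermediate state an unlocked reader may see: the type has
 * already been changed to OBJT_SWAP, but OBJ_SWAP has not been set yet.
 */
static struct obj
mid_conversion_state(void)
{
	struct obj o = { .type = OBJT_SWAP, .flags = 0 };

	return (o);
}
```

In this model the first predicate yields a false negative for the half-converted object, while the second one still recognizes it as swap-backed.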
Mark Johnston
540da48d83 vm_kern: Update KMSAN shadow maps when allocating kmem memory
This addresses a couple of false positive reports for memory returned by
malloc_large().

Sponsored by:	The FreeBSD Foundation
2022-06-20 12:48:13 -04:00
Mark Johnston
a932a5a649 uma: Mark zeroed slabs as initialized for KMSAN
Otherwise zone initializers can produce false positives, e.g., when
lock_init() attempts to detect double initialization.

Sponsored by:	The FreeBSD Foundation
2022-06-20 12:48:13 -04:00
Mark Johnston
1f88394b7f vm_fault: Avoid unnecessary object relocking in vm_fault_copy_entry()
Suggested by:	alc
Reviewed by:	alc, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35485
2022-06-14 18:19:07 -04:00
Mark Johnston
d0443e2b98 vm_fault: Fix a racy copy of page valid bits
We do not hold the object lock or a page busy lock when copying src_m's
validity state.  Prior to commit 45d72c7d7f we marked dst_m as fully
valid.

Use the source object's read lock to ensure that valid bits are not
concurrently cleared.

Reviewed by:	alc, kib
Fixes:		45d72c7d7f ("vm_fault_copy_entry: accept invalid source pages.")
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35471
2022-06-14 18:18:09 -04:00
Mark Johnston
630f633f2a vm_object: Use the vm_object_(set|clear)_flag() helpers
... rather than setting and clearing flags inline.  No functional change
intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35469
2022-06-14 12:00:59 -04:00
Gordon Bergling
860740ae0f vm: Fix a common typo in a source code comment
- s/independant/independent/

MFC after:	3 days
2022-06-05 09:52:32 +02:00
Gordon Bergling
f77a88c855 vm_page: Fix a typo in a source code comment
- s/consistancy/consistency/

MFC after:	3 days
2022-06-04 12:52:22 +02:00
Doug Moore
fa8a6585c7 vm_phys: avoid waste in multipage allocation
In vm_phys_alloc_contig, for an allocation bigger than the size of any
buddy queue free block, avoid examining any maximum-size free block
more than twice, by only starting to consider a sequence of adjacent
max-blocks starting at a max-block that does not follow another
max-block.  If that first max-block follows adjacent blocks of smaller
size, and if together they provide enough memory to reduce by one the
number of max-blocks required for this allocation, use them as part of
this allocation.

Reviewed by:	markj
Tested by:	pho
Discussed with:	alc
Differential Revision:	https://reviews.freebsd.org/D34815
2022-04-26 02:56:23 -05:00
John Baldwin
52526922ac vm_phys_init: Quiet unused but set warnings about npages.
npages is used in two optional cases:

- to conditionally create a separate DMA32 free list

- to index vm_page_array for VM_PHYSSEG_SPARSE

Add in more #ifdef's around npages statements.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D34887
2022-04-18 12:06:14 -07:00
Mark Johnston
f82177b8cf vm: Initialize the transient buffer mapping arena with M_WAITOK
The wait flag is passed to UMA when allocating boundary tags for the
initial span, and UMA expects either M_WAITOK or M_NOWAIT to be present.

Reported by:	cperciva
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-04-14 15:46:14 -04:00
Mark Johnston
6fb7c42d59 vm: Move the "vm_wait in early boot" assertion to the proper place
The assertion was added in commit 1771e987ca.  After that, vm_wait()
and friends were refactored such that the actual sleep happens
elsewhere.  Now the assertion condition is not checked when
vm_wait_doms() is called directly, and it is checked even if we are not
going to sleep (because vm_page_count_min_set(wdoms) is false).

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34909
2022-04-14 15:45:54 -04:00
John Baldwin
b8ebd99aa5 vm: Use __diagused for variables only used in KASSERT().
2022-04-13 16:08:20 -07:00
John Baldwin
40cbcb996c vm_fault_dontneed: Inline value of variable used once in an assertion.
2022-04-13 16:08:19 -07:00
Enji Cooper
567378cc07 Fix OID format for vm.swap_reserved and vm.swap_total
The correct OID format for CTLTYPE_U64 is `QU` (`uquad_t`), not `A`
(text expressed via `char *`).

This issue was noticed while doing a sysctl tree walk using a
sysctl(9) consumer that relies on the OID format to intuit what the
type should be for a given sysctl.

MFC after:	1 month
Sponsored by:	DellEMC Isilon
Differential Revision: https://reviews.freebsd.org/D34877
2022-04-10 18:17:09 -07:00
John Baldwin
2e7838ae84 vm_phys_early_alloc: mem_index is only used under #ifdef NUMA.
Possibly mem_index should just reuse biggestone since this loop is
already reusing biggestsize.
2022-04-08 17:25:13 -07:00
John Baldwin
a7e1a58554 uma_zfree_smr: uz_flags is only used if NUMA is defined.
2022-04-08 17:25:13 -07:00
Gordon Bergling
f167c46e79 memguard(9): Fix two typos in source code comments
- s/comparsion/comparison/

MFC after:	3 days
2022-04-02 13:51:27 +02:00
Peter Jeremy
9a89977bf6 kern: Fix typo in kassert message.
- s/unepxected/unexpected/
MFC after:	3 days
2022-04-02 21:36:17 +11:00
Doug Moore
557dc337e6 vm_phys: check small blocks to finish allocation
In vm_phys_alloc_queues_contig, in the case that a sequence of
max-order blocks are sought to fulfill an allocation, a sequence is
ruled out if it does not have enough max-order blocks to satisfy the
allocation. However, there may be smaller blocks of free memory that
follow the last max-order block in the sequence, and they may be big
enough to complete the allocation request, so check for that
possibility before giving up on that block sequence.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D34724
2022-03-31 16:19:55 -05:00
Doug Moore
342056fa1c vm_phys: alloc pages without duplicating searches.
In the search for contiguous pages, as each page segment is examined,
check to see if the free list set for the next page segment differs
from the set for the current segment, and avoid a pointless search if
they do not differ.

Discussed with:	alc
Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D33947
2022-03-31 01:40:46 -05:00
Mark Johnston
d53927b0ba uma: Don't allow a limit to be set in a warm zone
The limit accounting in UMA does not tolerate this.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-03-30 15:42:18 -04:00
Mark Johnston
54361f9020 uma: Use the correct type for a return value
zone_alloc_bucket() returns a pointer, not a bool.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-03-30 15:42:05 -04:00
Brooks Davis
b1ad6a9000 syscallarg_t: Add a type for system call arguments
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from:	CheriBSD

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D33780
2022-03-28 19:43:03 +01:00
Eric van Gyzen
490b09f240 uma_zalloc_domain: call uma_zalloc_debug in multi-domain path
It was only called in the non-NUMA and single-domain paths.
Some of its assertions were duplicated in uma_zalloc_domain,
but some things were missed, especially memguard.

Reviewed by:	markj, rstone
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D34472
2022-03-25 20:10:38 -05:00
Eric van Gyzen
a8cbb835bf uma_zalloc: assert M_NOWAIT ^ M_WAITOK
The uma_zalloc functions expect exactly one of [M_NOWAIT, M_WAITOK].
If neither or both are passed, print an error and a stack dump.
Only do this ten times, to prevent livelock.  In the future, after
this exposes enough bad callers, this will be changed to a KASSERT().

Reviewed by:	rstone, markj
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D34452
2022-03-25 20:10:37 -05:00
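The "exactly one of" check from a8cbb835bf, together with its cap of ten reports, can be sketched in user-space C. The flag values and the helper names wait_flags_ok and check_wait_flags are assumptions made for this illustration, and the stack dump is not modeled; the kernel's own definitions and reporting differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag values for this sketch; the kernel defines its own. */
#define M_NOWAIT 0x0001
#define M_WAITOK 0x0002

/* True when exactly one of the two wait flags is set. */
static bool
wait_flags_ok(int flags)
{
	return (((flags & M_NOWAIT) != 0) ^ ((flags & M_WAITOK) != 0));
}

/* Number of complaints still allowed, capped so a bad caller cannot flood. */
static int complaints_left = 10;

/*
 * Validate flags; on a bad combination, report it at most ten times.
 * Returns true if a complaint was printed.
 */
static bool
check_wait_flags(int flags)
{
	if (wait_flags_ok(flags))
		return (false);
	if (complaints_left <= 0)
		return (false);
	complaints_left--;
	fprintf(stderr, "bad wait flags: %#x\n", flags);
	return (true);
}
```

Passing neither flag or both flags fails the XOR test, and only the first ten such calls produce output.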
Eric van Gyzen
cfbb5f8ce0 vm_ksubmap_init: pass M_WAITOK to vmem_init -> uma_zalloc_arg
uma_zalloc_arg expects exactly one of the two WAIT flags.  A future
commit will assert this.

Reviewed by:	rstone
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D34450
2022-03-25 20:10:37 -05:00
Mateusz Guzik
bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)
2022-03-24 10:20:51 +00:00
Mark Johnston
389a3fa693 uma: Add UMA_ZONE_UNMANAGED
Allow a zone to opt out of cache size management.  In particular,
uma_reclaim() and uma_reclaim_domain() will not reclaim any memory from
the zone, nor will uma_timeout() purge cached items if the zone is idle.
This effectively means that the zone consumer has control over when
items are reclaimed from the cache.  In particular, uma_zone_reclaim()
will still reclaim cached items from an unmanaged zone.

Reviewed by:	hselasky, kib
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34142
2022-02-15 09:25:34 -05:00
John Baldwin
becaf6433b Use vmspace->vm_stacktop in place of sv_usrstack in more places.
Reviewed by:	markj
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D34174
2022-02-14 10:57:30 -08:00
Konstantin Belousov
b51927b7b0 Revert "vm_pageout_scans: correct detection of active object"
This reverts commit 3de96d664a.

The problem is that it is possible to reach the state with ref_count ==
1 for a mapped non-anonymous object. For instance, an anonymous posix
shmfd or linux shmfs object could be mapped, and then the corresponding
file descriptor closed, dropping the object reference owned by the
shmfd/shmfs file.  Then the check in the inactive scan assumes that the
object and page are not mapped and frees the page, even though they are
still mapped.

PR:	261707
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	now
2022-02-10 16:55:10 +02:00
Robert Wing
c9e023541a pbuf_ctor(): lock the buffer with LK_NOWAIT
This LOR happens when reading from a file backed MD device:

lock order reversal:
 1st 0xfffffe00431eaac0 pbufwait (pbufwait, lockmgr) @ /cobra/src/sys/vm/vm_pager.c:471
 2nd 0xfffff80003f17930 ufs (ufs, lockmgr) @ /cobra/src/sys/dev/md/md.c:977
lock order pbufwait -> ufs attempted at:
    #0 0xffffffff80c78ead at witness_checkorder+0xbdd
    #1 0xffffffff80bd6a52 at lockmgr_lock_flags+0x182
    #2 0xffffffff80f52d5c at ffs_lock+0x6c
    #3 0xffffffff80d0f3f4 at _vn_lock+0x54
    #4 0xffffffff80708629 at mdstart_vnode+0x499
    #5 0xffffffff807060ec at md_kthread+0x20c
    #6 0xffffffff80bbfcd0 at fork_exit+0x80
    #7 0xffffffff810b809e at fork_trampoline+0xe

This LOR was previously blessed by witness before commit 531f8cfea0
("Use dedicated lock name for pbufs").

Instead of blessing ufs and pbufwait, use LK_NOWAIT to prevent recording
the lock order. LK_NOWAIT will be a nop here as the lock is dropped in
pbuf_dtor(). This takes the same approach as 5875b94c74 ("buf_alloc():
lock the buffer with LK_NOWAIT").

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D34183
2022-02-07 10:05:20 -09:00
Konstantin Belousov
0b8643eaf6 vmmeter(): Fix detection of the named swap objects
Noted and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33549
2022-02-02 11:39:58 +02:00
Konstantin Belousov
4cf9f5d807 vm_object: restore handling of shadow_count for all types of objects
instead of only OBJ_ANON objects that are backing, as it is now.
This is required for e.g. vm_meter is_object_active() detection, and
should be useful in some more cases.

Use refcount KPI for all objects, regardless of owning the object lock,
and the fact that currently OBJ_ANON cannot change for the live object.

Noted and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33549
2022-02-02 11:39:51 +02:00
Konstantin Belousov
d950c5898a vm/vm_extern.h, vm/vm_page.h: use sys/kassert.h
instead of fatty sys/systm.h.

Suggested by:	jhb
Reviewed by:	alc, imp, jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:55:35 +02:00
Konstantin Belousov
f4cdb9d7c3 vm/vm_pager.h: use sys/systm.h header
it is needed for __read_mostly attribute definition, which right now
comes from vm/vm_page.h including sys/systm.h

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:55:35 +02:00
Konstantin Belousov
531f8cfea0 Use dedicated lock name for pbufs
Also remove a pointer to array variable, use array address directly.

Reviewed by:	markj, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:14 +02:00
John Baldwin
29d481ae6a Make <vm/vm_extern.h> more self-contained.
Add a nested include of <sys/systm.h> for recently added assertions.
Without this, existing code (such as in drm-kmod) needs to be patched
to add the newly required header.

While here, rewrite the assertions using KASSERT().

Reviewed by:	dougm, alc, imp, kib
Differential Revision:	https://reviews.freebsd.org/D34070
2022-01-28 13:14:03 -08:00
Konstantin Belousov
3de96d664a vm_pageout_scans: correct detection of active object
For non-anonymous swap objects, there is always a reference from the
owner to the object to keep it from recycling.  Account for it when
deciding whether to query pmap for hardware active references for the
page.

As a result, we avoid unneeded calls to pmap_ts_referenced(), which for
a non-mapped page means avoiding unnecessary lock and unlock of the pv list.

Reviewed by:	markj
Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33924
2022-01-22 19:34:32 +02:00
Doug Moore
0ce7909cd0 vm_phys: add essential segment bounds check
A lower-bound segment check is necessary in vm_phys_alloc_seg_contig.
Add one.

Reported by:	jenkins
Reviewed by:	alc
Fixes:	da92ecbc0d vm_phys: fix seg->end test in alloc_seg_contig
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33945
2022-01-19 00:42:39 -06:00
Doug Moore
da92ecbc0d vm_phys: fix seg->end test in alloc_seg_contig
In vm_phys_alloc_seg_contig, in allocating multiple memory blocks for
a huge allocation, ensure that the end of the allocated range does not
exceed the upper segment limit.

Reorder a couple of checks to improve code layout.

Reviewed by:	alc
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33870
2022-01-18 12:49:09 -06:00
Mark Johnston
46d35d415a fork: Copy the vm_stacktop field into the new vmspace
Fixes:	1811c1e957 ("exec: Reimplement stack address randomization")
Reported by:	pho
Reported by:	syzbot+0446312a51bc13ead834@syzkaller.appspotmail.com
Sponsored by:	The FreeBSD Foundation
2022-01-18 10:51:49 -05:00
Mark Johnston
1811c1e957 exec: Reimplement stack address randomization
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack.  This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings.  In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.

There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose.  Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.

The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.

PR:		260303
Reviewed by:	kib
Discussed with:	emaste, mw
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:12:36 -05:00
Mark Johnston
a04ce833f9 uma: Avoid polling for an invalid SMR sequence number
Buckets in an SMR-enabled zone can legitimately be tagged with
SMR_SEQ_INVALID.  This effectively means that the zone destructor (if
any) was invoked on all items in the bucket, and the contained memory is
safe to reuse.  If the first bucket in the full bucket list was tagged
this way, UMA would unnecessarily poll per-CPU state before attempting
to fetch a full bucket from the list.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-01-14 15:38:02 -05:00
Mark Johnston
4a864f624a vm_pageout: Print a more accurate message to the console before an OOM kill
Previously we'd always print "out of swap space."  This can be
misleading, as there are other reasons an OOM kill can be triggered.  In
particular, it's entirely possible to trigger an OOM kill on a system
with plenty of free swap space.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33810
2022-01-14 15:04:21 -05:00
Brooks Davis
0910a41ef3 Revert "syscallarg_t: Add a type for system call arguments"
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.

This reverts commit 3889fb8af0.
This reverts commit 1544e0f5d1.
2022-01-12 23:29:20 +00:00
Brooks Davis
1544e0f5d1 syscallarg_t: Add a type for system call arguments
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from:	CheriBSD

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D33780
2022-01-12 22:51:25 +00:00
Doug Moore
84e2ae64c5 vm_reserv: use enhanced bitstring for popmaps
vm_reserv.c uses its own bitstring implementation for popmaps. Using
the bitstring_t type from a standard header eliminates the code
duplication, allows some bit-at-a-time operations to be replaced with
more efficient bitstring range operations, and, in
vm_reserv_test_contig, allows bit_ffc_area_at to more efficiently
search for a big-enough set of consecutive zero-bits.

Make bitstring changes that improve the vm_reserv code.  Define a bit_ntest
method to test whether a range of bits is all set, or all clear.
Define bit_ff_at and bit_ff_area_at to implement the ffs and ffc
versions with a parameter to choose between set- and clear- bits.
Improve the area_at implementation.  Modify the bit_nset and
bit_nclear implementations to allow code optimization in the cases
when start or end are multiples of _BITSTR_BITS.

Add a few new cases to bitstring_test.

Discussed with:	alc
Reviewed by:	markj
Tested by:	pho (earlier version)
Differential Revision:	https://reviews.freebsd.org/D33312
2022-01-12 11:03:53 -06:00
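The range test described in 84e2ae64c5 (is a run of bits all set, or all clear?) can be sketched word-at-a-time in plain C. The helper range_all_match, the WORD_BITS constant and the inclusive [start, end] convention are assumptions of this example; it does not reproduce the bitstring header's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64

/*
 * Return true if every bit in [start, end] (inclusive, start <= end)
 * equals "match" (nonzero for set, 0 for clear).  Hypothetical helper,
 * not the bitstring(3) API.
 */
static bool
range_all_match(const uint64_t *words, size_t start, size_t end, int match)
{
	for (size_t w = start / WORD_BITS; w <= end / WORD_BITS; w++) {
		size_t lo = (w == start / WORD_BITS) ? start % WORD_BITS : 0;
		size_t hi = (w == end / WORD_BITS) ? end % WORD_BITS :
		    WORD_BITS - 1;
		/* Mask covering bits lo..hi of this word. */
		uint64_t mask = (UINT64_MAX >> (WORD_BITS - 1 - hi)) &
		    (UINT64_MAX << lo);
		uint64_t want = match ? mask : 0;

		if ((words[w] & mask) != want)
			return (false);
	}
	return (true);
}
```

Comparing a whole masked word per iteration, rather than one bit at a time, is the kind of range operation the commit message refers to.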
Mark Johnston
c4a25e0713 vm_pageout: Group sysctl variables together with sysctl definitions
Fix some style bugs while here.  No functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33811
2022-01-11 09:27:45 -05:00
Mark Johnston
43b3b8e52d swap_pager: uma_zcreate() doesn't fail
Remove always-false checks for UMA zone creation failure.  No functional
change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33809
2022-01-11 09:27:45 -05:00
Doug Moore
ae13829ddc vm_addr_ok: add power2 invariant check
With INVARIANTS defined, have vm_addr_align_ok and vm_addr_bound_ok
panic when passed an alignment/boundary parameter that is not a power
of two.

Reviewed by:	alc
Suggested by:	kib, se
Differential Revision:	https://reviews.freebsd.org/D33725
2022-01-10 01:17:25 -06:00
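Why ae13829ddc insists on a power of two can be seen in a small sketch. The helpers is_pow2 and mask_align_ok are hypothetical names for this illustration, and rejecting zero in is_pow2 is an assumption of the sketch rather than a statement about the kernel's check. A mask-based alignment test only agrees with the arithmetic definition when the alignment is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x has exactly one bit set (zero is rejected in this sketch). */
static bool
is_pow2(uint64_t x)
{
	return (x != 0 && (x & (x - 1)) == 0);
}

/* Mask-based alignment test; only meaningful when is_pow2(alignment). */
static bool
mask_align_ok(uint64_t addr, uint64_t alignment)
{
	return ((addr & (alignment - 1)) == 0);
}
```

For example, 12 is a multiple of 12, yet mask_align_ok(12, 12) is false because 12 & 11 is 8; an invariant check catches such a caller instead of letting the mask silently give the wrong answer.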
Konstantin Belousov
c25a30e255 Dump page tracking no longer needed on mips
Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D33763
2022-01-06 06:00:39 +02:00
Konstantin Belousov
f54882a862 Remove special kstack allocation code for mips.
The arch required two-page alignment due to a single TLB entry caching
two consecutive mappings.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D33763
2022-01-06 04:43:56 +02:00
Doug Moore
f76916c095 vm_reserv: #include vm_extern.h explicitly, for arm.
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere
2021-12-31 00:40:25 -06:00
Doug Moore
e6930b1c5f vm_phys: convert error back to warning
Move an assignment back to where it was before, to turn the
defined-but-not-used error back into a set-but-not-used warning.

Fixes:	01e115ab83 vm_phys: #include vm_extern
2021-12-31 00:23:46 -06:00
Doug Moore
01e115ab83 vm_phys: #include vm_extern
Arm64 and powerpc don't include vm_extern.h indirectly in vm_phys.c, which
means that for the sake of those architectures, it must be included explicitly.

Also, fix a set-unused warning that jenkins also found.

Reported by:	Jenkins
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere
2021-12-30 23:31:18 -06:00
Doug Moore
c606ab59e7 vm_extern: use standard address checkers everywhere
Define simple functions for alignment and boundary checks and use them
everywhere instead of having slightly different implementations
scattered about. Define them in vm_extern.h and use them where
possible where vm_extern.h is included.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D33685
2021-12-30 22:09:08 -06:00
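The alignment and boundary checks that c606ab59e7 centralizes can be sketched as two small predicates. The names addr_align_ok and addr_bound_ok and the use of uint64_t are assumptions of this example, not the declarations in vm_extern.h; both predicates assume a power-of-two alignment or boundary and a nonzero size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of alignment (a power of two). */
static bool
addr_align_ok(uint64_t addr, uint64_t alignment)
{
	return ((addr & (alignment - 1)) == 0);
}

/*
 * True if [addr, addr + size) does not cross a multiple of boundary
 * (a power of two).  size must be nonzero.
 */
static bool
addr_bound_ok(uint64_t addr, uint64_t size, uint64_t boundary)
{
	uint64_t last = addr + size - 1;

	/* First and last byte must agree in all bits at or above boundary. */
	return (((addr ^ last) & ~(boundary - 1)) == 0);
}
```

A 0x100-byte range starting at 0xF00 ends at 0xFFF and stays inside one 0x1000-byte window; extending it by a single byte crosses the boundary and the second predicate turns false.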
Gleb Smirnoff
841e0a8757 uma: with KTR trace allocs/frees from SMR zones
2021-12-29 23:08:33 -08:00
Gleb Smirnoff
28782f73df uma: with KTR report item being freed in uma_zfree_arg()
2021-12-29 23:08:15 -08:00
Doug Moore
8119cdd38b vm_phys: hide vm_phys_set_pool
It is only called in the file that defines it, so make it static and
remove the declaration from the header.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33688
2021-12-29 11:17:33 -06:00
John Baldwin
d90e41a154 sys/vm: Use C99 fixed-width integer types.
No functional change.

Reviewed by:	imp, kib, emaste
Differential Revision:	https://reviews.freebsd.org/D33641
2021-12-28 09:43:21 -08:00
Doug Moore
49fd2d51f0 vm_reserv: fix zero-boundary error
Handle specially the boundary==0 case of vm_reserv_reclaim_contig,
by turning off boundary adjustment in that case.

Reviewed by:	alc
Tested by:	pho, madpilot
2021-12-26 11:40:27 -06:00
Doug Moore
4bae154fe8 vm_page: Move a comment
fb38b29b56 (page_alloc_br) vm_page: Remove extra test, dup code from page alloc
should have moved a comment block when it moved the function call that followed it.

Move the comment block now.
2021-12-24 16:10:30 -06:00
Doug Moore
0d5fac2872 vm: alloc pages from reserv before breaking it
Function vm_reserv_reclaim_contig breaks a reservation with enough
free space to satisfy an allocation request and returns the free space
to the buddy allocator. Change the function to allocate the request
memory from the reservation before breaking it, and return that memory
to the caller. That avoids a second call to the buddy allocator and
guarantees successful allocation after breaking the reservation, where
that success is not currently guaranteed.

Reviewed by:	alc, kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D33644
2021-12-24 12:59:16 -06:00
Doug Moore
184c63db3c Fix clerical error in page alloc
Fix a very recent change that introduced a page accounting error in
case of a reservation being broken.
Reviewed by:	alc
Fixes:	fb38b29b56 (page_alloc_br) vm_page: Remove extra test, dup code from page alloc
Differential Revision:	https://reviews.freebsd.org/D33645
2021-12-24 02:47:21 -06:00
Doug Moore
fb38b29b56 vm_page: Remove extra test, dup code from page alloc
Extract code common to functions vm_page_alloc_contig_domain and
vm_page_alloc_noobj_contig_domain into a new function.  Do so in a way
that eliminates a bound-to-fail reservation test after a reservation
is broken by a call from vm_page_alloc_contig_domain.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33551
2021-12-23 22:45:47 -06:00
Stephen J. Kiernan
18048b6e3c Eliminate key press requirement for "show vmopag" command output.
Summary:
One was required to press a key to continue after every 18 lines of
output. This requirement had been in the "show vmopag" command since it
was introduced, which was many years before paging was added to DDB.
With paging, this explicit key check is no longer necessary.

Obtained from:	Juniper Networks, Inc.
MFC after:	1 week

Test Plan:
Run "show vmopag" from db> prompt and see that it does not need additional
keypresses other than the ones needed for the pager.

Subscribers: imp, #contributor_reviews_base

Differential Revision: https://reviews.freebsd.org/D33550
2021-12-19 19:40:52 -05:00
Rick Macklem
cd37afd8b6 vm_object: Make is_object_active() global
Commit 867c27c23a modified the NFS client so that
it does IO_APPEND writes directly to the NFS server,
bypassing the buffer cache.  However, this could result
in stale data in client pages when the file is mmap(2)'d.
As such, the NFS client needs to call is_object_active()
to check if the file is mmap(2)'d.

This patch renames is_object_active() to vm_object_is_active(),
moves it to sys/vm/vm_object.c and makes it global, so that
the NFS client can call it in a future commit.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33520
2021-12-19 16:11:44 -08:00
Doug Moore
f7aa44763d Correct type size format error in KASSERT.
Reported by:	jenkins
Fixes:	6f1c890827 vm: Don't break vm reserv that can't meet align reqs
2021-12-16 13:48:58 -06:00
Doug Moore
6f1c890827 vm: Don't break vm reserv that can't meet align reqs
Function vm_reserv_test_contig has incorrectly used its alignment
and boundary parameters to find a well-positioned range of empty pages
in a reservation.  Consequently, a reservation could be broken
mistakenly when it was unable to provide a satisfactory set of pages.

Rename the function, correct the errors, and add assertions to detect
the error in case it appears again.

Reviewed by:	alc, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33344
2021-12-16 12:20:56 -06:00
Mark Johnston
88642d978a vm_fault: Fix vm_fault_populate()'s handling of VM_FAULT_WIRE
vm_map_wire() works by calling vm_fault(VM_FAULT_WIRE) on each page in
the range.  (For largepage mappings, it calls vm_fault() once per large
page.)

A pager's populate method may return more than one page to be mapped.
If VM_FAULT_WIRE is also specified, we'd wire each page in the run, not
just the fault page.  Consider an object with two pages mapped in a
vm_map_entry, and suppose vm_map_wire() is called on the entry.  Then,
the first vm_fault() would allocate and wire both pages, and the second
would encounter a valid page upon lookup and wire it again in the
regular fault handler.  So the second page is wired twice and will be
leaked when the object is destroyed.

Fix the problem by modifying vm_fault_populate() to wire only the fault
page.  Also modify the error handler for pmap_enter(psind=1) to not test
fs->wired, since it must be false.

PR:		260347
Reviewed by:	alc, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33416
2021-12-14 15:10:46 -05:00
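The double wiring described in 88642d978a can be illustrated with a small user-space model. This is only a sketch: `obj_sketch` and `fault_wire` are hypothetical names, the object is fixed at two pages, and the real fault handler is far more involved.

```c
#include <assert.h>

#define NPAGES 2

/* Hypothetical per-page state for a two-page object. */
struct obj_sketch {
	int wire[NPAGES];	/* wire count of each page */
	int valid[NPAGES];	/* nonzero once the page has been populated */
};

/*
 * Model one vm_fault(VM_FAULT_WIRE) on page 'idx'.  If the page is already
 * valid, the regular handler wires it.  Otherwise populate makes the whole
 * run valid and wires either every page (old behaviour, wire_whole_run != 0)
 * or only the fault page (new behaviour).
 */
static void
fault_wire(struct obj_sketch *o, int idx, int wire_whole_run)
{
	assert(idx >= 0 && idx < NPAGES);
	if (o->valid[idx]) {
		o->wire[idx]++;
		return;
	}
	for (int i = 0; i < NPAGES; i++) {
		o->valid[i] = 1;
		if (wire_whole_run || i == idx)
			o->wire[i]++;
	}
}
```

Faulting on page 0 and then page 1 with `wire_whole_run` set leaves the second page with a wire count of 2, which is the leak the message describes; with it clear, both pages end up wired exactly once.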
Konstantin Belousov
5346570276 swapoff: add one more variant of the syscall
Requested and reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:46 +02:00
Doug Moore
9f32cb5b1c Set uninitialized popmap bits in vm_reserv_init
In vm_reserv_init, set all the marker popmap bits, and not just the
bits of the first popmap entry.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33258
2021-12-05 17:17:25 -06:00
Gleb Smirnoff
2cb67bd798 uma: remove unused *item argument from cache_free()
Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D33272
2021-12-05 10:44:47 -08:00
Mark Johnston
39a7396f5d vm_page: Tighten the object lock assertion in vm_page_invalid()
A page must not become invalid while vm_fault_soft_fast() is attempting
to map unbusied pages for reading.

Note that all callers hold the object write lock already, and
vm_page_set_invalid() asserts the object write lock.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33250
2021-12-05 10:51:11 -05:00
Konstantin Belousov
e8dc2ba29c swapoff(2): add a SWAPOFF_FORCE flag
The flag requests skipping the heuristic which tries to avoid leaving
the system with more allocated memory than is available from RAM and
remaining swap.

Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33165
2021-12-05 00:20:58 +02:00
Konstantin Belousov
a4e4132fa3 swapoff(2): replace special device name argument with a structure
For compatibility, add a placeholder pointer to the start of the
added struct swapoff_new_args, and use it to distinguish old vs. new
style of syscall invocation.

Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33165
2021-12-05 00:20:58 +02:00
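The compatibility scheme in a4e4132fa3 can be sketched as below. The layout and field names are assumptions made for illustration, not the actual struct swapoff_new_args; the sketch also assumes that an old-style caller's device name pointer lands in the leading placeholder slot, so a non-NULL placeholder marks an old-style invocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument layout; the first member is the placeholder. */
struct swapoff_args_sketch {
	const char *old_name;	/* non-NULL only for old-style calls */
	const char *name;	/* device name for new-style calls */
	unsigned flags;		/* new-style flags */
};

/*
 * Return the device name to act on and report through 'is_old' whether
 * the call used the old single-pointer convention.
 */
static const char *
swapoff_target(const struct swapoff_args_sketch *a, int *is_old)
{
	assert(a != NULL && is_old != NULL);
	*is_old = (a->old_name != NULL);
	return (*is_old ? a->old_name : a->name);
}
```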
Konstantin Belousov
6df359449f swap_pager.c: Remove MPSAFE and ARGSUSED annotations
Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33165
2021-12-05 00:20:58 +02:00
Konstantin Belousov
0190c38b9d swapoff_one(): only check free pages count when manually turning swap off
When swap is turned off due to system shutdown or reboot, ignore the
check.  The problem is that the check is not accurate by any means: the
free page count can legitimately be low while the system is still able
to page in everything from swap.  Then, we turn swap off if swapping on
a real file or some non-standard geom provider, and typically panic
when the system appears to actually need an unavailable page.

For syscall, it is better to be safe than sorry.

Reported and tested by:	peterj
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33147
2021-11-29 18:38:02 +02:00
Mateusz Guzik
7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf64 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00
Konstantin Belousov
b19740f4ce swap_pager: lock vnode in swapdev_strategy()
VOP_STRATEGY() requires locked vnode.  Note that we lock the swap vnode
while pages are busy, but this would only cause real LoR if pages belong
to the swap vnode, which must not be the case for correct use.

Reported and tested by:	peterj
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33119
2021-11-25 21:34:50 +02:00
Konstantin Belousov
6ddf41faa6 swapon: extend the region where the swap vnode is locked
to cover VOP_GETATTR() call in sys_swapon().  Move locking from inside
swapongeom() and swaponvp() into sys_swapon().

Reported by and tested by:	peterj
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33119
2021-11-25 21:34:44 +02:00
Konstantin Belousov
a6d04f34a4 swap pager: lock vnode around VOP_CLOSE()
Reported and tested by:	peterj
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33119
2021-11-25 21:34:39 +02:00
Mark Johnston
d47d3a94bb vm_fault: Factor out per-object operations into vm_fault_object()
No functional change intended.

Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33018
2021-11-24 14:02:56 -05:00
Mark Johnston
f1b642c255 vm_fault: Introduce a fault_status enum for internal return types
Rather than overloading the meanings of the Mach statuses, introduce a
new set for use internally in the fault code.  This makes the control
flow easier to follow and provides some extra error checking when a
fault status variable is used in a switch statement.

vm_fault_lookup() and vm_fault_relookup() continue to use Mach statuses
for now, as there isn't much benefit to converting them and they
effectively pass through a status from vm_map_lookup().

Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33017
2021-11-24 14:02:55 -05:00
Mark Johnston
45c09a74d6 vm_fault: Move nera into faultstate
This makes it easier to factor out pieces of vm_fault().  No functional
change intended.

Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33016
2021-11-24 14:02:55 -05:00
Mitchell Horne
10fe6f80a6 minidump: Use the provided dump bitset
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.

To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.

Reviewed by:	kib, markj, jhb
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31992
2021-11-19 15:05:52 -04:00
Brooks Davis
01ce7fca44 ommap: fix signed len and pos arguments
4.3 BSD's mmap took an int len and long pos.  Reject negative lengths
and in freebsd32 sign-extend pos correctly rather than mis-handling
negative positions as large positive ones.

Reviewed by:	kib
2021-11-15 18:34:28 +00:00
Mark Johnston
d28af1abf0 vm: Add a mode to vm_object_page_remove() which skips invalid pages
This will be used to break a deadlock in ZFS between the per-mountpoint
teardown lock and page busy locks.  In particular, when purging data
from the page cache during dataset rollback, we want to avoid blocking
on the busy state of invalid pages since the busying thread may be
blocked on the teardown lock in zfs_getpages().

Add a helper, vn_pages_remove_valid(), for use by filesystems.  Bump
__FreeBSD_version so that the OpenZFS port can make use of the new
helper.

PR:		258208
Reviewed by:	avg, kib, sef
Tested by:	pho (part of a larger patch)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32931
2021-11-15 13:01:30 -05:00
Mark Johnston
a2665158d0 vm_page: Remove vm_page_sbusy() and vm_page_xbusy()
They are unused today and cannot be safely used in the face of unlocked
lookup, in which pages may be busied without the object lock held.

Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D32948
2021-11-15 13:01:30 -05:00
Mark Johnston
87b646630c vm_page: Consolidate page busy sleep mechanisms
- Modify vm_page_busy_sleep() and vm_page_busy_sleep_unlocked() to take
  a VM_ALLOC_* flag indicating whether to sleep on shared-busy, and fix
  up callers.
- Modify vm_page_busy_sleep() to return a status indicating whether the
  object lock was dropped, and fix up callers.
- Convert callers of vm_page_sleep_if_busy() to use vm_page_busy_sleep()
  instead.
- Remove vm_page_sleep_if_(x)busy().

No functional change intended.

Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D32947
2021-11-15 13:01:30 -05:00
Mark Johnston
b0acc3f11b vm_pager: Optimize an assertion
Obtained from:	jeff (object_concurrency patches)
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32946
2021-11-15 13:01:30 -05:00
Mark Johnston
e4bdb6857a vm_page: Handle VM_ALLOC_NORECLAIM in the contiguous page allocator
We added _NORECLAIM to request that kmem_alloc_contig_pages() not spend
time scanning physical memory for candidates to reclaim.  In some
situations the scanning can induce large amounts of undesirable latency,
and it's less important that the request be satisfied than it is that we
not spend many milliseconds scanning.

The problem extends to vm_reserv_reclaim_contig(), which unlike
vm_reserv_reclaim() may have to scan the entire list of partially
populated reservations.  Use VM_ALLOC_NORECLAIM to request that this
scan not be executed.[1]

As a side effect, this fixes a regression in 02fb0585e7 ("vm_page:
Drop handling of VM_ALLOC_NOOBJ in vm_page_alloc_contig_domain()")
where VM_ALLOC_CONTIG was not included in VPAC_FLAGS or VPANC_FLAGS even
though it is not masked by kmem_alloc_contig_pages().[2]

Reported by:	gallatin [1], glebius [2]
Reviewed by:	alc, glebius, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32899
2021-11-11 14:26:41 -05:00
Gordon Bergling
c28e39c3d6 Fix a common typo in sysctl descriptions
- s/maxiumum/maximum/

MFC after:	3 days
2021-11-03 20:49:24 +01:00
Mark Johnston
7585c5db25 uma: Fix handling of reserves in zone_import()
Kegs with no items reserved have uk_reserve = 0.  So the check
keg->uk_reserve >= dom->ud_free_items will be true once all slabs are
depleted.  Then, rather than go and allocate a fresh slab, we return to
the cache layer.

The intent was to do this only when the keg actually has a reserve, so
modify the check to verify this first.  Another approach would be to
make uk_reserve signed and set it to -1 until uma_zone_reserve() is
called, but this requires a few casts elsewhere.

Fixes:	1b2dcc8c54 ("uma: Avoid depleting keg reserves when filling a bucket")
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32516
2021-11-01 09:51:43 -04:00
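The reserve check discussed in 7585c5db25 can be sketched in isolation. The structures are hypothetical stand-ins that carry only the two fields named in the message, and the revised predicate is one plausible reading of "verify this first", not the committed code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the keg and per-domain state. */
struct keg_sketch { unsigned uk_reserve; };
struct dom_sketch { unsigned ud_free_items; };

/* Old check: with uk_reserve == 0 this is true once free items reach 0. */
static bool
stop_import_old(const struct keg_sketch *keg, const struct dom_sketch *dom)
{
	assert(keg != NULL && dom != NULL);
	return (keg->uk_reserve >= dom->ud_free_items);
}

/* Revised check: consider the reserve only when the keg has one. */
static bool
stop_import_new(const struct keg_sketch *keg, const struct dom_sketch *dom)
{
	assert(keg != NULL && dom != NULL);
	return (keg->uk_reserve != 0 &&
	    keg->uk_reserve >= dom->ud_free_items);
}
```

With no reserve and no free items, the old predicate stops the import while the revised one lets the caller go on to allocate a fresh slab.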
Mark Johnston
fab343a716 uma: Improve M_USE_RESERVE handling in keg_fetch_slab()
M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded
recursion when the direct map is not available, as is the case on 32-bit
platforms or when certain kernel sanitizers (KASAN and KMSAN) are
enabled.  For example, to allocate KVA, the kernel might allocate a
kernel map entry, which might require a new slab, which requires KVA.

For these zones, we use uma_prealloc() to populate a reserve of items,
and then in certain serialized contexts M_USE_RESERVE can be used to
guarantee a successful allocation.  uma_prealloc() allocates the
requested number of items, distributing them evenly among NUMA domains.
Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we
might have to check the slab lists of other domains than the current one
to provide the semantics expected by consumers.

So, try harder to find an item if M_USE_RESERVE is specified and the keg
doesn't have anything for the current (first-touch) domain.  Specifically,
fall back to a round-robin slab allocation.  This change fixes boot-time
panics on NUMA systems with KASAN or KMSAN enabled.[1]

Alternately we could have uma_prealloc() allocate the requested number
of items for each domain, but for some existing consumers this would be
quite wasteful.  In general I think keg_fetch_slab() should try harder
to find free slabs in other domains before trying to allocate fresh
ones, but let's limit this to M_USE_RESERVE for now.

Also fix a separate problem that I noticed: in a non-round-robin slab
allocation with M_WAITOK, rather than sleeping after a failed slab
allocation we simply try again.  Call vm_wait_domain() before retrying.

Reported by:	mjg, tuexen [1]
Reviewed by:	alc
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32515
2021-11-01 09:51:18 -04:00
Konstantin Belousov
350fc36b4c sysctl vm.objects: yield if hog
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:02 +03:00
Konstantin Belousov
7738118e9a vm.objects_swap: disable reporting some information
For making the call faster, do not count active/inactive object queues,
and do not report vnode info if any (for tmpfs).

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:01 +03:00
Konstantin Belousov
42812ccc96 Add vm.swap_objects sysctl
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:01 +03:00
Konstantin Belousov
1b610624fd vm_object_list: split sysctl handler in separate function
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:01 +03:00
Mark Johnston
d7acbe481d vm_page: Break reservations to handle noobj allocations
vm_reserv_reclaim_*() will release pages to the default freepool, not
the direct freepool from which noobj allocations are drawn.  But if both
pools are empty, the noobj allocator variants must break reservations to
make progress.

Reported by:	cy
Reviewed by:	kib (previous version)
Fixes:	b498f71bc5 ("vm_page: Add a new page allocator interface for unnamed pages")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32592
2021-10-22 09:25:59 -04:00
Mark Johnston
a9d6f1fe0a Remove some remaining references to VM_ALLOC_NOOBJ
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32037
2021-10-19 21:22:56 -04:00
Mark Johnston
b801c79dda vm_fault: Stop specifying VM_ALLOC_ZERO
Now vm_page_alloc() and friends will unconditionally preserve PG_ZERO,
so there is no point in setting this flag.

Eliminate a local variable and add a comment explaining why we
prioritize the allocation when the process is doomed.

No functional change intended.

Reviewed by:	kib, alc
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32036
2021-10-19 21:22:56 -04:00
Mark Johnston
02fb0585e7 vm_page: Drop handling of VM_ALLOC_NOOBJ in vm_page_alloc_contig_domain()
As in vm_page_alloc_domain_after(), unconditionally preserve PG_ZERO.

Implement vm_page_alloc_noobj_contig_domain().

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32034
2021-10-19 21:22:56 -04:00
Mark Johnston
c40cf9bc62 vm_page: Stop handling VM_ALLOC_NOOBJ in vm_page_alloc_domain_after()
This makes the allocator simpler since it can assume object != NULL.
Also modify the function to unconditionally preserve PG_ZERO, so
VM_ALLOC_ZERO is effectively ignored (and still must be implemented by
the caller for now).

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32033
2021-10-19 21:22:56 -04:00
Mark Johnston
84c3922243 Convert consumers to vm_page_alloc_noobj_contig()
Remove now-unneeded page zeroing.  No functional change intended.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32006
2021-10-19 21:22:56 -04:00
Mark Johnston
92db9f3bb7 Introduce vm_page_alloc_noobj_contig()
This is the same as vm_page_alloc_noobj(), but allocates physically
contiguous runs of memory.  For now it is implemented in terms of
vm_page_alloc_contig(), with the difference that
vm_page_alloc_noobj_contig() implements VM_ALLOC_ZERO by zeroing the
page.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32005
2021-10-19 21:22:56 -04:00
Mark Johnston
a4667e09e6 Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ.  In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31986
2021-10-19 21:22:56 -04:00
Mark Johnston
b498f71bc5 vm_page: Add a new page allocator interface for unnamed pages
The diff adds vm_page_alloc_noobj() and vm_page_alloc_noobj_domain().
These mostly correspond to vm_page_alloc() and vm_page_alloc_domain()
when no VM object is specified, with the exception that they handle
VM_ALLOC_ZERO by zeroing the page, rather than by preserving PG_ZERO.

This simplifies callers and will permit simplification of the
vm_page_alloc_domain() definition.

Since the new allocator variant is similar to vm_page_alloc_freelist(),
implement both of them using a common backend allocator function.  No
functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31985
2021-10-19 21:22:55 -04:00
Mark Johnston
a23e6a1078 vm_page: Move vm_page_alloc_check() to after page allocator definitions
This way all of the vm_page_alloc_*() allocator functions are grouped
together.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-10-19 21:22:50 -04:00
Edward Tomasz Napierala
0f559a9f09 Make vmdaemon timeout configurable
Make vmdaemon timeout configurable, so that one can adjust
how often it runs.

Here's a trick: set this to 1, then run 'limits -m 0 sh',
then run whatever you want with 'ktrace -it XXX', and observe
how the working set changes over time.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D22038
2021-10-17 13:49:29 +01:00
Dawid Gorecki
889b56c8cd setrlimit: Take stack gap into account.
Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.

Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max,
then the value is truncated to rlim_max.

PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516
2021-10-15 10:21:47 +02:00
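The adjustment described in 889b56c8cd amounts to a saturating addition. The function name and the use of plain 64-bit integers are assumptions for illustration; the kernel code works on the rlimit and vmspace structures.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Add the stack gap to the requested soft limit, truncating to the hard
 * limit when the sum would exceed it.  Written so that the addition
 * cannot overflow.
 */
static uint64_t
adjust_stack_limit(uint64_t rlim_cur, uint64_t rlim_max, uint64_t gap)
{
	assert(rlim_cur <= rlim_max);
	if (gap > rlim_max - rlim_cur)
		return (rlim_max);
	return (rlim_cur + gap);
}
```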
Warner Losh
cdccd11b36 forward declare struct thread
sys/sysctl.h moved struct thread forward declaration under #ifdef
_KERNEL and so this header fails when included from userland. Add a
forward declaration here.

Fixes:	     		99eefc727e
Sponsored by:		Netflix
2021-10-11 12:59:39 -06:00
Konstantin Belousov
174aad047e vm_fault: do not trigger OOM too early
A wakeup in vm_waitpfault() does not mean that the thread will get the
page on the next vm_page_alloc() call; another thread might steal the free
page we were waiting for. On the other hand, this wakeup might come much
earlier than just vm_pfault_oom_wait seconds, if the rate of the page
reclamation is high enough.

If wakeups come fast and we lose the allocation race enough times, OOM
could be undeservedly triggered much earlier than vm_pfault_oom_attempts
x vm_pfault_oom_wait seconds.  Fix it by not counting the number of sleeps,
but measuring the time since the first allocation failure, and triggering
OOM when it is older than oom_attempts x oom_wait seconds.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D32287
2021-10-08 12:24:46 +03:00
Mitchell Horne
31991a5a45 minidump: De-duplicate is_dumpable()
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().

Reviewed by:	kib, markj, imp (earlier version)
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31884
2021-09-29 16:41:52 -03:00
Gleb Smirnoff
183f8e1e57 Externalize nsw_cluster_max and initialize it early.
GEOM_ELI needs to know the value, because it will soon have special
memory handling for IO operations associated with swap.

Move initialization to swap_pager_init(), which is executed at
SI_SUB_VM, unlike swap_pager_swap_init(), which would be executed
only when a swap is configured. GEOM_ELI might need the value at
SI_SUB_DRIVERS, when disks are tasted by GEOM.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:52 -07:00
Gleb Smirnoff
c6213beff4 Add flag BIO_SWAP to mark IOs that are associated with swap.
Submitted by:		jtl
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:51 -07:00
Konstantin Belousov
bd3a668087 vm_page_startup: correct calculation of the starting page
Also avoid unneeded calculations when phys segment end is the phys_avail[]
start.

Submitted by:	alc
Reviewed by:	markj
MFC after:	1 week
Fixes:	181bfb42fd
Differential revision:	https://reviews.freebsd.org/D32009
2021-09-19 21:27:55 +03:00
Mark Johnston
d6e77cda9b uma: Show the count of free slabs in each per-domain keg's sysctl tree
This is useful for measuring the number of pages that could be freed
from a NOFREE zone under memory pressure.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-17 14:19:05 -04:00
Konstantin Belousov
181bfb42fd vm_phys: do not ignore phys_avail[] segments that do not fit completely into vm_phys segments
If a phys_avail[] segment only intersects with some vm_phys segment, add
the pages from it that belong to the given vm_phys_seg to the free list,
instead of dropping them.

The vm_phys segments are generally the result of subdividing phys_avail
segments; for instance, DMA32 or LOWMEM boundaries split them. On
amd64, after UEFI in-place kernel activation (copy_staging disable)
was enabled, we typically have a large phys_avail[] segment below 4G
which crosses the LOWMEM (1M) boundary. With the current requirement
that phys_avail[] fully fit into vm_phys_seg, this memory was ignored.

Reported by:	madpilot
Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31958
2021-09-16 20:01:19 +03:00
Mark Johnston
686aa9287c swap_pager: Handle large swap_pager_reserve() requests
This interface is used solely by md(4) when the MD_RESERVE flag is
specified, as in `mdconfig -a -t swap -s 1G -o reserve`.  It
pre-allocates swap blocks for the entire object.

The number of blocks to be reserved is specified as a vm_size_t, but
swp_pager_getswapspace() can allocate at most INT_MAX blocks.  vm_size_t
also seems like the incorrect type to use here, since the value refers
only to the size of the VM object, not the size of a mapping.  So:
- change the type of "size" in swap_pager_reserve() to vm_pindex_t, and
- clamp the requested number of blocks for a single
  swp_pager_getswapspace() call to INT_MAX.

Reported by:	syzkaller
Reviewed by:	dougm, alc, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31875
2021-09-07 14:04:50 -04:00
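The clamping in 686aa9287c can be sketched as a loop that never asks for more than INT_MAX blocks at once. Both helpers are hypothetical, and the sketch assumes every request is satisfied in full, which the real allocator does not guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Size of the next request: never more than INT_MAX blocks. */
static int
next_chunk(uint64_t remaining)
{
	assert(remaining > 0);
	return (remaining > INT_MAX ? INT_MAX : (int)remaining);
}

/* Count how many allocator calls a reservation of 'size' blocks needs. */
static uint64_t
reserve_calls(uint64_t size)
{
	uint64_t calls = 0;

	while (size > 0) {
		size -= (uint64_t)next_chunk(size);
		calls++;
	}
	return (calls);
}
```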
Bjoern A. Zeeb
eccb516db8 vm: use __func__ for the correct function name
In fee2a2fa39 the KASSERTs in
vm_page_unwire_noq() changed from "vm_page_unwire" to "vm_page_unref".
While the former was no longer part of that function, the latter does
not exist as a function at all, which is highly confusing when the
assertion is hit and one uses tools to look up the function rather than
a full-text search.
Use %s __func__ for printing the function name, as that will do the
right thing as code moves around and functions get renamed.

Hit:	while debugging a wired page leak with linuxkpi/iwlwifi
Sponsored by:	The FreeBSD Foundation
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D31635
2021-08-22 17:43:12 +00:00
Gordon Bergling
fa7a635f7e Fix a few typos in source code comments
- s/becase/because/

MFC after:	5 days
2021-08-14 09:06:09 +02:00
Mark Johnston
100949103a uma: Add KMSAN hooks
For now, just hook the allocation path: upon allocation, items are
marked as initialized (absent M_ZERO).  Some zones are exempted from
this when it would otherwise raise false positives.

Use kmsan_orig() to update the origin map for UMA and malloc(9)
allocations.  This allows KMSAN to print the return address when an
uninitialized UMA item is implicated in a report.  For example:
  panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe

Sponsored by:	The FreeBSD Foundation
2021-08-10 21:27:54 -04:00
Mark Johnston
8978608832 amd64: Populate the KMSAN shadow maps and integrate with the VM
- During boot, allocate PDP pages for the shadow maps.  The region above
  KERNBASE is currently not shadowed.
- Create a dummy shadow for the vm page array.  For now, this array is
  not protected by the shadow map to help reduce kernel memory usage.
- Grow shadows when growing the kernel map.
- Increase the default kernel stack size when KMSAN is enabled.  As with
  KASAN, sanitizer instrumentation appears to create stack frames large
  enough that the default value is not sufficient.
- Disable UMA's use of the direct map when KMSAN is configured.  KMSAN
  cannot validate the direct map.
- Disable unmapped I/O when KMSAN is configured.
- Lower the limit on paging buffers when KMSAN is configured.  Each
  buffer has a static MAXPHYS-sized allocation of KVA, which in turn
  eats 2*MAXPHYS of space in the shadow map.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31295
2021-08-10 21:27:53 -04:00
Ka Ho Ng
de2e152959 Add vnode_pager_purge_range(9) KPI
This KPI is created in addition to the existing vnode_pager_setsize(9)
KPI. The KPI is intended for file systems that are able to turn a range
of a file into a sparse range, also known as hole-punching.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D27194
2021-08-05 22:52:26 +08:00
Konstantin Belousov
0ef5eee9d9 Add vn_lktype_write()
and remove repetitive code that calculates vnode locking type for write.

Reviewed by:	khng, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31405
2021-08-04 19:40:13 +03:00
Konstantin Belousov
041b7317f7 Add pmap_vm_page_alloc_check()
which is the place to put MD asserts about allocated pages.

On amd64, verify that the allocated page does not belong to the kernel
(text, data) or early allocated pages.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31121
2021-07-31 16:53:42 +03:00
Mark Johnston
4e8e26a004 redzone: Raise a compile error if KASAN is configured
redzone(9) does some munging of the allocation to insert redzones before
and after a valid memory buffer, but KASAN does not know about this and
will raise false positives if both are configured.  Until this is fixed,
do not allow both to be configured.  Note that KASAN provides similar
checking on its own but currently does not force the creation of
redzones for all UMA allocations; this should be addressed as well.

Sponsored by:	The FreeBSD Foundation
2021-07-23 10:47:13 -04:00
Mark Johnston
b0dfc48684 uma: Fix a few problems with KASAN integration
- Ensure that all items returned by UMA are aligned to
  KASAN_SHADOW_SCALE (8).  This was true in practice since smaller
  alignments are not used by any consumers, but we should enforce it
  anyway.
- Use a non-zero code for marking redzones that appear naturally in
  items that are not a multiple of the scale factor in size.  Currently
  we do not modify keg layouts to force the creation of redzones.
- Use a non-zero code for marking freed per-CPU items, otherwise
  accesses of freed per-CPU items are not detected by the runtime.

Sponsored by:	The FreeBSD Foundation
2021-07-09 20:38:50 -04:00
Konstantin Belousov
5b10e79edb Un-staticise vm_page_init_page()
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30785
2021-06-17 16:58:44 +03:00
Mateusz Guzik
128e25842e vm: add another pager private flag
Move OBJ_SHADOWLIST around to let pager flags be next to each other.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D30258
2021-05-15 20:47:29 +00:00
Konstantin Belousov
28bc23ab92 tmpfs: dynamically register tmpfs pager
Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into
tmpfs_subr.c.

There is no longer any code to directly support tmpfs in sys/vm; most
tmpfs knowledge is shared by the non-anon swap object type
implementation.  The tmpfs-specific methods are provided by the
registered tmpfs pager, which inherits from the swap pager.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:13:34 +03:00
Konstantin Belousov
b730fd30b7 vm: Add KPI to dynamically register pagers
A pager is allowed to inherit part of its implementation from an
existing pager, which is done by copying non-NULL virtual method slots.
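
One way to picture slot-based inheritance is the following userspace
sketch.  It is an illustration only: struct demo_pagerops and the demo_*
functions are hypothetical and much simpler than the kernel's pager
operations table, and the sketch assumes that inheritance fills each
slot the new pager left unset with the base pager's method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ops table; not the kernel's struct pagerops. */
struct demo_pagerops {
	int (*getpages)(int);
	int (*putpages)(int);
	int (*haspage)(int);
};

static int demo_base_getpages(int n) { return (n); }
static int demo_base_putpages(int n) { return (n + 1); }
static int demo_base_haspage(int n) { (void)n; return (1); }

/* Method that the derived pager overrides. */
static int demo_child_putpages(int n) { return (n + 100); }

static const struct demo_pagerops demo_base_ops = {
	.getpages = demo_base_getpages,
	.putpages = demo_base_putpages,
	.haspage = demo_base_haspage,
};

/* Fill every slot the child left NULL with the base pager's method. */
static void
demo_pager_inherit(struct demo_pagerops *child,
    const struct demo_pagerops *base)
{
	assert(child != NULL && base != NULL);
#define	DEMO_INHERIT(slot) do {				\
	if (child->slot == NULL)			\
		child->slot = base->slot;		\
} while (0)
	DEMO_INHERIT(getpages);
	DEMO_INHERIT(putpages);
	DEMO_INHERIT(haspage);
#undef DEMO_INHERIT
}
```

A derived table that sets only putpages keeps its own putpages after
demo_pager_inherit() and picks up getpages and haspage from the base.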

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:12:29 +03:00
Konstantin Belousov
7079449b0b sys/vm: remove several other uses of OBJT_SWAP_TMPFS
Mostly in cases where the OBJ_SWAP flag works as well, or by reversing
the condition so that object types can be listed.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Konstantin Belousov
3e7a11ca21 vm_object_set_memattr(): handle all object types without listing them explicitly
This avoids the need to know all existing object types in advance, at
the cost of losing the assert that an unknown object type is handled in
a sane manner.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Konstantin Belousov
00a3fe968b vm_object_kvme_type(): reimplement by embedding kvme_type into pagerops
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Mark Johnston
9246b3090c fork: Suspend other threads if both RFPROC and RFMEM are not set
Otherwise, a multithreaded parent process may trigger races in
vm_forkproc() if one thread calls rfork() with RFMEM set and another
calls rfork() without RFMEM.

Also simplify vm_forkproc() a bit: vmspace_unshare() already checks to
see if the address space is shared.

Reported by:	syzbot+0aa7c2bec74c4066c36f@syzkaller.appspotmail.com
Reported by:	syzbot+ea84cb06937afeae609d@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30220
2021-05-13 08:33:23 -04:00
Mark Johnston
06d1fd9f42 swap_pager: Zero swap info before exporting to userspace
Otherwise padding bytes are leaked.
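
The general pattern is sketched below in userspace C.  It is not the
swap pager code: struct demo_info and demo_export() are hypothetical,
and memcpy() stands in for the copy to userspace.  The structure is
zeroed before its fields are filled in, so that the padding the compiler
inserts between members does not carry stale stack contents.  This
relies on the usual compiler behavior that storing to a member leaves
the padding bytes untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; padding typically follows the one-byte 'tag'. */
struct demo_info {
	char tag;
	long val;
};

/* Zero the whole record first, then fill fields, then copy it out. */
static void
demo_export(void *dst, char tag, long val)
{
	struct demo_info info;

	assert(dst != NULL);
	memset(&info, 0, sizeof(info));
	info.tag = tag;
	info.val = val;
	memcpy(dst, &info, sizeof(info));
}
```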

Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-12 12:52:05 -04:00