4664 Commits

Author SHA1 Message Date
Doug Moore
fa8a6585c7 vm_phys: avoid waste in multipage allocation
In vm_phys_alloc_contig, for an allocation bigger than the size of any
buddy queue free block, avoid examining any maximum-size free block
more than twice, by only starting to consider a sequence of adjacent
max-blocks starting at a max-block that does not follow another
max-block.  If that first max-block follows adjacent blocks of smaller
size, and if together they provide enough memory to reduce by one the
number of max-blocks required for this allocation, use them as part of
this allocation.

Reviewed by:	markj
Tested by:	pho
Discussed with:	alc
Differential Revision:	https://reviews.freebsd.org/D34815
2022-04-26 02:56:23 -05:00
John Baldwin
52526922ac vm_phys_init: Quiet unused but set warnings about npages.
npages is used in two optional cases:

- to conditionally create a separate DMA32 free list

- to index vm_page_array for VM_PHYSSEG_SPARSE

Add in more #ifdef's around npages statements.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D34887
2022-04-18 12:06:14 -07:00
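
The guard pattern described in 52526922ac can be sketched in userland C. This is only an illustration under stated assumptions: EX_SPARSE_INDEX is a hypothetical build option standing in for a kernel configuration knob, and ex_init_segments() is an invented helper, not code from vm_phys_init. The point is that a variable consumed only by optional code is declared and updated under the same #ifdef, so a build without the option never sees a set-but-unused variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build option standing in for a kernel config knob. */
#define EX_SPARSE_INDEX 1

/*
 * Sum the segment lengths.  When EX_SPARSE_INDEX is defined, also record
 * the index of the first page of each segment.
 */
static size_t
ex_init_segments(const size_t *seg_pages, size_t nsegs, size_t *first_index)
{
	size_t total = 0;
#ifdef EX_SPARSE_INDEX
	size_t npages = 0;	/* declared only where it is consumed */
#endif

	for (size_t i = 0; i < nsegs; i++) {
#ifdef EX_SPARSE_INDEX
		first_index[i] = npages;
		npages += seg_pages[i];
#endif
		total += seg_pages[i];
	}
#ifndef EX_SPARSE_INDEX
	(void)first_index;	/* parameter is unused without the option */
#endif
	return (total);
}
```

Removing the EX_SPARSE_INDEX definition should leave the function compiling cleanly, since every statement that touches npages disappears along with its declaration.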
Mark Johnston
f82177b8cf vm: Initialize the transient buffer mapping arena with M_WAITOK
The wait flag is passed to UMA when allocating boundary tags for the
initial span, and UMA expects either M_WAITOK or M_NOWAIT to be present.

Reported by:	cperciva
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-04-14 15:46:14 -04:00
Mark Johnston
6fb7c42d59 vm: Move the "vm_wait in early boot" assertion to the proper place
The assertion was added in commit 1771e987ca.  After that, vm_wait()
and friends were refactored such that the actual sleep happens
elsewhere.  Now the assertion condition is not checked when
vm_wait_doms() is called directly, and it is checked even if we are not
going to sleep (because vm_page_count_min_set(wdoms) is false).

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34909
2022-04-14 15:45:54 -04:00
John Baldwin
b8ebd99aa5 vm: Use __diagused for variables only used in KASSERT(). 2022-04-13 16:08:20 -07:00
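
As a rough userland illustration of the idea behind b8ebd99aa5: a variable that exists only to feed an assertion becomes unused once assertions are compiled out. The sketch below assumes an annotation that expands to an "unused" attribute only in that case. EX_DIAGUSED and ex_sum() are hypothetical names, and NDEBUG stands in for a kernel built without INVARIANTS.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a "used only by diagnostics" annotation:
 * when assertions are compiled out (NDEBUG), mark the variable unused
 * so the compiler does not warn; otherwise expand to nothing.
 */
#if defined(NDEBUG) && (defined(__GNUC__) || defined(__clang__))
#define EX_DIAGUSED __attribute__((__unused__))
#else
#define EX_DIAGUSED
#endif

/* Sum n values; 'visited' exists only to feed the assertion. */
static int
ex_sum(const int *v, size_t n)
{
	size_t visited EX_DIAGUSED = 0;
	int total = 0;

	for (size_t i = 0; i < n; i++) {
		total += v[i];
		visited++;
	}
	assert(visited == n);
	return (total);
}
```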
John Baldwin
40cbcb996c vm_fault_dontneed: Inline value of variable used once in an assertion. 2022-04-13 16:08:19 -07:00
Enji Cooper
567378cc07 Fix OID format for vm.swap_reserved and vm.swap_total
The correct OID format for CTLTYPE_U64 is `QU` (`uquad_t`), not `A`
(text expressed via `char *`).

This issue was noticed while doing a sysctl tree walk using a
sysctl(9) consumer that relies on the OID format to intuit what the
type should be for a given sysctl.

MFC after:	1 month
Sponsored by:	DellEMC Isilon
Differential Revision: https://reviews.freebsd.org/D34877
2022-04-10 18:17:09 -07:00
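
To show why the format string in 567378cc07 matters, here is a minimal sketch of a consumer that infers a value's type from the OID format. The enum, the table and ex_kind_from_fmt() are hypothetical, and the set of format strings is an assumption covering only the common scalar and string cases. A consumer like this would treat a 64-bit counter advertised as `A` as text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of sysctl values by their format string. */
enum ex_kind {
	EX_STRING, EX_INT, EX_UINT, EX_LONG, EX_ULONG,
	EX_INT64, EX_UINT64, EX_UNKNOWN
};

/* Map a format string to a value kind; unknown formats are reported as such. */
static enum ex_kind
ex_kind_from_fmt(const char *fmt)
{
	static const struct {
		const char *fmt;
		enum ex_kind kind;
	} map[] = {
		{ "A", EX_STRING },	/* text */
		{ "I", EX_INT },
		{ "IU", EX_UINT },
		{ "L", EX_LONG },
		{ "LU", EX_ULONG },
		{ "Q", EX_INT64 },
		{ "QU", EX_UINT64 },	/* unsigned 64-bit */
	};

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(fmt, map[i].fmt) == 0)
			return (map[i].kind);
	}
	return (EX_UNKNOWN);
}
```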
John Baldwin
2e7838ae84 vm_phys_early_alloc: mem_index is only used under #ifdef NUMA.
Possibly mem_index should just reuse biggestone since this loop is
already reusing biggestsize.
2022-04-08 17:25:13 -07:00
John Baldwin
a7e1a58554 uma_zfree_smr: uz_flags is only used if NUMA is defined. 2022-04-08 17:25:13 -07:00
Gordon Bergling
f167c46e79 memguard(9): Fix two typos in source code comments
- s/comparsion/comparison/

MFC after:	3 days
2022-04-02 13:51:27 +02:00
Peter Jeremy
9a89977bf6 kern: Fix typo in kassert message.
- s/unepxected/unexpected/

MFC after:	3 days
2022-04-02 21:36:17 +11:00
Doug Moore
557dc337e6 vm_phys: check small blocks to finish allocation
In vm_phys_alloc_queues_contig, when a sequence of
max-order blocks is sought to fulfill an allocation, a sequence is
ruled out if it does not have enough max-order blocks to satisfy the
allocation. However, there may be smaller blocks of free memory that
follow the last max-order block in the sequence, and they may be big
enough to complete the allocation request, so check for that
possibility before giving up on that block sequence.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D34724
2022-03-31 16:19:55 -05:00
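
The check described in 557dc337e6 can be modeled with a deliberately simplified sketch. It is not the kernel's data structure: memory is assumed to be divided into slots of EX_MAX_BLOCK pages, and free_prefix[i] is a hypothetical count of contiguous free pages at the start of slot i (EX_MAX_BLOCK when the whole slot is one free max-order block). When a run of full slots ends before the request is met, the free pages leading the next slot are still contiguous with the run and may complete it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_MAX_BLOCK 8	/* pages per max-order block; hypothetical value */

/*
 * Decide whether a contiguous allocation of npages can start at slot
 * 'start', using full max-order slots first and then the free pages
 * at the front of the slot that follows the run.
 */
static bool
ex_run_satisfies(const unsigned *free_prefix, size_t nslots, size_t start,
    size_t npages)
{
	size_t i = start;
	size_t have = 0;

	/* Consume adjacent, wholly free max-order slots. */
	while (i < nslots && free_prefix[i] == EX_MAX_BLOCK && have < npages) {
		have += EX_MAX_BLOCK;
		i++;
	}
	if (have >= npages)
		return (true);

	/* Before giving up, count the smaller free blocks that follow. */
	if (i < nslots)
		have += free_prefix[i];
	return (have >= npages);
}
```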
Doug Moore
342056fa1c vm_phys: alloc pages without duplicating searches.
In the search for contiguous pages, as each page segment is examined,
check to see if the free list set for the next page segment differs
from the set for the current segment, and avoid a pointless search if
they do not differ.

Discussed with:	alc
Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D33947
2022-03-31 01:40:46 -05:00
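
The idea in 342056fa1c, skipping a search when consecutive segments share the same free list set, might look like the following sketch. The struct, its integer flset field (a stand-in for a pointer to a free list set) and ex_plan_searches() are all hypothetical. The sketch only records which segments would trigger a search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment: only the identity of its free list set matters. */
struct ex_seg {
	int flset;	/* stand-in for a pointer to the free list set */
};

/*
 * Walk the segments in order and plan a search only when a segment's
 * free list set differs from that of the segment before it.  Returns
 * the number of searches and records the triggering segment indices.
 */
static size_t
ex_plan_searches(const struct ex_seg *segs, size_t nsegs, size_t *searched)
{
	size_t count = 0;

	for (size_t i = 0; i < nsegs; i++) {
		/* Same set as the previous segment: already searched. */
		if (i > 0 && segs[i].flset == segs[i - 1].flset)
			continue;
		searched[count++] = i;
	}
	return (count);
}
```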
Mark Johnston
d53927b0ba uma: Don't allow a limit to be set in a warm zone
The limit accounting in UMA does not tolerate this.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-03-30 15:42:18 -04:00
Mark Johnston
54361f9020 uma: Use the correct type for a return value
zone_alloc_bucket() returns a pointer, not a bool.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-03-30 15:42:05 -04:00
Brooks Davis
b1ad6a9000 syscallarg_t: Add a type for system call arguments
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from:	CheriBSD

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D33780
2022-03-28 19:43:03 +01:00
Eric van Gyzen
490b09f240 uma_zalloc_domain: call uma_zalloc_debug in multi-domain path
It was only called in the non-NUMA and single-domain paths.
Some of its assertions were duplicated in uma_zalloc_domain,
but some things were missed, especially memguard.

Reviewed by:	markj, rstone
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D34472
2022-03-25 20:10:38 -05:00
Eric van Gyzen
a8cbb835bf uma_zalloc: assert M_NOWAIT ^ M_WAITOK
The uma_zalloc functions expect exactly one of [M_NOWAIT, M_WAITOK].
If neither or both are passed, print an error and a stack dump.
Only do this ten times, to prevent livelock.  In the future, after
this exposes enough bad callers, this will be changed to a KASSERT().

Reviewed by:	rstone, markj
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D34452
2022-03-25 20:10:37 -05:00
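
A minimal userland sketch of the check described in a8cbb835bf follows: exactly one of two wait flags must be set, and a violation is reported at most ten times. The flag values, the counter and both function names are hypothetical. The real check also prints a stack dump, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define EX_NOWAIT 0x0001
#define EX_WAITOK 0x0002

/* Warnings still allowed; capped so a bad caller cannot flood the log. */
static int ex_warnings_left = 10;

/* Return true when exactly one of the two wait flags is set. */
static bool
ex_wait_flags_ok(int flags)
{
	int wait = flags & (EX_NOWAIT | EX_WAITOK);

	return (wait == EX_NOWAIT || wait == EX_WAITOK);
}

/* Check the flags, warning at most ten times; returns the check result. */
static bool
ex_check_wait_flags(int flags)
{
	bool ok = ex_wait_flags_ok(flags);

	if (!ok && ex_warnings_left > 0) {
		ex_warnings_left--;
		fprintf(stderr, "bad wait flags: %#x\n", (unsigned)flags);
	}
	return (ok);
}
```

Reporting instead of asserting lets existing bad callers be found without stopping the system, which matches the stated plan of tightening the check later.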
Eric van Gyzen
cfbb5f8ce0 vm_ksubmap_init: pass M_WAITOK to vmem_init -> uma_zalloc_arg
uma_zalloc_arg expects exactly one of the two WAIT flags.  A future
commit will assert this.

Reviewed by:	rstone
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D34450
2022-03-25 20:10:37 -05:00
Mateusz Guzik
bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) 2022-03-24 10:20:51 +00:00
Mark Johnston
389a3fa693 uma: Add UMA_ZONE_UNMANAGED
Allow a zone to opt out of cache size management.  In particular,
uma_reclaim() and uma_reclaim_domain() will not reclaim any memory from
the zone, nor will uma_timeout() purge cached items if the zone is idle.
This effectively means that the zone consumer has control over when
items are reclaimed from the cache.  In particular, uma_zone_reclaim()
will still reclaim cached items from an unmanaged zone.

Reviewed by:	hselasky, kib
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34142
2022-02-15 09:25:34 -05:00
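
The opt-out described in 389a3fa693 can be pictured with a small model. Everything here is hypothetical (the flag value, struct ex_zone and both functions); it only mirrors the stated behavior that a global reclaim pass skips unmanaged zones while an explicit per-zone reclaim still drains them.

```c
#include <assert.h>
#include <stddef.h>

#define EX_ZONE_UNMANAGED 0x1u	/* hypothetical flag value */

/* Hypothetical zone: a flag word and a count of cached items. */
struct ex_zone {
	unsigned flags;
	size_t cached;
};

/* Explicit per-zone reclaim: always drains, regardless of flags. */
static size_t
ex_zone_drain(struct ex_zone *z)
{
	size_t n = z->cached;

	z->cached = 0;
	return (n);
}

/* Global reclaim pass: skips zones that opted out of management. */
static size_t
ex_reclaim_all(struct ex_zone *zones, size_t nzones)
{
	size_t freed = 0;

	for (size_t i = 0; i < nzones; i++) {
		if ((zones[i].flags & EX_ZONE_UNMANAGED) != 0)
			continue;
		freed += ex_zone_drain(&zones[i]);
	}
	return (freed);
}
```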
John Baldwin
becaf6433b Use vmspace->vm_stacktop in place of sv_usrstack in more places.
Reviewed by:	markj
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D34174
2022-02-14 10:57:30 -08:00
Konstantin Belousov
b51927b7b0 Revert "vm_pageout_scans: correct detection of active object"
This reverts commit 3de96d664a.

The problem is that a mapped non-anonymous object can reach a state
with ref_count == 1.  For instance, an anonymous POSIX shmfd or Linux
shmfs object could be mapped and the corresponding file descriptor then
closed, dropping the object reference owned by the shmfd/shmfs file.
The check in the inactive scan then assumes that the object and page
are not mapped and frees the page, even though they are still mapped.

PR:	261707
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	now
2022-02-10 16:55:10 +02:00
Robert Wing
c9e023541a pbuf_ctor(): lock the buffer with LK_NOWAIT
This LOR happens when reading from a file backed MD device:

lock order reversal:
 1st 0xfffffe00431eaac0 pbufwait (pbufwait, lockmgr) @ /cobra/src/sys/vm/vm_pager.c:471
 2nd 0xfffff80003f17930 ufs (ufs, lockmgr) @ /cobra/src/sys/dev/md/md.c:977
lock order pbufwait -> ufs attempted at:
    #0 0xffffffff80c78ead at witness_checkorder+0xbdd
    #1 0xffffffff80bd6a52 at lockmgr_lock_flags+0x182
    #2 0xffffffff80f52d5c at ffs_lock+0x6c
    #3 0xffffffff80d0f3f4 at _vn_lock+0x54
    #4 0xffffffff80708629 at mdstart_vnode+0x499
    #5 0xffffffff807060ec at md_kthread+0x20c
    #6 0xffffffff80bbfcd0 at fork_exit+0x80
    #7 0xffffffff810b809e at fork_trampoline+0xe

This LOR was previously blessed by witness before commit 531f8cfea0
("Use dedicated lock name for pbufs").

Instead of blessing ufs and pbufwait, use LK_NOWAIT to prevent recording
the lock order. LK_NOWAIT will be a nop here as the lock is dropped in
pbuf_dtor(). This takes the same approach as 5875b94c74 ("buf_alloc():
lock the buffer with LK_NOWAIT").

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D34183
2022-02-07 10:05:20 -09:00
Konstantin Belousov
0b8643eaf6 vmmeter(): Fix detection of the named swap objects
Noted and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33549
2022-02-02 11:39:58 +02:00
Konstantin Belousov
4cf9f5d807 vm_object: restore handling of shadow_count for all types of objects
instead of only OBJ_ANON objects that are backing, as it is now.
This is required for e.g. vm_meter is_object_active() detection, and
should be useful in some more cases.

Use refcount KPI for all objects, regardless of owning the object lock,
and the fact that currently OBJ_ANON cannot change for the live object.

Noted and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33549
2022-02-02 11:39:51 +02:00
Konstantin Belousov
d950c5898a vm/vm_extern.h, vm/vm_page.h: use sys/kassert.h
instead of fatty sys/systm.h.

Suggested by:	jhb
Reviewed by:	alc, imp, jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:55:35 +02:00
Konstantin Belousov
f4cdb9d7c3 vm/vm_pager.h: use sys/systm.h header
it is needed for __read_mostly attribute definition, which right now
comes from vm/vm_page.h including sys/systm.h

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:55:35 +02:00
Konstantin Belousov
531f8cfea0 Use dedicated lock name for pbufs
Also remove a pointer to array variable, use array address directly.

Reviewed by:	markj, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:14 +02:00
John Baldwin
29d481ae6a Make <vm/vm_extern.h> more self-contained.
Add a nested include of <sys/systm.h> for recently added assertions.
Without this, existing code (such as in drm-kmod) needs to be patched
to add the newly required header.

While here, rewrite the assertions using KASSERT().

Reviewed by:	dougm, alc, imp, kib
Differential Revision:	https://reviews.freebsd.org/D34070
2022-01-28 13:14:03 -08:00
Konstantin Belousov
3de96d664a vm_pageout_scans: correct detection of active object
For non-anonymous swap objects, there is always a reference from the
owner to the object to keep it from recycling.  Account for it when
deciding whether to query pmap for hardware active references for the
page.

As a result, we avoid unneeded calls to pmap_ts_referenced(), which for
a non-mapped page means avoiding an unnecessary lock and unlock of the pv list.

Reviewed by:	markj
Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33924
2022-01-22 19:34:32 +02:00
Doug Moore
0ce7909cd0 vm_phys: add essential segment bounds check
A lower-bound segment check is necessary in vm_phys_alloc_seg_contig.
Add one.

Reported by:	jenkins
Reviewed by:	alc
Fixes:	da92ecbc0d vm_phys: fix seg->end test in alloc_seg_contig
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33945
2022-01-19 00:42:39 -06:00
Doug Moore
da92ecbc0d vm_phys: fix seg->end test in alloc_seg_contig
In vm_phys_alloc_seg_contig, in allocating multiple memory blocks for
a huge allocation, ensure that the end of the allocated range does not
exceed the upper segment limit.

Reorder a couple of checks to improve code layout.

Reviewed by:	alc
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33870
2022-01-18 12:49:09 -06:00
Mark Johnston
46d35d415a fork: Copy the vm_stacktop field into the new vmspace
Fixes:	1811c1e957 ("exec: Reimplement stack address randomization")
Reported by:	pho
Reported by:	syzbot+0446312a51bc13ead834@syzkaller.appspotmail.com
Sponsored by:	The FreeBSD Foundation
2022-01-18 10:51:49 -05:00
Mark Johnston
1811c1e957 exec: Reimplement stack address randomization
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack.  This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings.  In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.

There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose.  Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.

The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.

PR:		260303
Reviewed by:	kib
Discussed with:	emaste, mw
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:12:36 -05:00
Mark Johnston
a04ce833f9 uma: Avoid polling for an invalid SMR sequence number
Buckets in an SMR-enabled zone can legitimately be tagged with
SMR_SEQ_INVALID.  This effectively means that the zone destructor (if
any) was invoked on all items in the bucket, and the contained memory is
safe to reuse.  If the first bucket in the full bucket list was tagged
this way, UMA would unnecessarily poll per-CPU state before attempting
to fetch a full bucket from the list.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-01-14 15:38:02 -05:00
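
The short-circuit described in a04ce833f9 can be sketched as follows, under loud assumptions: EX_SEQ_INVALID, the global ex_completed_seq, and ex_poll() are hypothetical stand-ins, and sequence comparison ignores wraparound. The only point illustrated is that a bucket tagged with the invalid sequence needs no poll at all.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_SEQ_INVALID 0u	/* hypothetical "no sequence" tag */

/* Hypothetical: highest sequence number every reader has passed. */
static unsigned ex_completed_seq = 5;

/* Counts how often the (notionally expensive) poll runs. */
static unsigned ex_poll_calls;

/* Stand-in for polling per-CPU state; ignores wraparound. */
static bool
ex_poll(unsigned seq)
{
	ex_poll_calls++;
	return (seq <= ex_completed_seq);
}

/* A bucket tagged invalid is reusable immediately; otherwise poll. */
static bool
ex_bucket_reusable(unsigned bucket_seq)
{
	if (bucket_seq == EX_SEQ_INVALID)
		return (true);
	return (ex_poll(bucket_seq));
}
```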
Mark Johnston
4a864f624a vm_pageout: Print a more accurate message to the console before an OOM kill
Previously we'd always print "out of swap space."  This can be
misleading, as there are other reasons an OOM kill can be triggered.  In
particular, it's entirely possible to trigger an OOM kill on a system
with plenty of free swap space.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33810
2022-01-14 15:04:21 -05:00
Brooks Davis
0910a41ef3 Revert "syscallarg_t: Add a type for system call arguments"
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.

This reverts commit 3889fb8af0.
This reverts commit 1544e0f5d1.
2022-01-12 23:29:20 +00:00
Brooks Davis
1544e0f5d1 syscallarg_t: Add a type for system call arguments
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from:	CheriBSD

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D33780
2022-01-12 22:51:25 +00:00
Doug Moore
84e2ae64c5 vm_reserv: use enhanced bitstring for popmaps
vm_reserv.c uses its own bitstring implementation for popmaps. Using
the bitstring_t type from a standard header eliminates the code
duplication, allows some bit-at-a-time operations to be replaced with
more efficient bitstring range operations, and, in
vm_reserv_test_contig, allows bit_ffc_area_at to more efficiently
search for a big-enough set of consecutive zero-bits.

Make bitstring changes that improve the vm_reserv code.  Define a bit_ntest
method to test whether a range of bits is all set, or all clear.
Define bit_ff_at and bit_ff_area_at to implement the ffs and ffc
versions with a parameter to choose between set- and clear- bits.
Improve the area_at implementation.  Modify the bit_nset and
bit_nclear implementations to allow code optimization in the cases
when start or end are multiples of _BITSTR_BITS.

Add a few new cases to bitstring_test.

Discussed with:	alc
Reviewed by:	markj
Tested by:	pho (earlier version)
Differential Revision:	https://reviews.freebsd.org/D33312
2022-01-12 11:03:53 -06:00
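
To illustrate the kind of range operation 84e2ae64c5 describes, testing whether a range of bits is all set or all clear a word at a time, here is a self-contained sketch. It does not use or reproduce the bitstring header: ex_mask() and ex_range_all() are hypothetical helpers over a plain array of 64-bit words, with an inclusive [start, stop] range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_WORD_BITS 64

/* Mask with bits lo..hi set (inclusive, 0 <= lo <= hi < 64). */
static uint64_t
ex_mask(unsigned lo, unsigned hi)
{
	return ((~(uint64_t)0 << lo) &
	    (~(uint64_t)0 >> (EX_WORD_BITS - 1 - hi)));
}

/* Return true if every bit in [start, stop] equals 'match'. */
static bool
ex_range_all(const uint64_t *bits, size_t start, size_t stop, bool match)
{
	size_t first = start / EX_WORD_BITS;
	size_t last = stop / EX_WORD_BITS;

	assert(start <= stop);
	for (size_t w = first; w <= last; w++) {
		/* Partial masks at the ends, full words in between. */
		unsigned lo = (w == first) ?
		    (unsigned)(start % EX_WORD_BITS) : 0;
		unsigned hi = (w == last) ?
		    (unsigned)(stop % EX_WORD_BITS) : EX_WORD_BITS - 1;
		uint64_t mask = ex_mask(lo, hi);
		uint64_t want = match ? mask : 0;

		if ((bits[w] & mask) != want)
			return (false);
	}
	return (true);
}
```

Comparing whole masked words is what makes a range test cheaper than a bit-at-a-time loop: interior words cost one comparison each.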
Mark Johnston
c4a25e0713 vm_pageout: Group sysctl variables together with sysctl definitions
Fix some style bugs while here.  No functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33811
2022-01-11 09:27:45 -05:00
Mark Johnston
43b3b8e52d swap_pager: uma_zcreate() doesn't fail
Remove always-false checks for UMA zone creation failure.  No functional
change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33809
2022-01-11 09:27:45 -05:00
Doug Moore
ae13829ddc vm_addr_ok: add power2 invariant check
With INVARIANTS defined, have vm_addr_align_ok and vm_addr_bound_ok
panic when passed an alignment/boundary parameter that is not a power
of two.

Reviewed by:	alc
Suggested by:	kib, se
Differential Revision:	https://reviews.freebsd.org/D33725
2022-01-10 01:17:25 -06:00
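
The invariant added in ae13829ddc guards against a subtle failure: mask-based checks silently give wrong answers when the parameter is not a power of two. A small sketch with hypothetical helpers shows both the test and the failure it prevents. This sketch treats zero as not a power of two, which is a choice made here and not necessarily what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a power of two; this sketch rejects zero. */
static bool
ex_is_pow2(uint64_t x)
{
	return (x != 0 && (x & (x - 1)) == 0);
}

/* Mask-based alignment test; only meaningful for power-of-two alignment. */
static bool
ex_mask_aligned(uint64_t addr, uint64_t alignment)
{
	return ((addr & (alignment - 1)) == 0);
}
```

With an alignment of 6, the mask test reports address 8 as aligned even though 8 is not a multiple of 6, which is the kind of misuse an invariant check can catch early.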
Konstantin Belousov
c25a30e255 Dump page tracking no longer needed on mips
Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D33763
2022-01-06 06:00:39 +02:00
Konstantin Belousov
f54882a862 Remove special kstack allocation code for mips.
The arch required two-page alignment due to a single TLB entry caching
two consecutive mappings.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D33763
2022-01-06 04:43:56 +02:00
Doug Moore
f76916c095 vm_reserv: #include vm_extern.h explicitly, for arm.
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere
2021-12-31 00:40:25 -06:00
Doug Moore
e6930b1c5f vm_phys: convert error back to warning
Move an assignment back to where it was before, to turn the
defined-but-not-used error back into a set-but-not-used warning.

Fixes:	01e115ab83 vm_phys: #include vm_extern
2021-12-31 00:23:46 -06:00
Doug Moore
01e115ab83 vm_phys: #include vm_extern
Arm64 and powerpc don't include vm_extern.h indirectly in vm_phys.c, which
means that for the sake of those architectures, it must be included explicitly.

Also, fix a set-unused warning that jenkins also found.

Reported by:	Jenkins
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere
2021-12-30 23:31:18 -06:00
Doug Moore
c606ab59e7 vm_extern: use standard address checkers everywhere
Define simple functions for alignment and boundary checks and use them
everywhere instead of having slightly different implementations
scattered about. Define them in vm_extern.h and use them where
possible where vm_extern.h is included.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D33685
2021-12-30 22:09:08 -06:00
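
As a sketch of the kind of shared checkers c606ab59e7 describes, the two functions below test alignment and boundary crossing with bit masks. The names and the uint64_t types are hypothetical, both functions assume power-of-two parameters, and a boundary of zero is taken here to mean "no boundary constraint".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of alignment (a power of two). */
static bool
ex_addr_align_ok(uint64_t addr, uint64_t alignment)
{
	return ((addr & (alignment - 1)) == 0);
}

/*
 * True if [addr, addr + size) does not cross a multiple of boundary
 * (a power of two, or zero for "no boundary").  The first and last
 * byte must agree in every bit at or above the boundary bit.
 */
static bool
ex_addr_bound_ok(uint64_t addr, uint64_t size, uint64_t boundary)
{
	uint64_t last = addr + size - 1;

	return (((addr ^ last) & ~(boundary - 1)) == 0);
}
```

Keeping one definition of each check avoids the slightly different hand-written variants that the commit message mentions.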
Gleb Smirnoff
841e0a8757 uma: with KTR trace allocs/frees from SMR zones 2021-12-29 23:08:33 -08:00