For non-anonymous swap objects, there is always a reference from the
owner to the object to keep it from recycling. Account for it when
deciding should we query pmap for hardware active references for the
page.
As result, we avoid unneeded calls to pmap_ts_referenced(), which for
non-mapped page means avoiding unneccessary lock and unlock of the pv list.
Reviewed by: markj
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33924
In vm_phys_alloc_seg_contig, in allocating multiple memory blocks for
a huge allocation, ensure that the end of the allocated range does not
exceed the upper segment limit.
Reorder a couple of checks to improve code layout.
Reviewed by: alc
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33870
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack. This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings. In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.
There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose. Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.
The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.
PR: 260303
Reviewed by: kib
Discussed with: emaste, mw
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Buckets in an SMR-enabled zone can legitimately be tagged with
SMR_SEQ_INVALID. This effectively means that the zone destructor (if
any) was invoked on all items in the bucket, and the contained memory is
safe to reuse. If the first bucket in the full bucket list was tagged
this way, UMA would unnecessarily poll per-CPU state before attempting
to fetch a full bucket from the list.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Previously we'd always print "out of swap space." This can be
misleading, as there are other reasons an OOM kill can be triggered. In
particular, it's entirely possible to trigger an OOM kill on a system
with plenty of free swap space.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33810
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.
This reverts commit 3889fb8af0b611e3126dc250ebffb01805152104.
This reverts commit 1544e0f5d1f1e3b8c10a64cb899a936976ca7ea4.
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).
Obtained from: CheriBSD
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D33780
vm_reserv.c uses its own bitstring implemenation for popmaps. Using
the bitstring_t type from a standard header eliminates the code
duplication, allows some bit-at-a-time operations to be replaced with
more efficient bitstring range operations, and, in
vm_reserv_test_contig, allows bit_ffc_area_at to more efficiently
search for a big-enough set of consecutive zero-bits.
Make bitstring changes improve the vm_reserv code. Define a bit_ntest
method to test whether a range of bits is all set, or all clear.
Define bit_ff_at and bit_ff_area_at to implement the ffs and ffc
versions with a parameter to choose between set- and clear- bits.
Improve the area_at implementation. Modify the bit_nset and
bit_nclear implementations to allow code optimization in the cases
when start or end are multiples of _BITSTR_BITS.
Add a few new cases to bitstring_test.
Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D33312
Remove always-false checks for UMA zone creation failure. No functional
change intended.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33809
With INVARIANTS defined, have vm_addr_align_ok and vm_addr_bound_ok
panic when passed an alignment/boundary parameter that is not a power
of two.
Reviewed by: alc
Suggested by: kib, se
Differential Revision: https://reviews.freebsd.org/D33725
The arch required two-pages alignment due to single TLB entry caching
two consequtive mappings.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D33763
Move an assignment back to where it was before, to turn the
defined-but-not-used error back into a set-but-not-used warning.
Fixes: 01e115ab83a4 vm_phys: #include vm_extern
Arm64 and powerpc don't include vm_extern.h indirectly in vm_phys.c, which
means that for the sake of those architectures, it must be included explicitly.
Also, fix a set-unused warning that jenkins also found.
Reported by: Jenkins
Fixes: c606ab59e7f9 vm_extern: use standard address checkers everywhere
Define simple functions for alignment and boundary checks and use them
everywhere instead of having slightly different implementations
scattered about. Define them in vm_extern.h and use them where
possible where vm_extern.h is included.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D33685
It is only called in the file that defines it, so make it static and
remove the declaration from the header.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33688
Handle specially the boundary==0 case of vm_reserv_reclaim_config,
by turning off boundary adjustment in that case.
Reviewed by: alc
Tested by: pho, madpilot
fb38b29b5609 (page_alloc_br) vm_page: Remove extra test, dup code from page alloc
should have moved a comment block when it moved the function call that followed it.
Move the comment block now.
Function vm_reserv_reclaim_contig breaks a reservation with enough
free space to satisfy an allocation request and returns the free space
to the buddy allocator. Change the function to allocate the request
memory from the reservation before breaking it, and return that memory
to the caller. That avoids a second call to the buddy allocator and
guarantees successful allocation after breaking the reservation, where
that success is not currently guaranteed.
Reviewed by: alc, kib (previous version)
Differential Revision: https://reviews.freebsd.org/D33644
Fix a very recent change that introduced a page accounting error in
case of a reserveration being broken.
Reviewed by: alc
Fixes: fb38b29b5609 (page_alloc_br) vm_page: Remove extra test, dup code from page alloc
Differential Revision: https://reviews.freebsd.org/D33645
Extract code common to functions vm_page_alloc_contig_domain and
vm_page_alloc_noobj_contig_domain into a new function. Do so in a way
that eliminates a bound-to-fail reservation test after a reservation
is broken by a call from vm_page_alloc_contig_domain.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33551
Summary:
One was required to press a key to continue after every 18 lines of
output. This requirement had been in the "show vmopag" command since it
was introduced, which was many years before paging was added to DDB.
With paging, this explict key check is no longer necessary.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
Test Plan:
Run "show vmopag" from db> prompt and see that it does not need additional
keypresses other than the ones needed for the pager.
Subscribers: imp, #contributor_reviews_base
Differential Revision: https://reviews.freebsd.org/D33550
Commit 867c27c23a5c modified the NFS client so that
it does IO_APPEND writes directly to the NFS server,
bypassing the buffer cache. However, this could result
in stale data in client pages when the file is mmap(2)'d.
As such, the NFS client needs to call is_object_active()
to check if the file is mmap(2)'d.
This patch renames is_object_active() to vm_object_is_active(),
moves it to sys/vm/vm_object.c and makes it global, so that
the NFS client can call it in a future commit.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33520
Function vm_reserv_test_contig has incorrectly used its alignment
and boundary parameters to find a well-positioned range of empty pages
in a reservation. Consequently, a reservation could be broken
mistakenly when it was unable to provide a satisfactory set of pages.
Rename the function, correct the errors, and add assertions to detect
the error in case it appears again.
Reviewed by: alc, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33344
vm_map_wire() works by calling vm_fault(VM_FAULT_WIRE) on each page in
the rage. (For largepage mappings, it calls vm_fault() once per large
page.)
A pager's populate method may return more than one page to be mapped.
If VM_FAULT_WIRE is also specified, we'd wire each page in the run, not
just the fault page. Consider an object with two pages mapped in a
vm_map_entry, and suppose vm_map_wire() is called on the entry. Then,
the first vm_fault() would allocate and wire both pages, and the second
would encounter a valid page upon lookup and wire it again in the
regular fault handler. So the second page is wired twice and will be
leaked when the object is destroyed.
Fix the problem by modify vm_fault_populate() to wire only the fault
page. Also modify the error handler for pmap_enter(psind=1) to not test
fs->wired, since it must be false.
PR: 260347
Reviewed by: alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33416
In vm_reserv_init, set all the marker popmap bits in vm_reserv_init,
and not just the bits of the first popmap entry.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33258
A page must not become invalid while vm_fault_soft_fast() is attempting
to map unbusied pages for reading.
Note that all callers hold the object write lock already, and
vm_page_set_invalid() asserts the object write lock.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33250
The flag requests skipping the heuristic which tries to avoid leaving
system with more allocated memory than available from RAM and remanining
swap.
Reviewed by: markj
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33165
For compatibility, add a placeholder pointer to the start of the
added struct swapoff_new_args, and use it to distinguish old vs. new
style of syscall invocation.
Reviewed by: markj
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33165
When swap is turned off due to system shutdown or reboot, ignore the
check. Problem is that the check is not accurate by any means, free
page count can legitimately be low while system still able to page in
everything from the swap. Then, we turn swap off if swapping on
real file or some non-standard geom provider, and typically panic
when system appears to actually need to unavailable page.
For syscall, it is better to be safe than sorry.
Reported and tested by: peterj
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33147
VOP_STRATEGY() requires locked vnode. Note that we lock the swap vnode
while pages are busy, but this would only cause real LoR if pages belong
to the swap vnode, which must not be the case for correct use.
Reported and tested by: peterj
Reviewed by: markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33119
to cover VOP_GETATTR() call in sys_swapon(). Move locking from inside
swapongeom() and swaponvp() into sys_swapon().
Reported by and tested by: peterj
Reviewed by: markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33119
Rather than overloading the meanings of the Mach statuses, introduce a
new set for use internally in the fault code. This makes the control
flow easier to follow and provides some extra error checking when a
fault status variable is used in a switch statement.
vm_fault_lookup() and vm_fault_relookup() continue to use Mach statuses
for now, as there isn't much benefit to converting them and they
effectively pass through a status from vm_map_lookup().
Obtained from: jeff (object_concurrency patches)
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33017
This makes it easier to factor out pieces of vm_fault(). No functional
change intended.
Obtained from: jeff (object_concurrency patches)
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33016
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Reviewed by: kib, markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31992
4.3 BSD's mmap took an int len and long pos. Reject negative lengths
and in freebsd32 sign-extend pos correctly rather than mis-handling
negative positions as large positive ones.
Reviewed by: kib