Finish updating comments to reflect new locking protocols introduced
over the past year. In particular, vm_page_lock is now effectively
unused.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25868
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches. They no longer serve any
purpose since UMA now provides their functionality by default. Remove
them to simplify the kernel memory allocator interfaces a bit.
Reviewed by: cem, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25937
Use atomic(9) to load the lock state. Some places were already doing
this, so the usage was inconsistent. In initialization code, the lock
state is still initialized with plain stores.
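A minimal sketch of the pattern, assuming the page busy lock as the
example (the committed diff may touch other lock words as well):

    u_int state;

    /* Read path: atomic(9) load so the compiler cannot tear the read. */
    state = atomic_load_int(&m->busy_lock);

    /* Init path: a plain store is fine before the page is visible. */
    m->busy_lock = VPB_UNBUSIED;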
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25861
vm_page_xbusy_claim() could clobber the waiter bit. For its original
use, kernel memory pages, this was not a problem since nothing would
ever block on the busy lock for such pages. r363607 introduced a new
use where this could in principle be a problem.
Fix the problem by using atomic_cmpset to update the lock owner. Since
this macro is defined only for INVARIANTS kernels, the extra overhead
doesn't seem prohibitive.
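A sketch of the resulting claim loop (the real macro lives in vm_page.h
and may differ in detail); note that the waiter bit is preserved while
ownership is transferred:

    u_int x;

    do {
        x = atomic_load_int(&m->busy_lock);
    } while (!atomic_cmpset_int(&m->busy_lock, x,
        (x & VPB_BIT_WAITERS) | VPB_CURTHREAD_EXCLUSIVE));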
Reported by: vangyzen
Reviewed by: alc, kib, vangyzen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25859
vm_page_assert_xbusied() asserts that the busying thread is the current
thread. For some uses of vm_page_free_invalid() (e.g., error handling
in vnode_pager_generic_getpages_done()), this condition might not hold.
Reported by: Jenkins via trasz
Reviewed by: chs, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25828
An old calculation unnecessarily capped the maximum swap size by
dividing a value, which was always a multiple of 64, by 64. Remove the
code that reduced max swap size down to that cap.
Eliminate the distinction between BLIST_BMAP_RADIX and
BLIST_META_RADIX. Call them both BLIST_RADIX.
Make improvements to the blist self-test code to silence compiler
warnings and to test larger blists.
Reported by: jmallett
Reviewed by: alc
Discussed with: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D25736
The code did not subtract from the global counter if per-uid reservation
failed.
Cleanup highlights:
- load overcommit once
- move per-uid manipulation to dedicated routines (see the sketch after
this list)
- don't fetch wire count if requested size is below the limit
- convert return type from int to bool
- ifdef the routines with _KERNEL to keep vm.h compilable by userspace
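A hedged sketch of the per-uid routine shape (the routine name and the
field layout are illustrative, not the actual swap accounting code); the
caller must likewise roll back the global counter when this fails, which
is the bug fixed above:

    static bool
    swap_reserve_by_uid(struct uidinfo *uip, u_long pincr, u_long lim)
    {
        u_long prev;

        prev = atomic_fetchadd_long(&uip->ui_vmsize, pincr);
        if (lim != 0 && prev + pincr > lim) {
            /* Undo the per-uid charge before reporting failure. */
            atomic_subtract_long(&uip->ui_vmsize, pincr);
            return (false);
        }
        return (true);
    }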
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D25787
The counter is updated all the time and the variables read afterwards
share its cacheline. Note this still fundamentally does not scale and
needs to be replaced; in the meantime this gets a bandaid.
brk1_processes -t 52 ops/s:
before: 8598298
after: 9098080
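The bandaid is plausibly of this shape (a sketch; that swap_reserved is
the counter in question is an assumption here): give the hot counter its
own cacheline so read-mostly neighbours no longer false-share with it:

    /* __exclusive_cache_line gives the object a private cacheline. */
    static u_long swap_reserved __exclusive_cache_line;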
Rather than marking the read ahead/behind pages valid even though they were
not initialized, free them using the new function vm_page_free_invalid().
Reviewed by: markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25430
Add a new function, vm_page_free_invalid(), for freeing invalid pages
that might be wired. If the page is wired then it cannot be freed now,
but the thread that eventually unwires it will free it at that point.
Reviewed by: markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25430
Suppose a thread is running on a CPU in a NUMA domain with no physical
RAM. When an item is freed to a first-touch zone, it ends up in the
cross-domain bucket. When the bucket is full, it gets placed in another
domain's bucket queue. However, when allocating an item, UMA will
always go to the keg upon a per-CPU cache miss because the empty
domain's bucket queue will always be empty. This means that a non-empty
domain's bucket queues can grow very rapidly on such systems. For
example, it can easily cause mbuf allocation failures when the zone
limit is reached.
Change cache_alloc() to follow a round-robin policy when running on an
empty domain.
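A sketch of the policy, with a hypothetical helper (the committed logic
in uma_core.c's cache_alloc() differs in detail):

    static int
    cache_select_domain(int domain)
    {
        static u_int rr;

        /*
         * The local domain has no RAM: rotate across all domains
         * instead of always missing to the keg.
         */
        if (VM_DOMAIN_EMPTY(domain))
            domain = atomic_fetchadd_int(&rr, 1) % vm_ndomains;
        return (domain);
    }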
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25355
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object. The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25329
Stop implementing Linux madvise(2) by passing its flags to FreeBSD
madvise(2) directly. While some of the flag values match, most don't.
PR: kern/230160
Reported by: markj
Reviewed by: markj
Discussed with: brooks, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25272
Functions which take untrusted user ranges must validate against the
bounds of the map, and also check for wraparound. Instead of duplicating
the same logic in a number of places, add a function that performs both
checks.
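The check itself is small; a sketch consistent with the description
(compare vm_map_range_valid() in the tree):

    static bool
    vm_map_range_valid(vm_map_t map, vm_offset_t start, vm_offset_t end)
    {

        if (end < start)        /* wraparound or inverted range */
            return (false);
        if (start < vm_map_min(map) || end > vm_map_max(map))
            return (false);     /* outside the map's bounds */
        return (true);
    }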
Reviewed by: dougm, kib
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25328
The lock-protected iteration is trivially avoidable.
This removes a serialisation point from Linux binaries (which end up calling
here from the sysinfo syscall).
Instead, just skip marking pages valid if the read fails. Future
attempts to access such pages will notice that they are not marked valid
and try to read them from disk again.
Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25138
As with r361769 (man page), PROT_* are properly called protections, not
permissions.
MFC after: 1 week
MFC with: r361769
Sponsored by: The FreeBSD Foundation
- Add vm_phys_early_add_seg(), complementing vm_phys_early_alloc(), to
ensure that segments registered during hammer_time() are placed in the
right domain. Otherwise, since the SRAT is not parsed at that point,
we just add them to domain 0, which may be incorrect and can result in a
domain containing only a few MB of memory.
- Fix uma_startup1() to try allocating memory for zones from any domain.
If domain 0 is unpopulated, the allocation will simply fail, resulting
in a page fault slightly later during boot.
- Change _vm_phys_domain() to return -1 for addresses not covered by the
affinity table, and change vm_phys_early_alloc() to handle wildcard
domains. This is necessary on amd64, where the page array is dense
and pmap_page_array_startup() may allocate page table pages for
non-existent page frames.
Reported and tested by: Rafael Kitover <rkitover@gmail.com>
Reviewed by: cem (earlier version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25001
The list of arches there matches the list of arches where the default
VM_NRESERVLEVEL > 0. Before its removal, sparc64 was the only arch that
defined VM_NRESERVLEVEL > 0 to help with cache coloring but did not
implement superpages. Now the list can be simplified.
Submitted by: alc
Reviewed by: markj
Otherwise anything counted before SI_SUB_VM_CONF is discarded. However,
it is useful to be able to see stats from allocations done early during
boot.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24756
__builtin_unreachable doesn't raise any compile-time warnings/errors on its
own, so problems with its usage can't be easily detected. While it would be
nice for this situation to change and compilers to at least add a warning
for trivial cases where local state means the instruction can't be reached,
this isn't the case at the moment and likely will not happen.
This commit adds an __assert_unreachable, whose intent is incredibly clear:
it asserts that this instruction is unreachable. On INVARIANTS builds, it's
a panic(), and on non-INVARIANTS it expands to __unreachable().
Existing users of __unreachable() are converted to __assert_unreachable,
to improve debuggability if this assumption is violated.
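The macro's shape is simple (a sketch; see the committed definition in
sys/systm.h):

    #ifdef INVARIANTS
    #define __assert_unreachable()                                  \
        panic("executing segment marked as unreachable at %s:%d (%s)", \
            __FILE__, __LINE__, __func__)
    #else
    #define __assert_unreachable()  __unreachable()
    #endif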
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D23793
Summary:
POWER9 supports two MMU formats: traditional hashed page tables, and Radix
page tables, similar to what's present on most other architectures. The
PowerISA also specifies a process table -- a table of page table pointers --
which on the POWER9 is only available with the Radix MMU, so we can take
advantage of it with the Radix MMU driver.
Written by Matt Macy.
Differential Revision: https://reviews.freebsd.org/D19516
A concurrent unlocked lookup can wire the page after
vm_page_release_locked() releases the last wiring, in which case
vm_page_release_locked() must not free the page. Once the xbusy lock is
acquired, it, together with the object lock and the fact that the page is
unmapped, ensures that the wire count cannot increase, so re-check for new
wirings after the page is xbusied.
Update the comment above vm_page_wired() to reflect the new
synchronization rules.
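Condensed, the fixed flow looks like this (a sketch of the description
above, not the verbatim diff):

    if (vm_page_unwire_noq(m) && vm_page_tryxbusy(m)) {
        /*
         * An unlocked lookup may have wired the page after the last
         * wiring was released; re-check now that xbusy, the object
         * lock and the page being unmapped exclude new wirings.
         */
        if (!vm_page_wired(m))
            vm_page_free(m);
        else
            vm_page_xunbusy(m);
    }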
Reported by: glebius
Reviewed by: alc, jeff, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24592
Previously we allocated a separate VM object for each kernel stack.
However, fully constructed kernel stacks are cached by UMA, so there is
no harm in using a single global object for all stacks. This reduces
memory consumption and makes it easier to define a memory allocation
policy for kernel stack pages, with the aim of reducing physical memory
fragmentation.
Add a global kstack_object, and use the stack KVA address to index into
the object like we do with kernel_object.
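Indexing is then analogous to kernel_object; a sketch, assuming stack
KVA is offset from VM_MIN_KERNEL_ADDRESS in the same way:

    /* Convert a stack's base KVA into a pindex within kstack_object. */
    static vm_pindex_t
    vm_kstack_pindex(vm_offset_t ks)
    {

        return (atop(ks - VM_MIN_KERNEL_ADDRESS));
    }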
Reviewed by: kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24473
kmem_alloc_attr_domain() and kmem_alloc_contig_domain() duplicated each
other's page allocation and reclamation logic. Place it in a single
function to make it easier to add additional consumers. No functional
change intended.
Reviewed by: jeff, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24475
This simplifies some planned changes. No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24474
vm_page_acquire_unlocked() relies on type-stability of vm_page
structures and assumes that the listq linkage pointers always point to a
vm_page or are NULL. QUEUE_MACRO_DEBUG_TRASH breaks that assumption, so
add an explicit check for a trashed queue pointer before dereferencing.
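The added guard is of this shape (a sketch; QMD_IS_TRASHED() is the
queue(3) trash predicate available under QUEUE_MACRO_DEBUG_TRASH):

    m = TAILQ_NEXT(m, listq);
    #ifdef QUEUE_MACRO_DEBUG_TRASH
    /*
     * The linkage may have been trashed by a concurrent dequeue;
     * bail out and fall back to the locked lookup.
     */
    if (QMD_IS_TRASHED(m))
        return (false);
    #endif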
Reported and tested by: pho
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24472
This makes it easier to write libkvm programs that access UMA data
structures.
- Remove a couple of unused slab functions and make others local to
uma_core.c. Similarly move SLAB_BITSETS, which affects the layout of
slab structures, to uma_core.c.
- Stop defining the slab structures under _KERNEL. There's no real
reason they can't be visible to userspace like the rest of UMA's
structures are.
- Group KEG_ASSERT_COLD with other keg macros.
- Convert an assertion about MAXMEMDOM to use _Static_assert.
No functional change intended.
Discussed with: jeff
Reviewed by: rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23980
The intent is to provide a header that can be included by other headers
without introducing too much pollution. smr.h depends on various
headers and will likely grow over time, but is less likely to be
required by system headers.
Rename SMR_TYPE_DECLARE() to SMR_POINTER():
- One might use SMR to protect more than just pointers; it
could be used for resizeable arrays, for example, so TYPE seems too
generic.
- It is useful to be able to define anonymous SMR-protected pointer
types and the _DECLARE suffix makes that look wrong; see the example
below.
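For example (hedged; the struct names here are hypothetical), an
anonymous SMR-protected pointer member now reads naturally:

    struct foo {
        SMR_POINTER(struct bar *) barp; /* anonymous SMR type */
    };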
Reviewed by: jeff, mjg, rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23988
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer, intended to be used with a designated
initializer or compound literal. We take advantage of the mandatory
zeroing of fields not listed in the initializer.
Remove kern_mmap_fpcheck() and use kern_mmap_req().
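A hedged usage sketch (the field names are illustrative rather than the
exact struct mmap_req layout):

    struct mmap_req mr = {
        .mr_hint = (uintptr_t)addr,
        .mr_len = len,
        .mr_prot = prot,
        .mr_flags = flags,
        .mr_fd = fd,
        .mr_pos = pos,
        /* Fields not listed are zeroed by the initializer. */
    };

    error = kern_mmap_req(td, &mr);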
The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations. In CheriBSD we have already added two more arguments.
Reviewed by: kib
Discussed with: kevans
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D23164