entry, when that entry has been seen already, keep the
already-looked-up value in a variable and use that instead of looking
it up again.
Approved by: alc, markj (earlier version), kib (earlier version)
Differential Revision: https://reviews.freebsd.org/D22348
so that we avoid the hashtables. The hashtable is now only required if
a zone is created with OFFPAGE specified initially, not internally. This
flag signals to UMA that it can't touch the allocated memory and so
can't store a slab pointer in the containing page.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D22453
reudundant complicated checks and additional locking required only for
anonymous memory. Introduce vm_object_allocate_anon() to create these
objects. DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.
Reviewed by: alc, dougm, kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22119
only. Rename it swp_pager_meta_lookup. Stop checking for obj->type
== swap there and assert it instead. Make the caller responsible for
the obj->type check.
Move the meta_ctl 'pop' functionality to swap_pager_unswapped, the
only place that uses it, and assume obj->type == swap there too.
Assisted by: ota_j.email.ne.jp
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22437
We currently have the per-domain partially populated reservation queues
and the per-domain queue locks. Define a new per-domain padded
structure to contain both of them. This puts the queue fields and lock
in the same cache line and avoids the false sharing within the old queue
array.
Also fix field packing in the reservation structure. In many places we
assume that a domain index fits in 8 bits, so we can do the same there
as well. This reduces the size of the structure by 8 bytes.
Update some comments while here. No functional change intended.
Reviewed by: dougm, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22391
We are now out of aflags bits, whereas the "flags" field only makes use
of five of its sixteen bits, so narrow "flags" to eight bits. I have no
intention of adding a new aflag in the near future, but would like to
combine the aflags, queue and act_count fields into a single atomically
updated word. This will allow vm_page_pqstate_cmpset() to become much
simpler and is a step towards eliminating the use of the page lock array
in updating per-page queue state.
The change modifies the layout of struct vm_page, so bump
__FreeBSD_version.
Reviewed by: alc, dougm, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22397
entries are stabilized, repeatedly verifies the same entry. Check each
entry in turn.
Reviewed by: kib (code only), alc
Tested by: pho
MFC after: 7 days
Differential Revision: https://reviews.freebsd.org/D22405
around entry->{next,prev} when those are used for ordered list
traversal, and use those wrapper functions everywhere. Where the next
field is used for maintaining a stack of deferred operations, #define
defer_next to make that different usage clearer, and then use the
'right' pointer instead of 'next' for that purpose.
Approved by: markj
Tested by: pho (as part of a larger patch)
Differential Revision: https://reviews.freebsd.org/D22347
exploits the sparsity of allocated blocks in a range, without
issuing an "are you there?" query for every block in the range.
swap_pager_copy() is not so smart. Modify the implementation
of swap_pager_meta_free() slightly so that swap_pager_copy()
can use that smarter implementation too.
Based on an observation of: Yoshihiro Ota (ota_j.email.ne.jp)
Reviewed by: kib,alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22280
The r354367 is reverted since it is subsumed by this, more complete, approach.
Suggested by: markj
Reviewed by: alc. glebius, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22242
consistency checking slows performance dramatically. This change
reduces the number of assertions checked by completely walking the
vm_map tree only when the write-lock is released, and only then if the
number of modifications to the tree since the last walk exceeds the
number of tree nodes.
Reviewed by: alc, kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22163
Before the page busy code was converted to make direct use of
sleepqueues, this was handled by _sleep().
Reported by: glebius
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Since r354156 we may call release_page() without the page's object lock
held, specifically following the page copy during a CoW fault.
release_page() must therefore unbusy the page only after scheduling the
requeue, to avoid racing with a free of the page. Previously, the
object lock prevented this race from occurring.
Add some assertions that were helpful in tracking this down.
Reported by: pho, syzkaller
Tested by: pho
Reviewed by: alc, jeff, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22234
Early counter mock can be only used on BSP for amd64, when APs try to
update it that causes random memory corruption.
N.B. This is a temporary patch to plug the corruption for now, while
a proper solution for handling cache zones in zone_foreach() is being
developed.
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
flag and use the same system.
This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22116
Certain consumers still need to guarantee a stable reference so we can not
switch entirely to atomics yet. Exclusive lock holders can still modify
and examine the refcount without using the ref api.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21598
When it is set to 0 (the default), a heavy Netflix-style web workload
suffers from heavy lock contention on the vm page free queue called from
vm_page_zone_{import,release}() as the buckets are frequently drained.
When setting the maxcache, this contention goes away.
We should eventually try to autotune this, as well as make this
zone eligable for uma_reclaim().
Reviewed by: alc, markj
Not Objected to by: jeff
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D22112
r353890 introduced a case where we may call release_page() with
fs.m == NULL, since the fault handler may now lock the vnode prior
to allocating a page for a page-in.
Reported by: jhb
Reviewed by: kib
MFC with: r353890
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22120
We now assert that a page is busy when updating its validity-tracking
state, but bogus_page is not busied during a getpages operation.
Reported by: syzkaller
Reviewed by: alc, kib
Discussed with: jeff
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22124
A caller that does not guarantee that a page's identity won't change
while sleeping for a busy lock must specify either NOWAIT or WAITFAIL.
Reported by: syzkaller
Reviewed by: alc, kib
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22124
except for filesystems that set the MNTK_VMSETSIZE_BUG, Set the flag for ZFS.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883
The flag specifies that vm_fault() handler should check the vnode'
vm_object size under the vnode lock. It is converted into the object'
OBJ_SIZEVNLOCK flag in vnode_pager_alloc().
Tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883
The correctness of per-CPU cache accounting in that function is
dependent on reading per-CPU pointers exactly once. Ensure that
the compiler does not emit multiple loads of those pointers.
Reported and tested by: pho
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22081
In low memory conditions a significant number of pages may end up stuck
in the caches, and currently these caches cannot be reaped, leading to
spurious memory allocation failures and OOM kills. So:
- Take into account the fact that we may cache up to two full buckets
of pages per CPU, not just one.
- Increase the amount of RAM required per CPU to enable the caches.
This is a temporary measure until the page cache management policy is
improved.
PR: 241048
Reported and tested by: Kevin Oberman <rkoberman@gmail.com>
Reviewed by: alc, kib
Discussed with: jeff
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22040
With an upcoming change the amd64 kernel will map preloaded files RW
instead of RWX, so the kernel linker must adjust protections
appropriately using pmap_change_prot().
Reviewed by: kib
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21860
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore(). Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.
The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21823
Atomics are used for page busy and valid state when the shared busy is
held. The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594
busy acquires while held.
This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592
This persists busy state across operations like rename and replace.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21549
This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21548
vm_map_free_{left,right} rather than re-implementing them. Use the
VM_MAP_FOREACH macro where applicable. Fix some indentation.
Suggested by: kib (in a comment on D21964)
Tested by: pho (as part of D21964)
Differential Revision: https://reviews.freebsd.org/D22011
Based on POWER9BSD implementation, with all POWER9 specific code removed and
addition of new methods in PPC64 MMU interface, to isolate platform specific
code. Currently, the new methods are implemented on pseries and PowerNV
(D21643).
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D21551
The TDP_NOFAULTING flag should be checked in vm_fault(), not in
vm_fault_trap(). Otherwise kernel accesses to userspace, like
vn_io_fault(), enter vm locking when it should not.
Reported and tested by: pho
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D21992
cy@ points out that I got parameter order backwards between definition and
invocation of the helper function. He is totally correct. The earlier
version of this patch predated the XFree column so this is one I introduced,
rather than the original author.
Submitted by: cy
Reported by: cy
X-MFC-With: r353429
Add /i option for machine-parseable CSV output. This allows ready copy/
pasting into more sophisticated tooling outside of DDB.
Add total zone size ("Memory Use") as a new column for UMA.
For both, sort the displayed list on size (print the largest zones/types
first). This is handy for quickly diagnosing "where has my memory gone?" at
a high level.
Submitted by: Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version)
Sponsored by: Dell EMC Isilon