The macro RB_INITIALIZER ignores its argument, but is documented to
require "&head" as argument to initialize "head". So using
"_vm_phys_fictitious_tree" as the argument to initialize
"vm_phys_fictitious_tree" is an inconsequential error, corrected here.
Discussed with: alc
vm_page_remove() rather than !vm_page_wired() as the condition for free.
When this changed back to wired the busy lock was leaked.
Reported by: pho
Reviewed by: markj
removed from objects including calls to free. Pages must not be xbusy
when freed and not on an object. Strengthen assertions to match these
expectations. In practice very little code had to change busy handling
to meet these rules but we can now make stronger guarantees to busy
holders and avoid conditionally dropping busy in free.
Refine vm_page_remove() and vm_page_replace() semantics now that we have
stronger guarantees about busy state. This removes redundant and
potentially problematic code that has proliferated.
Discussed with: markj
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D22822
When allocating a replacement page we must clear VPO_UNMANAGED since we
only ever reclaim pages from managed objects. vm_page_replace() does
not handle this for us.
Sprinkle some assertions to help catch this sort of issue.
Reported by: pho
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22868
Eliminate recursion from most thread_lock consumers. Return from
sched_add() without the thread_lock held. This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks. This will eventually allow for lockless remote adds.
Discussed with: kib
Reviewed by: jhb
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22626
that if fault fails to progress and needs to restart the loop it must free
the page it is working on and allocate again on restart. Resolve the few
places that need to be modified to support this condition and simply
deactivate the page. Presently, we only permit this when fault restarts
for busy contention. This has an added benefit of removing some object
trylocking in this case.
While here consolidate some page cleanup logic into fault_page_free() and
fault_page_release() to reduce redundant code and automate some teardown.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D22653
an exclusive object lock.
Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy. This may be
done inconsistently and requires the object lock which is not always
convenient.
Instead, track when swap space is present. The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.
Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.
Discussed with: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22654
require the object lock to synchronize collapse. Other swap objects such
as tmpfs do not.
Reported by: mjg
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22747
exec_map_first_page(). This will also enable pagein clustering for other
interested consumers (tmpfs, md, etc).
Discussed with: alc
Approved by: kib
Differential Revision: https://reviews.freebsd.org/D22731
Recently (r355315) the size of the struct uma_slab bitset field us_free
became dynamic instead of conservative. Now, make the debug bitset
size dynamic too. The debug bitset is INVARIANTS-only, so in fact we
don't care too much about the space savings that results from this, but
enabling minimally-sized slabs on INVARIANTS builds is still important
in order to be able to test new slab layouts effectively.
Reviewed by: jeff (previous version), markj (previous version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22759
uma_startup2() sets booted = BOOT_BUCKETS after calling bucket_init(),
but before that assignment, startup_alloc() will use pages from the
reserved pool, so the bucket zones themselves are still allocated using
startup pages.
Reviewed by: rlibby
Reported by: Jenkins via lwhsu
Differential Revision: https://reviews.freebsd.org/D22797
This helps with a bootstrapping problem in upcoming work.
We don't first enable buckets until uma_startup2(), so we can delay
bucket creation until then. The other two paths to bucket_enable() are
both later, one in the pageout daemon (SI_SUB_KTHREAD_PAGE vs SI_SUB_VM)
and one in uma_timeout() (first activated in uma_startup3()). Note that
although some bucket functions are accessible before uma_startup2()
(e.g. bucket_select() in zone_ctor()), none of them inspect ubz_zone.
Discussed with: jeff
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22765
Recently (r355315) the size of the struct uma_slab bitset field us_free
became dynamic instead of conservative. Now, make the debug bitset
size dynamic too. The debug bitset is INVARIANTS-only, so in fact we
don't care too much about the space savings that results from this, but
enabling minimally-sized slabs on INVARIANTS builds is still important
in order to be able to test new slab layouts effectively.
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22759
Introduce primitives vm_page_astate_load() and vm_page_astate_fcmpset()
to operate on the 32-bit per-page atomic state. Modify
vm_page_pqstate_fcmpset() to use them. No functional change intended.
Introduce PGA_QUEUE_OP_MASK, a subset of PGA_QUEUE_STATE_MASK that only
includes queue operation flags. This will be used in subsequent
patches.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22753
vm_swapout_object_deactivate_pages() is renamed to
vm_swapout_object_deactivate(), and the loop body is moved into the new
vm_swapout_object_deactivate_page(). This makes the code a bit easier
to follow and is in preparation for some functional changes.
Reviewed by: jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22651
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
to its successor in cases where examining a map entry requires a
helper like kvm_read_all. Use that method, with kvm_read_all, to fix
procstat_getfiles_kvm, which tries to find the successor now without
using such a helper. This addresses a problem introduced by r355491.
Reviewed by: markj (previous version)
Discussed with: kib
Differential Revision: https://reviews.freebsd.org/D22728
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.
v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.
Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715
zones would never be freed. In the case of tmpfs this was not true. While
here test for the right bit to disable the keg related sysctls for zones
that don't have kegs.
Reported by: pho
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D22655
A valid reference is all that is required. If we race with a deallocation
we will harmlessly misidentify the type of an already dead object.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22636
embedded slabs but also is an opportunity to tidy up code and add
accessor inlines.
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D22609
space. Where the vm_map tree now has null pointers, store pointers to
next and previous entries in right and left fields, making the binary
tree threaded. Have the predecessor and successor functions compute
what the prev and next fields previously stored.
Reviewed by: markj, kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D21964
Summary:
This matches r351198 from amd64. This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave similar to
existing. Otherwise it will allocate 16MB huge pages to hold the page
array, across all NUMA domains. On the first domain it will shift the
page array base up, to "upper-align" the page array in that domain, so
as to reduce the number of pages from the next domain appearing in this
domain. After the first domain, subsequent domains will be allocated in
full 16MB pages, until the final domain, which can be short. This means
some inner domains may have pages accounted in earlier domains.
On Book-E the page array is setup at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit. This reduces the TLB0
overhead for touching the vm_page_array, which reduces up to one TLB
miss per array access.
Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21449
Suppose that the map entry is wired, so that we later assign
fault_type = entry->protection. Suppose further that we jump back to
RetryLookup. Then fault_type will no longer contain the original
fault protection mask, but instead that of the wired entry.
Submitted by: Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by: kib
MFC after: 3 days
Github PR: https://github.com/freebsd/freebsd/pull/419
Differential Revision: https://reviews.freebsd.org/D22683
If the starting pindex is equal to object->size, there is nothing to do.
This was harmless since the rest of vm_map_pmap_enter() has no effect
when psize == 0.
Submitted by: Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by: alc, dougm, kib
MFC after: 1 week
Github PR: https://github.com/freebsd/freebsd/pull/417
Differential Revision: https://reviews.freebsd.org/D22678
These are cast to uma_import and uma_release functions. Use the signature
for these in the zone functions.
This was found with an experimental Kernel CFI. It will complain if the
signature is different than what a function pointer expects. The
simplest way to fix these is to correct the signature.
Reviewed by: rlibby
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22671
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.
Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22611
The handle value is stable for all shadow objects in the inheritance
chain. This allows to avoid descending the shadow chain to get to the
bottom of it in vm_map_entry_set_vnode_text(), and eliminate
corresponding object relocking which appeared to be contending.
Change vm_object_allocate_anon() and vm_object_shadow() to handle more
of the cred/charge initialization for the new shadow object, in
addition to set up the handle.
Reported by: jeff
Reviewed by: alc (previous version), jeff (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differrential revision: https://reviews.freebsd.org/D22541
broken in r355082. Reduce some locking in nearby related object type
checks.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22565
unset until the object is recycled so this check is stable. Now that we
can acquire the ref without a lock it is not necessary to group these
operations and we can avoid it entirely in many cases.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22565
(e.g. root->left = NULL) to affect the behavior of that function. This
change stops that data manipulation, and instead calls a pair of
functions, one for the left direction and the other for the right,
with the function called depending whether or not we currently null
the root child in that direction to control the behavior of
vm_map_splay_merge.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22589
union members in vm_page.h to store the zone and slab. Remove some nearby
dead code.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D22564