This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
to its successor in cases where examining a map entry requires a
helper like kvm_read_all. Use that method, with kvm_read_all, to fix
procstat_getfiles_kvm, which currently tries to find the successor
without using such a helper. This addresses a problem introduced by
r355491.
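A generic sketch of the helper pattern (names hypothetical, except
kvm_read, the standard libkvm primitive): the successor logic takes a
reader callback, so in-kernel consumers pass a direct dereference
while libkvm consumers pass a copy-in routine like this one.

	/* Copy a kernel vm_map_entry into local memory via libkvm. */
	static int
	kvm_entry_reader(void *token, u_long addr, void *dest, size_t len)
	{

		return (kvm_read((kvm_t *)token, addr, dest, len) ==
		    (ssize_t)len);
	}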
Reviewed by: markj (previous version)
Discussed with: kib
Differential Revision: https://reviews.freebsd.org/D22728
The current vnode layout is not SMP-friendly: frequently read data
avoidably shares cachelines with very frequently modified fields. In
particular, v_iflag, inspected for VI_DOOMED, can be found in the same
cacheline as v_usecount. Instead, make that information available in
the same cacheline as v_op, v_data and v_type, which all get read all
the time.
v_type is avoidably 4 bytes while the necessary data easily fits in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a
new flag field with a new value: VIRF_DOOMED.
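An abridged illustration of the intended layout (a sketch only; see
sys/vnode.h for the real ordering):

	struct vnode {
		/* Hot, mostly read-only fields: one cacheline. */
		enum vtype v_type:8;		/* shrunk from 4 bytes */
		short v_irflag;			/* VIRF_DOOMED lives here */
		struct vop_vector *v_op;
		void *v_data;
		/* ... */
		/* Frequently modified fields live elsewhere. */
		u_int v_usecount;
	};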
Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715
zones would never be freed. In the case of tmpfs this was not true.
While here, test for the right bit to disable the keg-related sysctls
for zones that don't have kegs.
Reported by: pho
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D22655
A valid reference is all that is required. If we race with a deallocation
we will harmlessly misidentify the type of an already dead object.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22636
embedded slabs but also is an opportunity to tidy up code and add
accessor inlines.
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D22609
space. Where the vm_map tree now has null pointers, store pointers to
next and previous entries in right and left fields, making the binary
tree threaded. Have the predecessor and successor functions compute
what the prev and next fields previously stored.
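A minimal sketch of successor computation in a right-threaded tree
(the general technique, not the vm_map encoding, which distinguishes
threads from children differently):

	struct tnode {
		struct tnode *left;	/* child or NULL */
		struct tnode *right;	/* child, or thread if flagged */
		bool right_is_thread;
	};

	static struct tnode *
	tnode_succ(struct tnode *n)
	{
		struct tnode *s;

		s = n->right;
		if (n->right_is_thread)
			return (s);	/* thread points at the successor */
		/* Real child: successor is leftmost in the right subtree. */
		while (s->left != NULL)
			s = s->left;
		return (s);
	}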
Reviewed by: markj, kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D21964
Summary:
This matches r351198 from amd64. This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave like the
existing code. Otherwise it allocates 16MB huge pages to hold the page
array, across all NUMA domains. On the first domain it shifts the
page array base up, to "upper-align" the page array in that domain and
thereby reduce the number of pages from the next domain appearing in
this domain. After the first domain, subsequent domains are allocated
in full 16MB pages, until the final domain, which can be short. This
means some inner domains may have pages accounted in earlier domains.
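A hypothetical sketch of the "upper-align" computation (identifiers
invented for illustration): the base is shifted up so the first
domain's slice of the array ends exactly on a 16MB boundary, letting
the next domain's entries start on a fresh huge page.

	size_t slice;
	vm_offset_t base, end;

	/* 'base' holds the tentative start of the page array. */
	slice = dom0_npages * sizeof(struct vm_page);
	end = roundup2(base + slice, 16 * 1024 * 1024);
	base = end - slice;	/* dom1 entries start on a new huge page */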
On Book-E the page array is set up at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit. This reduces the TLB0
overhead for touching the vm_page_array, saving up to one TLB miss per
array access.
Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21449
Suppose that the map entry is wired, so that we later assign
fault_type = entry->protection. Suppose further that we jump back to
RetryLookup. Then fault_type will no longer contain the original
fault protection mask, but instead that of the wired entry.
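A condensed sketch of the resulting fix in vm_map_lookup() (abridged;
fault_typea is the caller's original mask):

	RetryLookup:
		/*
		 * Re-derive fault_type on every pass so a previous
		 * iteration's wired-entry override does not stick.
		 */
		fault_type = fault_typea;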
Submitted by: Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by: kib
MFC after: 3 days
Github PR: https://github.com/freebsd/freebsd/pull/419
Differential Revision: https://reviews.freebsd.org/D22683
If the starting pindex is equal to object->size, there is nothing to do.
This was harmless since the rest of vm_map_pmap_enter() has no effect
when psize == 0.
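A condensed sketch of the check in vm_map_pmap_enter() (locking and
surrounding logic elided):

	psize = atop(size);
	if (psize + pindex > object->size) {
		if (pindex >= object->size)
			return;		/* nothing to premap */
		psize = object->size - pindex;
	}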
Submitted by: Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by: alc, dougm, kib
MFC after: 1 week
Github PR: https://github.com/freebsd/freebsd/pull/417
Differential Revision: https://reviews.freebsd.org/D22678
These are cast to uma_import and uma_release functions. Use those
signatures directly in the zone functions.
This was found with an experimental Kernel CFI, which complains if a
function's signature differs from what the function pointer expects.
The simplest way to fix these is to correct the signature.
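For reference, the typedefs to match (shown as a sketch of the
declarations in sys/vm/uma.h at the time):

	typedef int (*uma_import)(void *arg, void **store, int count,
	    int domain, int flags);
	typedef void (*uma_release)(void *arg, void **store, int count);

	/* The zone functions now spell out the same signature. */
	static int
	zone_import(void *arg, void **store, int count, int domain,
	    int flags);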
Reviewed by: rlibby
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22671
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.
Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22611
The handle value is stable for all shadow objects in the inheritance
chain. This makes it possible to avoid descending the shadow chain to
get to the bottom of it in vm_map_entry_set_vnode_text(), and
eliminates the corresponding object relocking, which proved to be a
point of contention.
Change vm_object_allocate_anon() and vm_object_shadow() to handle more
of the cred/charge initialization for the new shadow object, in
addition to setting up the handle.
Reported by: jeff
Reviewed by: alc (previous version), jeff (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22541
broken in r355082. Reduce some locking in nearby related object type
checks.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22565
unset until the object is recycled so this check is stable. Now that we
can acquire the ref without a lock it is not necessary to group these
operations and we can avoid it entirely in many cases.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22565
(e.g. root->left = NULL) to affect the behavior of that function. This
change stops that data manipulation and instead calls a pair of
functions, one for the left direction and the other for the right,
choosing between them according to whether or not we currently null
the root child in that direction to control the behavior of
vm_map_splay_merge.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22589
union members in vm_page.h to store the zone and slab. Remove some nearby
dead code.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D22564
more statistics than are exported via the ABI-stable vmstat interface.
Rename uz_count to uz_bucket_size because even I was confused by the
name after returning to the source years later.
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D22554
On INVARIANTS kernels, UMA has a use-after-free detection mechanism.
This mechanism previously required that all of the ctor/dtor/uminit/fini
arguments to uma_zcreate() be NULL in order to function. Now it
requires only that uminit and fini be NULL; the trash ctor and dtor
are called in addition to any supplied ctor or dtor.
Also do a little refactoring for readability of the resulting logic.
This enables use-after-free detection for more zones, and will allow for
simplification of some callers that worked around the previous
restriction (see kern_mbuf.c).
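A minimal sketch of the combined-hook idea (wrapper and user_ctor
names hypothetical; trash_ctor is UMA's poison-verification hook):

	static int
	combined_ctor(void *mem, int size, void *arg, int flags)
	{

		trash_ctor(mem, size, arg, flags);	/* check poisoning */
		return (user_ctor(mem, size, arg, flags)); /* zone's own */
	}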
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20722
omit the object lock if we are above a certain threshold. Hold only a
single vnode reference when the vnode object has any ref > 0. This
allows us to only lock the object and vnode on 0-1 and 1-0 transitions.
Differential Revision: https://reviews.freebsd.org/D22452
make sense after many partial refactors. Attempt to make a smaller cache
footprint for the fast path.
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D22470
Regression from r352174. In the vm_page_rename() failure case we forgot
to unlock the vm object locks before sleeping and reacquiring them.
Reviewed by: jeff
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22542
'entry'. Where 'entry' is used to identify the starting point for
iteration, use 'first_entry'. These are the naming conventions used in
most of the vm_map.c code. Where VM_MAP_ENTRY_FOREACH can be used, do
so. Squeeze a few lines to fit in 80 columns. Where lines are being
modified for these reasons, look to remove style(9) violations.
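Usage sketch of the iteration macro (the loop body is illustrative):

	VM_MAP_ENTRY_FOREACH(entry, map) {
		if (entry->wired_count != 0)
			wired++;
	}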
Reviewed by: alc, markj
Differential Revision: https://reviews.freebsd.org/D22458
Note that the change in vm_object_collapse() is arguably a correctness
fix. We must not collapse into content-identity carrying objects.
Reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22467
Record as many bits from curthread into busy_lock as fit. The low bits
of a struct thread * representation are zero due to struct and zone
alignment, and they leave space for busy flags (perhaps except for the
statically allocated thread0). The upper bits are not very interesting
for the assert, and in most practical situations the recorded value
should allow the owner to be identified manually with certainty.
Assert that unbusy is performed by the owner, except for a few places
where unbusy is done in an I/O completion handler. For that case, add
_unchecked variants of the asserts and unbusy primitives.
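A generic sketch of the encoding (not the exact vm_page flag layout):
alignment guarantees the low pointer bits are zero, so they can carry
lock state.

	#define	BUSY_FLAG_MASK	((uintptr_t)0x7)	/* low bits free */

	/* Pack the owning thread and busy flags into one word. */
	static inline uintptr_t
	busy_owner_word(struct thread *td, uintptr_t flags)
	{

		KASSERT(((uintptr_t)td & BUSY_FLAG_MASK) == 0,
		    ("misaligned thread pointer %p", td));
		return ((uintptr_t)td | flags);
	}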
Reviewed by: markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22298
Stop subtracting 1024/200 from vmd_page_count/200. I cannot see how
such precise accounting can make a difference on modern systems.
Add some explanation of what the page daemon does and how it handles
memory shortages.
Reviewed by: dougm
Discussed with: jeff, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22396
Use the UMA reclaim thread to asynchronously drain all caches if
there is a severe shortage in a domain. Otherwise we only trigger UMA
reclamation every 10s even when the system has completely run out of
memory.
Stop entirely draining the caches when one domain falls below its min
threshold. In some workloads it is normal for one NUMA domain to end
up being nearly depleted by kernel memory allocations, for example for
the ZFS ARC. The domainset iterators skip domains below the
vmd_min_free threshold on the first iteration, so we should allow that
mechanism to limit further depletion of the domain's free pages before
taking the extreme step of calling uma_reclaim(UMA_RECLAIM_DRAIN_CPU).
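A condensed sketch of the trigger (the severity test here is
illustrative; the real check sits in the allocator paths):

	if (vm_page_count_severe())
		uma_reclaim_wakeup();	/* drain asynchronously now */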
Discussed with: jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22395
- Remove the cnt == 1 check. UMA passes cnt == 1 when it has disabled
per-CPU caching. In this case we might as well just allocate a single
page and return it to the caller, since the caller is going to do
exactly that anyway if the UMA cache allocation attempt fails.
- Don't replenish caches if the domain is severely short on free pages.
With large buckets we may otherwise quickly exacerbate a situation
where the page daemon is failing to keep up.
- Don't replenish caches if the calling thread belongs to the page
daemon, which should avoid creating extra memory pressure when it is
trying to free memory. Virtually all such allocations will occur in
the context of laundering, where the laundry thread must allocate
slabs for various swap- and I/O-related UMA zones. (See the condensed
sketch after this list.)
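A condensed sketch of the guards from the last two items (the real
predicates in the page cache zone's import function differ in detail):

	if (vm_page_count_severe() || curproc == pageproc)
		return (0);	/* decline to refill the per-CPU cache */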
Reviewed by: kib
Discussed with: alc, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22394
In r353734 the use of the page caches was limited to systems with a
relatively large amount of RAM per CPU. This was to mitigate some
issues reported with the system not able to keep up with memory pressure
in cases where it had been able to do so prior to the addition of the
direct free pool cache. This change re-enables those caches.
The change modifies uma_zone_set_maxcache(), which was introduced
specifically for the page cache zones. Rather than using it to limit
only the full bucket cache, have it also set uz_count_max to provide an
upper bound on the per-CPU cache size that is consistent with the number
of items requested. Remove its return value since it has no use.
Enable the page cache zones unconditionally, and limit them to 0.1% of
the domain's pages. The limit can be overridden by the
vm.pgcache_zone_max tunable as before.
Change the item size parameter passed to uma_zcache_create() to the
correct size, and stop setting UMA_ZONE_MAXBUCKET. This allows the page
cache buckets to be adaptively sized, like the rest of UMA's caches.
This also causes the initial bucket size to be small, so only systems
which benefit from large caches will get them.
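A usage sketch with hypothetical identifiers, mirroring the limits
described above:

	/* Cap the zone at roughly 0.1% of the domain's pages. */
	uma_zone_set_maxcache(pgcache_zone, domain_page_count / 1000);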
Reviewed by: gallatin, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22393
We were not properly handling the case where the trylock of the
reservation fails, in which case we could leak the reservation lock.
Introduce a marker reservation to implement precise scanning in
vm_reserv_reclaim_contig(). Before, a race could result in early
termination of the scan in rare situations. Use the marker's lock to
serialize scans of the partpop queue so that a global marker structure
can be used. Modify vm_reserv_reclaim_inactive() to handle the presence
of a marker while minimizing the hold time of domain-global locks.
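A generic sketch of marker-based scanning with queue(3) macros (field,
variable and lock names hypothetical; the real code also skips marker
entries during iteration):

	/* Park the marker so the scan can resume after dropping the lock. */
	TAILQ_INSERT_AFTER(&queue, rv, &marker, link);
	mtx_unlock(&queue_lock);
	/* ... work on 'rv' without holding the queue lock ... */
	mtx_lock(&queue_lock);
	next = TAILQ_NEXT(&marker, link);
	TAILQ_REMOVE(&queue, &marker, link);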
Reviewed by: alc, jeff, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22392
entry, when that entry has been seen already, keep the
already-looked-up value in a variable and use that instead of looking
it up again.
Approved by: alc, markj (earlier version), kib (earlier version)
Differential Revision: https://reviews.freebsd.org/D22348
so that we avoid the hashtables. The hashtable is now only required if
a zone is created with OFFPAGE specified initially, not internally. This
flag signals to UMA that it can't touch the allocated memory and so
can't store a slab pointer in the containing page.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D22453