A concurrent unlocked lookup can wire the page after
vm_page_release_locked() releases the last wiring, in which case
vm_page_release_locked() must not free the page. Once the xbusy lock is
acquired, that lock, the object lock, and the fact that the page is
unmapped together ensure that the wire count cannot increase, so
re-check for new wirings
after the page is xbusied.
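For illustration, the resulting pattern looks roughly like this (a
minimal sketch; the shared-busy and page queue details are omitted):

    /* Drop what we believe is the last wiring. */
    if (vm_page_unwire_noq(m)) {
            if (vm_page_tryxbusy(m)) {
                    /*
                     * Re-check: a concurrent unlocked lookup may
                     * have wired the page again before we xbusied
                     * it.
                     */
                    if (vm_page_wired(m))
                            vm_page_xunbusy(m);
                    else
                            vm_page_free(m);
            }
    }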
Update the comment above vm_page_wired() to reflect the new
synchronization rules.
Reported by: glebius
Reviewed by: alc, jeff, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24592
vm_page_acquire_unlocked() relies on type-stability of vm_page
structures and assumes that the listq linkage pointers always point to a
vm_page or are NULL. QUEUE_MACRO_DEBUG_TRASH breaks that assumption, so
add an explicit check for a trashed queue pointer before dereferencing.
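For illustration, the check looks roughly like this (a sketch, assuming
the (void *)-1 trash pattern written by TRASHIT() in sys/queue.h; the
surrounding fallback logic is simplified):

    m = TAILQ_NEXT(prev, listq);
    /* Bail out before dereferencing a trashed linkage pointer. */
    if (__predict_false(m == (vm_page_t)-1))
            return (NULL);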
Reported and tested by: pho
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24472
lookup pages. These variants will fall back to their locked counterparts
if the page is not present.
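For example, a consumer might use the new variants roughly as follows
(a sketch; the surrounding code is hypothetical):

    vm_page_t m;

    /* Lockless lookup first; falls back to the locked path itself. */
    if (vm_page_grab_valid_unlocked(&m, object, pindex,
        VM_ALLOC_NORMAL) != VM_PAGER_OK)
            return (EIO);
    /* ... use the valid, busied page ... */
    vm_page_xunbusy(m);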
Discussed with: kib, markj
Differential Revision: https://reviews.freebsd.org/D23449
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.
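For illustration, the annotations take this form (hypothetical node
names):

    SYSCTL_NODE(_vm, OID_AUTO, example, CTLFLAG_RW | CTLFLAG_MPSAFE, 0,
        "node known to be MPSAFE");
    SYSCTL_PROC(_vm, OID_AUTO, legacy,
        CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_NEEDGIANT, NULL, 0,
        sysctl_legacy_handler, "I", "not yet reviewed; needs Giant");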
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
virtual address or physical page allocation need to be marked with this
flag.
Reviewed by: markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23712
After sleeping through a memory shortage, we must return NULL rather
than retry.
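A sketch of the rule (simplified, hypothetical allocator context):

    if (!vm_domain_allocate(vmd, req, 1)) {
            if ((req & VM_ALLOC_NOWAIT) != 0)
                    return (NULL);
            vm_wait_domain(domain);
            /* Locks were dropped while sleeping; do not retry. */
            return (NULL);
    }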
Discussed with: jeff
Reported by: pho
Sponsored by: The FreeBSD Foundation
potential bugs that access freed pages as well as providing a path
towards lockless page lookup.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23444
Update vm_page_scan_contig() and vm_page_reclaim_run() to stop using
vm_page_change_lock(). It has no use after r356157. Remove
vm_page_change_lock() now that it has no users.
Remove an unnecessary check for wirings in vm_page_scan_contig(), which
was being performed twice. The check is racy until
vm_page_reclaim_run() ensures that the page is unmapped, so one check is
sufficient.
Reviewed by: jeff, kib (previous versions)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D23279
The vnode pager does not want the object lock held. Moving this out allows
further object lock scope reduction in callers. While here, add some
missing paging-in-progress calls and an assert. The object handle is now
protected
explicitly with pip.
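A sketch of the pattern (the pager call shown is hypothetical):

    VM_OBJECT_WLOCK(object);
    vm_object_pip_add(object, 1);       /* keeps the handle stable */
    VM_OBJECT_WUNLOCK(object);
    error = pager_getpage(object->handle, m);   /* hypothetical call */
    VM_OBJECT_WLOCK(object);
    vm_object_pip_wakeup(object);
    VM_OBJECT_WUNLOCK(object);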
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23033
ordering to allocate early pages in the same way boot pages were but only
as needed. After the KVA allocator has started up we allocate the KVA that
we consumed during boot. This also makes the boot pages freeable since they
have vm_page structures allocated with the rest of memory.
Parts of this patch were written and tested by markj.
Reviewed by: glebius, markj
Differential Revision: https://reviews.freebsd.org/D23102
respectively. The tunable controls the size of the per-CPU vm page
cache. Previously the value was split among all CPUs in the system, so
configuring the same value on machines with different CPU counts
yielded different cache sizes available to a particular CPU.
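For example (an illustrative value; the tunable name is assumed to be
the renamed per-CPU variant):

    # /boot/loader.conf
    vm.pgcache_zone_max_pcpu="1024"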
Reviewed by: markj
Obtained from: Netflix
MD_UMA_SMALL_ALLOC. This is unusual but not impossible. Fix the alignment
of zones while here. This was already correct because uz_cpu strongly
aligned the zone structure but the specified alignment did not match
reality and involved redundant defines.
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D23046
We now set PGA_DEQUEUE on a managed page when it is wired after
allocation, and vm_page_mvqueue() ignores pages with this flag set,
ensuring that they do not end up in the page queues. However, this is
not sufficient for managed fictitious pages or pages managed by the
TTM. In particular, the TTM makes use of the plinks.q queue linkage
fields for its own purposes.
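A sketch of the resulting guard on the wiring path (simplified):

    /*
     * Fictitious pages repurpose plinks.q, so they must never be
     * subject to page queue operations.
     */
    if ((m->flags & PG_FICTITIOUS) == 0 &&
        (m->oflags & VPO_UNMANAGED) == 0)
            vm_page_aflag_set(m, PGA_DEQUEUE);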
PR: 242961
Reported and tested by: Greg V <greg@unrelenting.technology>
This fixes a regression in r356155, introduced at the last minute. In
particular, we must clear PGA_REQUEUE_HEAD before inserting into any
queue besides PQ_INACTIVE since that operation is implemented only for
PQ_INACTIVE.
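A sketch of the fix (simplified):

    /* PGA_REQUEUE_HEAD is implemented only for PQ_INACTIVE. */
    if (queue != PQ_INACTIVE)
            vm_page_aflag_clear(m, PGA_REQUEUE_HEAD);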
Reported by: pho, Jenkins via lwhsu
The previous series of patches orphaned some vm_page functions, so
remove them.
Reviewed by: dougm, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22886
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page. It is also no longer needed in
the page queue scans. This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.
Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22885
Some recent work aims to remove the use of the page lock for
synchronizing updates to page queue state. This change adds a mechanism
to preserve the existing behaviour of lazily dequeuing wired pages,
which was previously synchronized using the page lock.
Handle this by setting PGA_DEQUEUE when a managed page's wire count
transitions from 0 to 1. When the page daemon encounters a page with a
flag in PGA_QUEUE_OP_MASK set, it creates a batch queue entry for that
page, but in so doing it does not modify the page itself and thus racing
with a concurrent free of the page is harmless. The flag is advisory;
the page daemon still checks for wirings after acquiring the object and
page xbusy locks.
vm_page_unwire_managed() now clears PGA_DEQUEUE on a 1->0 transition.
It must do this before dropping the reference to avoid a use-after-free
but also handles races with concurrent wirings to ensure that
PGA_DEQUEUE is not left unset on a wired page.
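A sketch of the wiring-side transition (simplified; the VPRC_* helpers
are from vm_page.h):

    u_int old;

    old = atomic_fetchadd_int(&m->ref_count, 1);
    if (VPRC_WIRE_COUNT(old) == 0 && (m->oflags & VPO_UNMANAGED) == 0)
            /* 0->1 transition: request a lazy dequeue. */
            vm_page_aflag_set(m, PGA_DEQUEUE);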
Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22882
This is in preparation for eliminating the use of the vm_page lock for
protecting queue state operations.
Introduce the vm_page_pqstate_commit_*() functions. These functions act
as helpers around vm_page_astate_fcmpset() and are specialized for
specific types of operations. vm_page_pqstate_commit() wraps these
functions.
Convert a number of routines to use these new helpers. Use
vm_page_release_toq() in vm_page_unwire() and vm_page_release() to
atomically release a wiring reference and release the page into a queue.
This has the side effect that vm_page_unwire() will leave the page in
the active queue if it is already present there.
Convert the page queue scans to use the new helpers. Simplify
vm_pageout_reinsert_inactive(), which requeues pages that were found to
be busy during an inactive queue scan, to avoid duplicating the work of
vm_pqbatch_process_page(). In particular, if PGA_REQUEUE or
PGA_REQUEUE_HEAD is set, let that be handled during batch processing.
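The helpers share a common compare-and-set loop; a sketch (simplified):

    vm_page_astate_t new, old;

    old = vm_page_astate_load(m);
    do {
            new = old;
            new.flags |= PGA_REQUEUE;
    } while (!vm_page_pqstate_commit(m, &old, new));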
Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22770
Differential Revision: https://reviews.freebsd.org/D22771
Differential Revision: https://reviews.freebsd.org/D22772
Differential Revision: https://reviews.freebsd.org/D22773
Differential Revision: https://reviews.freebsd.org/D22776
allocate them with VM_ALLOC_NOOBJ which means they are not busy. For now
move the busy assert for the new page in vm_page_replace() into the
public API and out of the private API used by contig reclaim. Fix
another issue where we would leak the busy lock if the page could not be
removed from pmap.
Reported by: pho
Discussed with: markj
removed from objects including calls to free. Pages must not be xbusy
when freed and not on an object. Strengthen assertions to match these
expectations. In practice, very little code had to change its busy
handling to meet these rules, but we can now make stronger guarantees to
busy holders and avoid conditionally dropping busy in free.
Refine vm_page_remove() and vm_page_replace() semantics now that we have
stronger guarantees about busy state. This removes redundant and
potentially problematic code that has proliferated.
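A sketch of the strengthened assertions (simplified):

    if (m->object != NULL)
            /* Removal from an object requires the xbusy lock. */
            vm_page_assert_xbusied(m);
    else
            KASSERT(!vm_page_xbusied(m),
                ("freeing xbusy page %p with no object", m));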
Discussed with: markj
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D22822
When allocating a replacement page we must clear VPO_UNMANAGED since we
only ever reclaim pages from managed objects. vm_page_replace() does
not handle this for us.
Sprinkle some assertions to help catch this sort of issue.
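A sketch of the fix (the surrounding reclaim code is elided):

    /*
     * We only reclaim pages from managed objects, but the
     * replacement page was allocated with VM_ALLOC_NOOBJ and so
     * carries VPO_UNMANAGED; clear it before vm_page_replace().
     */
    m_new->oflags &= ~VPO_UNMANAGED;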
Reported by: pho
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22868
an exclusive object lock.
Previously swap space was freed on a best-effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy. This was
done inconsistently and required the object lock, which is not always
convenient.
Instead, track when swap space is present. The first dirty is
responsible for deleting the space or setting PGA_SWAP_FREE, which will
trigger background scans to free the swap space.
Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.
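A sketch of the first-dirty check (simplified; flag access shown per the
vm_page_astate layout):

    /*
     * Only the clean-to-dirty transition must release the stale
     * swap copy, either directly or via the background scan.
     */
    if (m->dirty == 0 && (m->a.flags & PGA_SWAP_SPACE) != 0)
            vm_page_aflag_set(m, PGA_SWAP_FREE);
    vm_page_dirty(m);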
Discussed with: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22654
exec_map_first_page(). This will also enable pagein clustering for other
interested consumers (tmpfs, md, etc).
Discussed with: alc
Approved by: kib
Differential Revision: https://reviews.freebsd.org/D22731
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
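The structure looks roughly like this (a sketch consistent with the
description above):

    typedef union vm_page_astate {
            struct {
                    uint16_t flags;     /* atomic PGA_* flags */
                    uint8_t  queue;     /* page queue index */
                    uint8_t  act_count; /* activity counter */
            };
            uint32_t _bits;             /* view for 32-bit cmpset */
    } vm_page_astate_t;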
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
This matches r351198 from amd64. This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave the same as the
existing code. Otherwise it will allocate 16MB huge pages to hold the
page
array, across all NUMA domains. On the first domain it will shift the
page array base up, to "upper-align" the page array in that domain, so
as to reduce the number of pages from the next domain appearing in this
domain. After the first domain, subsequent domains will be allocated in
full 16MB pages, until the final domain, which can be short. This means
some inner domains may have pages accounted in earlier domains.
On Book-E the page array is set up at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit. This reduces the TLB0
overhead of touching the vm_page_array, saving up to one TLB miss per
array access.
Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21449
Record as many bits from curthread into busy_lock as fit. The low bits
of a struct thread * representation are zero due to struct and zone
alignment, and they leave space for the busy flags (perhaps except for
the statically allocated thread0). The upper bits are not very
interesting for the assert, and in most practical situations the
recorded value should allow manually identifying the owner with
certainty.
Assert that unbusy is performed by the owner, except for a few places
where unbusy is done in an I/O completion handler. For this case, add
_unchecked variants of the asserts and unbusy primitives.
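A sketch of the idea (simplified; flag names per vm_page.h):

    /*
     * Thread pointers are zone- and struct-aligned, so their low
     * bits are zero and can carry the busy flag bits.
     */
    u_int owner;

    owner = (u_int)(uintptr_t)curthread & ~VPB_BIT_FLAGMASK;
    if (atomic_cmpset_acq_int(&m->busy_lock, VPB_UNBUSIED,
        owner | VPB_BIT_EXCLUSIVE)) {
            /* xbusy acquired; busy_lock now records the owner. */
    }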
Reviewed by: markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22298
- Remove the cnt == 1 check. UMA passes cnt == 1 when it has disabled
per-CPU caching. In this case we might as well just allocate a single
page and return it to the caller, since the caller is going to do
exactly that anyway if the UMA cache allocation attempt fails.
- Don't replenish caches if the domain is severely short on free pages.
With large buckets we may otherwise quickly exacerbate a situation
where the page daemon is failing to keep up.
- Don't replenish caches if the calling thread belongs to the page
daemon, which should avoid creating extra memory pressure when it is
trying to free memory. Virtually all such allocations will occur in
the context of laundering, where the laundry thread must allocate
slabs for various swap and I/O-related UMA zones.
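A sketch of the import-side checks (simplified; vmd_severeset is
assumed to name the domain's severe-shortage flag):

    /*
     * Decline to fill a per-CPU cache when free pages are severely
     * short or when the page daemon itself is allocating; the
     * caller then falls back to a direct page allocation.
     */
    if (vmd->vmd_severeset || curproc == pageproc)
            return (0);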
Reviewed by: kib
Discussed with: alc, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22394
In r353734 the use of the page caches was limited to systems with a
relatively large amount of RAM per CPU. This was to mitigate some
issues reported where the system was unable to keep up with memory
pressure in cases where it had been able to do so prior to the addition
of the direct free pool cache. This change re-enables those caches.
The change modifies uma_zone_set_maxcache(), which was introduced
specifically for the page cache zones. Rather than using it to limit
only the full bucket cache, have it also set uz_count_max to provide an
upper bound on the per-CPU cache size that is consistent with the number
of items requested. Remove its return value since it has no use.
Enable the page cache zones unconditionally, and limit them to 0.1% of
the domain's pages. The limit can be overridden by the
vm.pgcache_zone_max tunable as before.
Change the item size parameter passed to uma_zcache_create() to the
correct size, and stop setting UMA_ZONE_MAXBUCKET. This allows the page
cache buckets to be adaptively sized, like the rest of UMA's caches.
This also causes the initial bucket size to be small, so only systems
which benefit from large caches will get them.
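A sketch of the setup (simplified; the import/release callbacks and
variable names are illustrative):

    zone = uma_zcache_create("vm pgcache", sizeof(struct vm_page),
        NULL, NULL, NULL, NULL, vm_page_zone_import,
        vm_page_zone_release, pgcache, 0);
    /* Bounds both the bucket cache and the per-CPU cache size. */
    uma_zone_set_maxcache(zone, cache_pages);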
Reviewed by: gallatin, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22393