A concurrent unlocked lookup can wire the page after
vm_page_release_locked() releases the last wiring, in which case
vm_page_release_locked() must not free the page. Once the xbusy lock is
acquired, that lock, the object lock, and the fact that the page is
unmapped together ensure that the wire count cannot increase, so
re-check for new wirings
after the page is xbusied.
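For illustration, the resulting pattern looks roughly like this (a
minimal sketch; the shared-busy and page queue details are omitted):

    /* Drop what we believe is the last wiring. */
    if (vm_page_unwire_noq(m)) {
            if (vm_page_tryxbusy(m)) {
                    /*
                     * Re-check: a concurrent unlocked lookup may
                     * have wired the page again before we xbusied
                     * it.
                     */
                    if (vm_page_wired(m))
                            vm_page_xunbusy(m);
                    else
                            vm_page_free(m);
            }
    }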
Update the comment above vm_page_wired() to reflect the new
synchronization rules.
Reported by: glebius
Reviewed by: alc, jeff, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24592
vm_page_acquire_unlocked() relies on type-stability of vm_page
structures and assumes that the listq linkage pointers always point to a
vm_page or are NULL. QUEUE_MACRO_DEBUG_TRASH breaks that assumption, so
add an explicit check for a trashed queue pointer before dereferencing.
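For illustration, the check looks roughly like this (a sketch, assuming
the (void *)-1 trash pattern written by TRASHIT() in sys/queue.h; the
surrounding fallback logic is simplified):

    m = TAILQ_NEXT(prev, listq);
    /* Bail out before dereferencing a trashed linkage pointer. */
    if (__predict_false(m == (vm_page_t)-1))
            return (NULL);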
Reported and tested by: pho
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24472
lookup pages. These variants will fall back to their locked counterparts
if the page is not present.
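For example, a consumer might use the new variants roughly as follows
(a sketch; the surrounding code is hypothetical):

    vm_page_t m;

    /* Lockless lookup first; falls back to the locked path itself. */
    if (vm_page_grab_valid_unlocked(&m, object, pindex,
        VM_ALLOC_NORMAL) != VM_PAGER_OK)
            return (EIO);
    /* ... use the valid, busied page ... */
    vm_page_xunbusy(m);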
Discussed with: kib, markj
Differential Revision: https://reviews.freebsd.org/D23449
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.
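For illustration, the annotations take this form (hypothetical node
names):

    SYSCTL_NODE(_vm, OID_AUTO, example, CTLFLAG_RW | CTLFLAG_MPSAFE, 0,
        "node known to be MPSAFE");
    SYSCTL_PROC(_vm, OID_AUTO, legacy,
        CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_NEEDGIANT, NULL, 0,
        sysctl_legacy_handler, "I", "not yet reviewed; needs Giant");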
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
virtual address or physical page allocation need to be marked with this
flag.
Reviewed by: markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23712
After sleeping through a memory shortage, we must return NULL rather
than retry.
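A sketch of the rule (simplified, hypothetical allocator context):

    if (!vm_domain_allocate(vmd, req, 1)) {
            if ((req & VM_ALLOC_NOWAIT) != 0)
                    return (NULL);
            vm_wait_domain(domain);
            /* Locks were dropped while sleeping; do not retry. */
            return (NULL);
    }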
Discussed with: jeff
Reported by: pho
Sponsored by: The FreeBSD Foundation
potential bugs that access freed pages as well as providing a path
towards lockless page lookup.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23444
Update vm_page_scan_contig() and vm_page_reclaim_run() to stop using
vm_page_change_lock(). It has no use after r356157. Remove
vm_page_change_lock() now that it has no users.
Remove an unnecessary check for wirings in vm_page_scan_contig(), which
was being performed twice. The check is racy until
vm_page_reclaim_run() ensures that the page is unmapped, so one check is
sufficient.
Reviewed by: jeff, kib (previous versions)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D23279
The vnode pager does not want the object lock held. Moving this out allows
further object lock scope reduction in callers. While here, add some
missing paging-in-progress calls and an assert. The object handle is now
protected
explicitly with pip.
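A sketch of the pattern (the pager call shown is hypothetical):

    VM_OBJECT_WLOCK(object);
    vm_object_pip_add(object, 1);       /* keeps the handle stable */
    VM_OBJECT_WUNLOCK(object);
    error = pager_getpage(object->handle, m);   /* hypothetical call */
    VM_OBJECT_WLOCK(object);
    vm_object_pip_wakeup(object);
    VM_OBJECT_WUNLOCK(object);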
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23033
ordering to allocate early pages in the same way boot pages were but only
as needed. After the KVA allocator has started up we allocate the KVA that
we consumed during boot. This also makes the boot pages freeable since they
have vm_page structures allocated with the rest of memory.
Parts of this patch were written and tested by markj.
Reviewed by: glebius, markj
Differential Revision: https://reviews.freebsd.org/D23102
respectively. The tunable controls the size of the per-CPU vm page
cache. Previously the value was split among all CPUs in the system, so
configuring the same value on machines with different CPU counts
yielded different cache sizes available to a particular CPU.
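For example (an illustrative value; the tunable name is assumed to be
the renamed per-CPU variant):

    # /boot/loader.conf
    vm.pgcache_zone_max_pcpu="1024"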
Reviewed by: markj
Obtained from: Netflix
MD_UMA_SMALL_ALLOC. This is unusual but not impossible. Fix the alignment
of zones while here. This was already correct because uz_cpu strongly
aligned the zone structure but the specified alignment did not match
reality and involved redundant defines.
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D23046
We now set PGA_DEQUEUE on a managed page when it is wired after
allocation, and vm_page_mvqueue() ignores pages with this flag set,
ensuring that they do not end up in the page queues. However, this is
not sufficient for managed fictitious pages or pages managed by the
TTM. In particular, the TTM makes use of the plinks.q queue linkage
fields for its own purposes.
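A sketch of the resulting guard on the wiring path (simplified):

    /*
     * Fictitious pages repurpose plinks.q, so they must never be
     * subject to page queue operations.
     */
    if ((m->flags & PG_FICTITIOUS) == 0 &&
        (m->oflags & VPO_UNMANAGED) == 0)
            vm_page_aflag_set(m, PGA_DEQUEUE);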
PR: 242961
Reported and tested by: Greg V <greg@unrelenting.technology>
This fixes a regression in r356155, introduced at the last minute. In
particular, we must clear PGA_REQUEUE_HEAD before inserting into any
queue besides PQ_INACTIVE since that operation is implemented only for
PQ_INACTIVE.
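A sketch of the fix (simplified):

    /* PGA_REQUEUE_HEAD is implemented only for PQ_INACTIVE. */
    if (queue != PQ_INACTIVE)
            vm_page_aflag_clear(m, PGA_REQUEUE_HEAD);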
Reported by: pho, Jenkins via lwhsu
The previous series of patches orphaned some vm_page functions, so
remove them.
Reviewed by: dougm, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22886
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page. It is also no longer needed in
the page queue scans. This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.
Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22885
Some recent work aims to remove the use of the page lock for
synchronizing updates to page queue state. This change adds a mechanism
to preserve the existing behaviour of lazily dequeuing wired pages,
which was previously synchronized using the page lock.
Handle this by setting PGA_DEQUEUE when a managed page's wire count
transitions from 0 to 1. When the page daemon encounters a page with a
flag in PGA_QUEUE_OP_MASK set, it creates a batch queue entry for that
page, but in so doing it does not modify the page itself and thus racing
with a concurrent free of the page is harmless. The flag is advisory;
the page daemon still checks for wirings after acquiring the object and
page xbusy locks.
vm_page_unwire_managed() now clears PGA_DEQUEUE on a 1->0 transition.
It must do this before dropping the reference to avoid a use-after-free
but also handles races with concurrent wirings to ensure that
PGA_DEQUEUE is not left unset on a wired page.
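A sketch of the wiring-side transition (simplified; the VPRC_* helpers
are from vm_page.h):

    u_int old;

    old = atomic_fetchadd_int(&m->ref_count, 1);
    if (VPRC_WIRE_COUNT(old) == 0 && (m->oflags & VPO_UNMANAGED) == 0)
            /* 0->1 transition: request a lazy dequeue. */
            vm_page_aflag_set(m, PGA_DEQUEUE);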
Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22882
This is in preparation for eliminating the use of the vm_page lock for
protecting queue state operations.
Introduce the vm_page_pqstate_commit_*() functions. These functions act
as helpers around vm_page_astate_fcmpset() and are specialized for
specific types of operations. vm_page_pqstate_commit() wraps these
functions.
Convert a number of routines to use these new helpers. Use
vm_page_release_toq() in vm_page_unwire() and vm_page_release() to
atomically release a wiring reference and release the page into a queue.
This has the side effect that vm_page_unwire() will leave the page in
the active queue if it is already present there.
Convert the page queue scans to use the new helpers. Simplify
vm_pageout_reinsert_inactive(), which requeues pages that were found to
be busy during an inactive queue scan, to avoid duplicating the work of
vm_pqbatch_process_page(). In particular, if PGA_REQUEUE or
PGA_REQUEUE_HEAD is set, let that be handled during batch processing.
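The helpers share a common compare-and-set loop; a sketch (simplified):

    vm_page_astate_t new, old;

    old = vm_page_astate_load(m);
    do {
            new = old;
            new.flags |= PGA_REQUEUE;
    } while (!vm_page_pqstate_commit(m, &old, new));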
Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22770
Differential Revision: https://reviews.freebsd.org/D22771
Differential Revision: https://reviews.freebsd.org/D22772
Differential Revision: https://reviews.freebsd.org/D22773
Differential Revision: https://reviews.freebsd.org/D22776
allocate them with VM_ALLOC_NOOBJ which means they are not busy. For now
move the busy assert for the new page in vm_page_replace() into the
public API and out of the private API used by contig reclaim. Fix
another issue where we would leak the busy lock if the page could not be
removed from pmap.
Reported by: pho
Discussed with: markj
removed from objects including calls to free. Pages must not be xbusy
when freed and not on an object. Strengthen assertions to match these
expectations. In practice, very little code had to change its busy
handling to meet these rules, but we can now make stronger guarantees to
busy holders and avoid conditionally dropping busy in free.
Refine vm_page_remove() and vm_page_replace() semantics now that we have
stronger guarantees about busy state. This removes redundant and
potentially problematic code that has proliferated.
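A sketch of the strengthened assertions (simplified):

    if (m->object != NULL)
            /* Removal from an object requires the xbusy lock. */
            vm_page_assert_xbusied(m);
    else
            KASSERT(!vm_page_xbusied(m),
                ("freeing xbusy page %p with no object", m));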
Discussed with: markj
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D22822
When allocating a replacement page we must clear VPO_UNMANAGED since we
only ever reclaim pages from managed objects. vm_page_replace() does
not handle this for us.
Sprinkle some assertions to help catch this sort of issue.
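A sketch of the fix (the surrounding reclaim code is elided):

    /*
     * We only reclaim pages from managed objects, but the
     * replacement page was allocated with VM_ALLOC_NOOBJ and so
     * carries VPO_UNMANAGED; clear it before vm_page_replace().
     */
    m_new->oflags &= ~VPO_UNMANAGED;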
Reported by: pho
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22868
an exclusive object lock.
Previously swap space was freed on a best-effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy. This was
done inconsistently and required the object lock, which is not always
convenient.
Instead, track when swap space is present. The first dirty is
responsible for deleting the space or setting PGA_SWAP_FREE, which will
trigger background scans to free the swap space.
Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.
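A sketch of the first-dirty check (simplified; flag access shown per the
vm_page_astate layout):

    /*
     * Only the clean-to-dirty transition must release the stale
     * swap copy, either directly or via the background scan.
     */
    if (m->dirty == 0 && (m->a.flags & PGA_SWAP_SPACE) != 0)
            vm_page_aflag_set(m, PGA_SWAP_FREE);
    vm_page_dirty(m);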
Discussed with: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22654
exec_map_first_page(). This will also enable pagein clustering for other
interested consumers (tmpfs, md, etc).
Discussed with: alc
Approved by: kib
Differential Revision: https://reviews.freebsd.org/D22731
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
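The structure looks roughly like this (a sketch consistent with the
description above):

    typedef union vm_page_astate {
            struct {
                    uint16_t flags;     /* atomic PGA_* flags */
                    uint8_t  queue;     /* page queue index */
                    uint8_t  act_count; /* activity counter */
            };
            uint32_t _bits;             /* view for 32-bit cmpset */
    } vm_page_astate_t;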
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
This matches r351198 from amd64. This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave the same as the
existing code. Otherwise it will allocate 16MB huge pages to hold the
page
array, across all NUMA domains. On the first domain it will shift the
page array base up, to "upper-align" the page array in that domain, so
as to reduce the number of pages from the next domain appearing in this
domain. After the first domain, subsequent domains will be allocated in
full 16MB pages, until the final domain, which can be short. This means
some inner domains may have pages accounted in earlier domains.
On Book-E the page array is set up at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit. This reduces the TLB0
overhead of touching the vm_page_array, saving up to one TLB miss per
array access.
Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21449
Record as many bits from curthread into busy_lock as fit. The low bits
of a struct thread * representation are zero due to struct and zone
alignment, and they leave space for the busy flags (perhaps except for
the statically allocated thread0). The upper bits are not very
interesting for the assert, and in most practical situations the
recorded value should allow manually identifying the owner with
certainty.
Assert that unbusy is performed by the owner, except for a few places
where unbusy is done in an I/O completion handler. For this case, add
_unchecked variants of the asserts and unbusy primitives.
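A sketch of the idea (simplified; flag names per vm_page.h):

    /*
     * Thread pointers are zone- and struct-aligned, so their low
     * bits are zero and can carry the busy flag bits.
     */
    u_int owner;

    owner = (u_int)(uintptr_t)curthread & ~VPB_BIT_FLAGMASK;
    if (atomic_cmpset_acq_int(&m->busy_lock, VPB_UNBUSIED,
        owner | VPB_BIT_EXCLUSIVE)) {
            /* xbusy acquired; busy_lock now records the owner. */
    }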
Reviewed by: markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22298
- Remove the cnt == 1 check. UMA passes cnt == 1 when it has disabled
per-CPU caching. In this case we might as well just allocate a single
page and return it to the caller, since the caller is going to do
exactly that anyway if the UMA cache allocation attempt fails.
- Don't replenish caches if the domain is severely short on free pages.
With large buckets we may otherwise quickly exacerbate a situation
where the page daemon is failing to keep up.
- Don't replenish caches if the calling thread belongs to the page
daemon, which should avoid creating extra memory pressure when it is
trying to free memory. Virtually all such allocations will occur in
the context of laundering, where the laundry thread must allocate
slabs for various swap and I/O-related UMA zones.
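A sketch of the import-side checks (simplified; vmd_severeset is
assumed to name the domain's severe-shortage flag):

    /*
     * Decline to fill a per-CPU cache when free pages are severely
     * short or when the page daemon itself is allocating; the
     * caller then falls back to a direct page allocation.
     */
    if (vmd->vmd_severeset || curproc == pageproc)
            return (0);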
Reviewed by: kib
Discussed with: alc, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22394
In r353734 the use of the page caches was limited to systems with a
relatively large amount of RAM per CPU. This was to mitigate some
issues reported where the system was unable to keep up with memory
pressure in cases where it had been able to do so prior to the addition
of the direct free pool cache. This change re-enables those caches.
The change modifies uma_zone_set_maxcache(), which was introduced
specifically for the page cache zones. Rather than using it to limit
only the full bucket cache, have it also set uz_count_max to provide an
upper bound on the per-CPU cache size that is consistent with the number
of items requested. Remove its return value since it has no use.
Enable the page cache zones unconditionally, and limit them to 0.1% of
the domain's pages. The limit can be overridden by the
vm.pgcache_zone_max tunable as before.
Change the item size parameter passed to uma_zcache_create() to the
correct size, and stop setting UMA_ZONE_MAXBUCKET. This allows the page
cache buckets to be adaptively sized, like the rest of UMA's caches.
This also causes the initial bucket size to be small, so only systems
which benefit from large caches will get them.
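A sketch of the setup (simplified; the import/release callbacks and
variable names are illustrative):

    zone = uma_zcache_create("vm pgcache", sizeof(struct vm_page),
        NULL, NULL, NULL, NULL, vm_page_zone_import,
        vm_page_zone_release, pgcache, 0);
    /* Bounds both the bucket cache and the per-CPU cache size. */
    uma_zone_set_maxcache(zone, cache_pages);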
Reviewed by: gallatin, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22393