freebsd-dev

Author	SHA1	Message	Date
Mark Johnston	01f04471f4	Don't increment addl_page_shortage for wired pages. Such pages are dequeued as they're encountered during the inactive queue scan, so by the time we get to the active queue scan, they should have already been subtracted from the inactive queue length. Reviewed by: alc Differential Revision: https://reviews.freebsd.org/D15479	2018-05-18 16:59:58 +00:00
Mark Johnston	36f8fe9bbb	Get rid of vm_pageout_page_queued(). vm_page_queue(), added in r333256, generalizes vm_pageout_page_queued(), so use it instead. No functional change intended. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D15402	2018-05-13 13:00:59 +00:00
Mark Johnston	1b5c869d64	Fix some races introduced in r332974. With r332974, when performing a synchronized access of a page's "queue" field, one must first check whether the page is logically dequeued. If so, then the page lock does not prevent the page from being removed from its page queue. Intoduce vm_page_queue(), which returns the page's logical queue index. In some cases, direct access to the "queue" field is still required, but such accesses should be confined to sys/vm. Reported and tested by: pho Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15280	2018-05-04 17:17:30 +00:00
Mark Johnston	5cd29d0f3c	Improve VM page queue scalability. Currently both the page lock and a page queue lock must be held in order to enqueue, dequeue or requeue a page in a given page queue. The queue locks are a scalability bottleneck in many workloads. This change reduces page queue lock contention by batching queue operations. To detangle the page and page queue locks, per-CPU batch queues are used to reference pages with pending queue operations. The requested operation is encoded in the page's aflags field with the page lock held, after which the page is enqueued for a deferred batch operation. Page queue scans are similarly optimized to minimize the amount of work performed with a page queue lock held. Reviewed by: kib, jeff (previous versions) Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14893	2018-04-24 21:15:54 +00:00
Mark Johnston	64b3893010	Initialize marker pages in vm_page_domain_init(). They were previously initialized by the corresponding page daemon threads, but for vmd_inacthead this may be too late if vm_page_deactivate_noreuse() is called during boot. Reported and tested by: cperciva Reviewed by: alc, kib MFC after: 1 week	2018-04-19 14:09:44 +00:00
Mark Johnston	c098768e4d	Ensure the background laundering threshold is positive after a scan. The division added in r331732 meant that we wouldn't attempt a background laundering until at least v_free_target - v_free_min clean pages had been freed by the page daemon since the last laundering. If the inactive queue is depleted but not completely empty (e.g., because it contains busy pages), it can thus take a long time to meet this threshold. Restore the pre-r331732 behaviour of using a non-zero background laundering threshold if at least one inactive queue scan has elapsed since the last attempt at background laundering. Submitted by: tijl (original version)	2018-04-02 15:07:41 +00:00
Mark Johnston	6068486258	Fix the background laundering mechanism after r329882. Rather than using the number of inactive queue scans as a metric for how many clean pages are being freed by the page daemon, have the page daemon keep a running counter of the number of pages it has freed, and have the laundry thread use that when computing the background laundering threshold. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D14884	2018-03-29 14:27:40 +00:00
Jeff Roberson	30fbfdda6c	Eliminate pageout wakeup races. Take another step towards lockless vmd_free_count manipulation. Reduce the scope of the free lock by using a pageout lock to synchronize sleep and wakeup. Only trigger the pageout daemon on transitions between states. Drive all wakeup operations directly as side-effects from freeing memory rather than requiring an additional function call. Reviewed by: markj, kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14612	2018-03-15 19:23:07 +00:00
Mark Johnston	3b8cf4acf0	Give the 0th domain's page daemon thread a consistent name. Page daemon threads for other domains show up in ps(1) output as "pagedaemon/domN", so let that be the case for domain 0 as well. Submitted by: Kevin Bowling <kevin.bowling@kev009.com> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D14518	2018-02-27 16:51:09 +00:00
Mark Johnston	59d3150b58	Restore the pre-r329882 inactive page shortage computation. With r329882, in the absence of a free page shortage we would only take len(PQ_INACTIVE)+len(PQ_LAUNDRY) into account when deciding whether to aggressively scan PQ_ACTIVE. Previously we would also include the number of free pages in this computation, ensuring that we wouldn't scan PQ_ACTIVE with plenty of free memory available. The change in behaviour was most noticeable immediately after booting, when PQ_INACTIVE and PQ_LAUNDRY are nearly empty. Reviewed by: jeff	2018-02-24 20:47:22 +00:00
Jeff Roberson	5f8cd1c0bf	Add a generic Proportional Integral Derivative (PID) controller algorithm and use it to regulate page daemon output. This provides much smoother and more responsive page daemon output, anticipating demand and avoiding pageout stalls by increasing the number of pages to match the workload. This is a reimplementation of work done by myself and mlaier at Isilon. Reviewed by: bsdimp Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14402	2018-02-23 22:51:51 +00:00
Konstantin Belousov	2c0f13aa59	vm_wait() rework. Make vm_wait() take the vm_object argument which specifies the domain set to wait for the min condition pass. If there is no object associated with the wait, use curthread' policy domainset. The mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by the new helper vm_wait_doms(), which directly takes the bitmask of the domains to wait for passing min condition. Eliminate pagedaemon_wait(). vm_domain_clear() handles the same operations. Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls are enough. Eliminate several control state variables from vm_domain, unneeded after the vm_wait() conversion. Scetched and reviewed by: jeff Tested by: pho Sponsored by: The FreeBSD Foundation, Mellanox Technologies Differential revision: https://reviews.freebsd.org/D14384	2018-02-20 10:13:13 +00:00
Mark Johnston	1d3a1bcfac	Dequeue wired pages lazily. Previously, wiring a page would cause it to be removed from its page queue. In the common case, unwiring causes it to be enqueued at the tail of that page queue. This change modifies vm_page_wire() to not dequeue the page, thus avoiding the highly contended page queue locks. Instead, vm_page_unwire() takes care of requeuing the page as a single operation, and the page daemon dequeues wired pages as they are encountered during a queue scan to avoid needlessly revisiting them later. For pages in PQ_ACTIVE we do even better, since a requeue is unnecessary. The change improves scalability for some common workloads. For instance, threads wiring pages into the buffer cache no longer need to modify global page queues, and unwiring is usually done by the bufspace thread, so concurrency is not as much of an issue. As another example, many sysctl handlers wire the output buffer to avoid faults on copyout, and since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid modifying the page queue in this case. The change also adds a block comment describing some properties of struct vm_page's reference counters, and the busy lock. Reviewed by: jeff Discussed with: alc, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D11943	2018-02-07 16:57:10 +00:00
Jeff Roberson	e2068d0bcd	Use per-domain locks for vm page queue free. Move paging control from global to per-domain state. Protect reservations with the free lock from the domain that they belong to. Refactor to make vm domains more of a first class object. Reviewed by: markj, kib, gallatin Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14000	2018-02-06 22:10:07 +00:00
Jeff Roberson	b6715dab8f	Move VM_NUMA_ALLOC and DEVICE_NUMA under the single global config option NUMA. Sponsored by: Netflix, Dell/EMC Isilon Discussed with: jhb	2018-01-14 03:36:03 +00:00
Alan Cox	5c515efc88	After r327168, the variable "vm_pageout_wanted" can be static. MFC after: 2 weeks	2017-12-29 17:02:22 +00:00
Mark Johnston	65ef323137	Ensure that pass > 0 when starting a scan with vm_pages_needed == 1. Otherwise the page daemon will not reclaim pages and thus will not wake threads sleeping in VM_WAIT. Reported and tested by: pho Reviewed by: alc, kib X-MFC with: r327168 Differential Revision: https://reviews.freebsd.org/D13640	2017-12-26 16:29:39 +00:00
Mark Johnston	280d15cd0a	Fix two problems with the page daemon control loop. Both issues caused the page daemon to erroneously go to sleep when applications are consuming free pages at a high rate, leaving the application threads blocked in VM_WAIT. 1) After completing an inactive queue scan, concurrent allocations may have prevented the page daemon from meeting the v_free_min threshold. In this case, the page daemon was going to sleep even when the inactive queue contained plenty of clean pages. 2) pagedaemon_wakeup() may be called without the free queues lock held. This can lead to a lost wakeup if a call occurs after the page daemon clears vm_pageout_wanted but before going to sleep. Fix 1) by ensuring that we start a new inactive queue scan immediately if v_free_count < v_free_min after a prior scan. Fix 2) by adding a new subroutine, pagedaemon_wait(), called from vm_wait() and vm_waitpfault(). It wakes up the page daemon if either vm_pages_needed or vm_pageout_wanted is false, and atomically sleeps on v_free_count. Reported by: jeff Reviewed by: alc MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D13424	2017-12-24 19:45:16 +00:00
Mark Johnston	cb35676e66	Use a dedicated counter for inactive queue scans. The laundry thread keeps track of the number of inactive queue scans performed by the page daemon, and was previously using the v_pdwakeups counter to count them. However, in some cases the inactive queue may be scanned multiple times after a single wakeup, so it's more accurate to use a dedicated counter. Reviewed by: alc, kib (previous version) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D13422	2017-12-11 15:33:24 +00:00
Mark Johnston	82e2d06a27	Fix the act_scan_laundry_weight mechanism. r292392 modified the active queue scan to weigh clean pages differently from dirty pages when attempting to meet the inactive queue target. When r306706 was merged into the PQ_LAUNDRY branch, this mechanism was broken. Fix it by scalaing the correct page shortage variable. Reviewed by: alc, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D13423	2017-12-09 15:47:26 +00:00
Mark Johnston	6eebec8343	Use unique wait messages in the page daemon control loop. Discussed with: alc MFC after: 1 week	2017-12-06 18:36:54 +00:00
Pedro F. Giffuni	796df753f4	SPDX: Consider code from Carnegie-Mellon University. Interesting cases, most likely from CMU Mach sources.	2017-11-30 15:48:35 +00:00
Mark Johnston	57cd81a357	Verify the object/vnode association after vget() in vm_pageout_clean(). It's theoretically possible for the vnode and object to be disassociated while locks are dropped around the vget() call, in which case we shouldn't proceed with laundering. Noted and reviewed by: kib MFC after: 1 week	2017-11-29 19:47:09 +00:00
Pedro F. Giffuni	df57947f08	spdx: initial adoption of licensing ID tags. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133	2017-11-18 14:26:50 +00:00
Mark Johnston	e0b2fc3a51	Allow various page daemon parameters to be set from loader.conf. MFC after: 1 week	2017-11-08 19:55:17 +00:00
Konstantin Belousov	ac04195ba6	Move swapout code into vm/vm_swapout.c. There is no NO_SWAPPING #ifdef left in the code. Requested by: alc Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D12663	2017-10-20 09:10:49 +00:00
Mark Johnston	aed9aaaa76	Synchronize page laundering with pmap_extract_and_hold(). Before r207410, the hold count of a page in a page queue was protected by the queue lock, and, before laundering a page, the page daemon removed managed writeable mappings of the page before releasing the queue lock. This ensured that other threads could not concurrently create transient writeable mappings using pmap_extract_and_hold() on a user map, as is done for example by vmapbuf(). With that revision, however, a race can allow the creation of such a mapping, meaning that the page might be modified as it is being laundered, potentially resulting in it being marked clean when its contents do not match those given to the pager. Close the race by using the page lock to synchronize the hold count check in vm_pageout_cluster() with the removal of writeable managed mappings. Reported by: alc Reviewed by: alc, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12084	2017-08-28 22:10:15 +00:00
Alan Cox	e22415906d	Increase the pageout cluster size to 32 pages. Decouple the pageout cluster size from the size of the hash table entry used by the swap pager for mapping (object, pindex) to a block on the swap device(s), and keep the size of a hash table entry at its current size. Eliminate a pointless macro. Reviewed by: kib, markj (an earlier version) MFC after: 4 weeks Differential Revision: https://reviews.freebsd.org/D11305	2017-06-24 17:10:33 +00:00
Alan Cox	3e78e98337	The variable "breakout" is used like a Boolean, so actually define it as one. Reviewed by: kib MFC after: 5 days	2017-06-05 18:07:56 +00:00
Gleb Smirnoff	83c9dea1ba	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
Andriy Gapon	9b43bc27c4	call vm_lowmem hook in uma_reclaim_worker A comment near kmem_reclaim() implies that we already did that. Calling the hook is useful, because some handlers, e.g. ARC, might be able to release significant amounts of KVA. Now that we have more than one place where vm_lowmem hook is called, use this change as an opportunity to introduce flags that describe a reason for calling the hook. No handler makes use of the flags yet. Reviewed by: markj, kib MFC after: 1 week Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D9764	2017-02-25 16:39:21 +00:00
Andriy Gapon	937c1b0757	try to fix RACCT_RSS accounting There could be a race between the vm daemon setting RACCT_RSS based on the vm space and vmspace_exit (called from exit1) resetting RACCT_RSS to zero. In that case we can get a zombie process with non-zero RACCT_RSS. If the process is jailed, that may break accounting for the jail. There could be other consequences. Fix this race in the vm daemon by updating RACCT_RSS only when a process is in the normal state. Also, make accounting a little bit more accurate by refreshing the page resident count after calling vm_pageout_map_deactivate_pages(). Finally, add an assert that the RSS is zero when a process is reaped. PR: 210315 Reviewed by: trasz Differential Revision: https://reviews.freebsd.org/D9464	2017-02-14 13:54:05 +00:00
Mark Johnston	b1fd102ee7	Add a page queue for holding dirty anonymous unswappable pages. On systems without a configured swap device, an attempt to launder pages from a swap object will always fail and result in the page being reactivated. This means that the page daemon will continuously scan pages that can never be evicted. With this change, anonymous pages are instead moved to PQ_UNSWAPPABLE after a failed laundering attempt when no swap devices are configured. PQ_UNSWAPPABLE is not scanned unless a swap device is configured, so unreferenced unswappable pages are excluded from the page daemon's workload. Reviewed by: alc	2017-01-03 00:05:44 +00:00
Mark Johnston	99e6e1930c	Release laundered vnode pages to the head of the inactive queue. The swap pager enqueues laundered pages near the head of the inactive queue to avoid another trip through LRU before reclamation. This change adds support for this behaviour to the vnode pager and makes use of it in UFS and ext2fs. Some ioflag handling is consolidated into a common subroutine so that this support can be easily extended to other filesystems which make use of the buffer cache. No changes are needed for ZFS since its putpages routine always undirties the pages before returning, and the laundry thread requeues the pages appropriately in this case. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D8589	2016-11-23 17:53:07 +00:00
Alan Cox	ebcddc7217	Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty pages, specificially, dirty pages that have passed once through the inactive queue. A new, dedicated thread is responsible for both deciding when to launder pages and actually laundering them. The new policy uses the relative sizes of the inactive and laundry queues to determine whether to launder pages at a given point in time. In general, this leads to more intelligent swapping behavior, since the laundry thread will avoid pageouts when the marginal benefit of doing so is low. Previously, without a dedicated queue for dirty pages, the page daemon didn't have the information to determine whether pageout provides any benefit to the system. Thus, the previous policy often resulted in small but steadily increasing amounts of swap usage when the system is under memory pressure, even when the inactive queue consisted mostly of clean pages. This change addresses that issue, and also paves the way for some future virtual memory system improvements by removing the last source of object-cached clean pages, i.e., PG_CACHE pages. The new laundry thread sleeps while waiting for a request from the page daemon thread(s). A request is raised by setting the variable vm_laundry_request and waking the laundry thread. We request launderings for two reasons: to try and balance the inactive and laundry queue sizes ("background laundering"), and to quickly make up for a shortage of free pages and clean inactive pages ("shortfall laundering"). When background laundering is requested, the laundry thread computes the number of page daemon wakeups that have taken place since the last laundering. If this number is large enough relative to the ratio of the laundry and (global) inactive queue sizes, we will launder vm_background_launder_target pages at vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back to sleep without doing any work. When scanning the laundry queue during background laundering, reactivated pages are counted towards the laundry thread's target. In contrast, shortfall laundering is requested when an inactive queue scan fails to meet its target. In this case, the laundry thread attempts to launder enough pages to meet v_free_target within 0.5s, which is the inactive queue scan period. A laundry request can be latched while another is currently being serviced. In particular, a shortfall request will immediately preempt a background laundering. This change also redefines the meaning of vm_cnt.v_reactivated and removes the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning of vm_cnt.v_reactivated now better reflects its name. It represents the number of inactive or laundry pages that are returned to the active queue on account of a reference. In collaboration with: markj Reviewed by: kib Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8302	2016-11-09 18:48:37 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Alan Cox	70cf3ced3c	Make the page daemon's notion of what kind of pass is being performed by vm_pageout_scan() local to vm_pageout_worker(). There is no reason to store the pass in the NUMA domain structure. Reviewed by: kib MFC after: 3 weeks	2016-10-05 17:32:06 +00:00
Alan Cox	e57dd910e6	Change vm_pageout_scan() to return a value indicating whether the free page target was met. Previously, vm_pageout_worker() itself checked the length of the free page queues to determine whether vm_pageout_scan(pass >= 1)'s inactive queue scan freed enough pages to meet the free page target. Specifically, vm_pageout_worker() used vm_paging_needed(). The trouble with vm_paging_needed() is that it compares the length of the free page queues to the wakeup threshold for the page daemon, which is much lower than the free page target. Consequently, vm_pageout_worker() could conclude that the inactive queue scan succeeded in meeting its free page target when in fact it did not; and rather than immediately triggering an all-out laundering pass over the inactive queue, vm_pageout_worker() would go back to sleep waiting for the free page count to fall below the page daemon wakeup threshold again, at which point it will perform another limited (pass == 1) scan over the inactive queue. Changing vm_pageout_worker() to use vm_page_count_target() instead of vm_paging_needed() won't work because any page allocations that happen concurrently with the inactive queue scan will result in the free page count being below the target at the end of a successful scan. Instead, having vm_pageout_scan() return a value indicating success or failure is the most straightforward fix. Reviewed by: kib, markj MFC after: 3 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8111	2016-10-05 16:15:26 +00:00
Alan Cox	791444089f	Correct errors and clean up the comments on the active queue scan. Eliminate some unnecessary blank lines. Reviewed by: kib, markj MFC after: 1 week	2016-08-12 03:22:58 +00:00
Alan Cox	f0edf3f806	Correct a spelling error.	2016-08-05 16:44:11 +00:00
Alan Cox	248fe642a7	Clean up the comments and code style in and around vm_pageout_cluster(). In particular, fix factual, grammatical, and spelling errors in various comments, and remove comments that are out of place in this function. Reviewed by: kib, markj MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7410	2016-08-04 16:20:12 +00:00
Alan Cox	87ff568c26	Restore the historical behavior of "sysctl vm.swap_idle_enabled=1". Prior to r254304, we had separate functions for reclamation and laundering (vm_pageout_scan) versus updating usage information, i.e., "reference bits", on active pages (vm_pageout_page_stats), and we only performed vm_req_vmdaemon(VM_SWAP_IDLE) if vm_pages_needed was true. However, since r254303, if vm_swap_idle_enabled was "1", we have performed vm_req_vmdaemon(VM_SWAP_IDLE) regardless of whether we are short of free pages. This was unintended and too aggressive, so I suspect no one uses this feature. With this change, we restore the historical behavior and only perform vm_req_vmdaemon(VM_SWAP_IDLE) when we are short of free pages. Reviewed by: kib, markj	2016-08-01 17:25:07 +00:00
Alan Cox	793172ea88	Remove a probe declaration that has been unused since r292469, when vm_pageout_grow_cache() was replaced. MFC after: 3 days	2016-07-29 16:43:51 +00:00
Alan Cox	f095d1bbc7	Remove any mention of cache (PG_CACHE) pages from the comments in vm_pageout_scan(). That function has not cached pages since r284376. MFC after: 3 days	2016-07-28 22:30:48 +00:00
Mark Johnston	3ac8f842ea	De-pluralize "queues" where appropriate in the pagedaemon code. MFC after: 1 week	2016-07-27 17:11:03 +00:00
Alan Cox	a766ffd061	Update a comment to reflect r284376. MFC after: 3 days	2016-07-27 03:49:00 +00:00
Mark Johnston	44be0a8ea5	Correct a comment - each page queue has its own lock. Reviewed by: alc MFC after: 3 days	2016-07-23 21:03:25 +00:00
Mark Johnston	20c58db95a	Make vm_pageout_wakeup_thresh a u_int rather than an int. It's a threshold for v_free_count, which is of type u_int. This also lets us get rid of a cast in vm_paging_needed(). Reviewed by: alc MFC after: 1 week	2016-07-20 00:09:22 +00:00
Konstantin Belousov	95e2409a33	Fix a LOR between vnode locks and allproc_lock. There is an order between covered vnode lock and allproc_lock, which is established by calling mountcheckdirs() while owning the covered vnode lock. mountcheckdirs() iterates over the processes, protected by allproc_lock. This order is needed and seems to be not avoidable. On the other hand, various VM daemons also need to iterate over all processes, and they lock and unlock user maps. Since unlock of the user map may trigger processing of the deferred map entries, it causes vnode locking to occur. Or, when vmspace is freed, dropping references on the vnode-backed object also lock vnodes. We get reverted order comparing with the mount/unmount order. For VM daemons, there is no need to own allproc_lock while we operate on vmspaces. If the process is held, it serves as the marker for allproc list, which allows to continue the iteration. Add _PHOLD_LITE() macro, similar to _PHOLD(), but not causing swap-in of the kernel stacks. It is used instead of _PHOLD() in vm code, since e.g. calling faultin() in OOM conditions only exaggerates the problem. Modernize comment describing PHOLD. Reported by: lists@yamagi.org Tested by: pho (previous version) Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 week Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6679	2016-06-22 20:15:37 +00:00
Alan Cox	56ce06907c	The flag "vm_pages_needed" has long served two distinct purposes: (1) to indicate that threads are waiting for free pages to become available and (2) to indicate whether a wakeup call has been sent to the page daemon. The trouble is that a single flag cannot really serve both purposes, because we have two distinct targets for when to wakeup threads waiting for free pages versus when the page daemon has completed its work. In particular, the flag will be cleared by vm_page_free() before the page daemon has met its target, and this can lead to the OOM killer being invoked prematurely. To address this problem, a new flag "vm_pageout_wanted" is introduced. Discussed with: jeff Reviewed by: kib, markj Tested by: markj Sponsored by: EMC / Isilon Storage Division	2016-05-27 19:15:45 +00:00

1 2 3 4 5 ...

470 Commits