2005-01-07 02:29:27 +00:00
|
|
|
/*-
|
2017-11-30 15:48:35 +00:00
|
|
|
* SPDX-License-Identifier: (BSD-4-Clause AND MIT-CMU)
|
2017-11-18 14:26:50 +00:00
|
|
|
*
|
1994-05-25 09:21:21 +00:00
|
|
|
* Copyright (c) 1991 Regents of the University of California.
|
|
|
|
* All rights reserved.
|
|
|
|
* Copyright (c) 1994 John S. Dyson
|
|
|
|
* All rights reserved.
|
|
|
|
* Copyright (c) 1994 David Greenman
|
|
|
|
* All rights reserved.
|
2005-08-10 00:17:36 +00:00
|
|
|
* Copyright (c) 2005 Yahoo! Technologies Norway AS
|
|
|
|
* All rights reserved.
|
1994-05-24 10:09:53 +00:00
|
|
|
*
|
|
|
|
* This code is derived from software contributed to Berkeley by
|
|
|
|
* The Mach Operating System project at Carnegie-Mellon University.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
* 3. All advertising materials mentioning features or use of this software
|
2000-03-27 20:41:17 +00:00
|
|
|
* must display the following acknowledgement:
|
1994-05-24 10:09:53 +00:00
|
|
|
* This product includes software developed by the University of
|
|
|
|
* California, Berkeley and its contributors.
|
|
|
|
* 4. Neither the name of the University nor the names of its contributors
|
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
1994-08-02 07:55:43 +00:00
|
|
|
* from: @(#)vm_pageout.c 7.4 (Berkeley) 5/7/91
|
1994-05-24 10:09:53 +00:00
|
|
|
*
|
|
|
|
*
|
|
|
|
* Copyright (c) 1987, 1990 Carnegie-Mellon University.
|
|
|
|
* All rights reserved.
|
|
|
|
*
|
|
|
|
* Authors: Avadis Tevanian, Jr., Michael Wayne Young
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size; now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
*
|
1994-05-24 10:09:53 +00:00
|
|
|
* Permission to use, copy, modify and distribute this software and
|
|
|
|
* its documentation is hereby granted, provided that both the copyright
|
|
|
|
* notice and this permission notice appear in all copies of the
|
|
|
|
* software, derivative works or modified versions, and any portions
|
|
|
|
* thereof, and that both notices appear in supporting documentation.
|
1995-01-09 16:06:02 +00:00
|
|
|
*
|
|
|
|
* CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
|
|
|
|
* CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
|
1994-05-24 10:09:53 +00:00
|
|
|
* FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
|
1995-01-09 16:06:02 +00:00
|
|
|
*
|
1994-05-24 10:09:53 +00:00
|
|
|
* Carnegie Mellon requests users of this software to return to
|
|
|
|
*
|
|
|
|
* Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
|
|
|
|
* School of Computer Science
|
|
|
|
* Carnegie Mellon University
|
|
|
|
* Pittsburgh PA 15213-3890
|
|
|
|
*
|
|
|
|
* any improvements or extensions that they make and grant Carnegie the
|
|
|
|
* rights to redistribute these changes.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The proverbial page-out daemon.
|
|
|
|
*/
|
|
|
|
|
2003-06-11 23:50:51 +00:00
|
|
|
#include <sys/cdefs.h>
|
|
|
|
__FBSDID("$FreeBSD$");
|
|
|
|
|
1998-09-29 17:33:59 +00:00
|
|
|
#include "opt_vm.h"
|
2015-11-22 02:01:01 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/param.h>
|
1994-05-25 09:21:21 +00:00
|
|
|
#include <sys/systm.h>
|
1995-03-16 18:17:34 +00:00
|
|
|
#include <sys/kernel.h>
|
2002-11-21 09:17:56 +00:00
|
|
|
#include <sys/eventhandler.h>
|
2001-05-01 08:13:21 +00:00
|
|
|
#include <sys/lock.h>
|
|
|
|
#include <sys/mutex.h>
|
1994-05-25 09:21:21 +00:00
|
|
|
#include <sys/proc.h>
|
1999-07-01 13:21:46 +00:00
|
|
|
#include <sys/kthread.h>
|
2000-09-07 01:33:02 +00:00
|
|
|
#include <sys/ktr.h>
|
2007-06-26 18:24:05 +00:00
|
|
|
#include <sys/mount.h>
|
2011-04-06 16:24:24 +00:00
|
|
|
#include <sys/racct.h>
|
1994-05-25 09:21:21 +00:00
|
|
|
#include <sys/resourcevar.h>
|
2002-10-12 05:32:24 +00:00
|
|
|
#include <sys/sched.h>
|
2014-10-03 20:34:55 +00:00
|
|
|
#include <sys/sdt.h>
|
1995-02-14 06:14:28 +00:00
|
|
|
#include <sys/signalvar.h>
|
Split the page queues per NUMA domain, and split the pagedaemon process
into threads, each processing the queue of a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach a quorum before starting the OOM, since one
thread's inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages is OOM
started, by a single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
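
The quorum rule in this commit message (OOM starts only once every pagedaemon thread has failed, and only one thread starts it) can be sketched as a shared vote counter. This is a minimal illustration, not the kernel's implementation: `oom_votes`, `oom_vote()` and `oom_unvote()` are hypothetical names, and C11 atomics stand in for whatever primitives the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared counter: threads that currently cannot meet their target. */
static atomic_int oom_votes;

/*
 * Cast this thread's vote. Returns true only for the thread whose vote
 * completes the quorum, so exactly one thread goes on to start the OOM.
 */
static bool
oom_vote(int nthreads)
{
	assert(nthreads > 0);
	/* atomic_fetch_add returns the value held before the increment. */
	return (atomic_fetch_add(&oom_votes, 1) == nthreads - 1);
}

/* Withdraw a vote once the thread is making progress again. */
static void
oom_unvote(void)
{
	atomic_fetch_sub(&oom_votes, 1);
}
```

With three threads, the first two calls to `oom_vote(3)` return false and the third returns true. A thread that later recovers would call `oom_unvote()` so that a stale vote does not count toward a future quorum.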
2013-08-07 16:36:38 +00:00
|
|
|
#include <sys/smp.h>
|
2015-08-20 20:28:51 +00:00
|
|
|
#include <sys/time.h>
|
1995-04-09 06:03:56 +00:00
|
|
|
#include <sys/vnode.h>
|
1995-12-07 12:48:31 +00:00
|
|
|
#include <sys/vmmeter.h>
|
2013-02-20 10:38:34 +00:00
|
|
|
#include <sys/rwlock.h>
|
2001-03-28 11:52:56 +00:00
|
|
|
#include <sys/sx.h>
|
This commit does a couple of things:
Re-enables the RSS limiting, and the routine is now tail-recursive,
making it much safer (eliminates the possibility of kernel stack
overflow.) Also, the RSS limiting is a little more intelligent about
finding the likely objects that are pushing the process over the limit.
Added some sysctls that help with VM system tuning.
New sysctl features:
1) Enable/disable lru pageout algorithm.
vm.pageout_algorithm = 0, default algorithm that works
well, especially using X windows and heavy
memory loading. Can have adverse effects,
sometimes slowing down program loading.
vm.pageout_algorithm = 1, close to true LRU. Works much
better than clock, etc. Does not work as well as
the default algorithm in general. Certain memory
"malloc" type benchmarks work a little better with
this setting.
Please give me feedback on the performance results
associated with these.
2) Enable/disable swapping.
vm.swapping_enabled = 1, default.
vm.swapping_enabled = 0, useful for cases where swapping
degrades performance.
The config option "NO_SWAPPING" is still operative, and
takes precedence over the sysctl. If "NO_SWAPPING" is
specified, the sysctl still exists, but "vm.swapping_enabled"
is hard-wired to "0".
Each of these can be changed "on the fly."
1996-06-26 05:39:27 +00:00
|
|
|
#include <sys/sysctl.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
#include <vm/vm.h>
|
1995-12-07 12:48:31 +00:00
|
|
|
#include <vm/vm_param.h>
|
|
|
|
#include <vm/vm_object.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <vm/vm_page.h>
|
1995-12-07 12:48:31 +00:00
|
|
|
#include <vm/vm_map.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <vm/vm_pageout.h>
|
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. Their marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
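
Item 7 above describes a pageout cluster that is balanced around the requested page. The sketch below shows one way to pick such a window. It is an assumption-laden illustration: `struct cluster` and `cluster_window()` are hypothetical names, pages are identified by plain indices, and the real code would also have to check that each neighbouring page is actually eligible for pageout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: an inclusive range of page indices. */
struct cluster {
	long first;
	long last;
};

/*
 * Grow a window around page "req" by alternately taking one page before
 * and one page after it, until "maxcluster" pages are selected or the
 * object (pages 0 .. npages - 1) has no more pages on either side.
 */
static struct cluster
cluster_window(long req, long npages, long maxcluster)
{
	struct cluster c = { req, req };
	long count = 1;
	bool grew = true;

	assert(npages > 0 && req >= 0 && req < npages && maxcluster >= 1);

	while (count < maxcluster && grew) {
		grew = false;
		if (c.first > 0) {
			c.first--;
			count++;
			grew = true;
		}
		if (count < maxcluster && c.last < npages - 1) {
			c.last++;
			count++;
			grew = true;
		}
	}
	return (c);
}
```

When the requested page sits at the start or end of the object, the window simply extends further in the direction that still has pages.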
1995-07-13 08:48:48 +00:00
|
|
|
#include <vm/vm_pager.h>
|
2013-08-07 16:36:38 +00:00
|
|
|
#include <vm/vm_phys.h>
|
2018-02-06 22:10:07 +00:00
|
|
|
#include <vm/vm_pagequeue.h>
|
1994-10-09 01:52:19 +00:00
|
|
|
#include <vm/swap_pager.h>
|
1995-12-07 12:48:31 +00:00
|
|
|
#include <vm/vm_extern.h>
|
2002-03-20 04:02:59 +00:00
|
|
|
#include <vm/uma.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1995-08-28 09:19:25 +00:00
|
|
|
/*
|
|
|
|
* System initialization
|
|
|
|
*/
|
|
|
|
|
|
|
|
/* the kernel process "vm_pageout"*/
|
2002-03-19 22:20:14 +00:00
|
|
|
static void vm_pageout(void);
|
2014-08-28 19:50:08 +00:00
|
|
|
static void vm_pageout_init(void);
|
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.
In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.
A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.
This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.
In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
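
The latching behaviour described in this commit message (a new request can be recorded while another is being serviced, and a shortfall request takes priority over background laundering) can be sketched as follows. The enum and function names are hypothetical and chosen for illustration; the only identifier the message itself names is the variable `vm_laundry_request`.

```c
#include <assert.h>

/* Hypothetical request levels, ordered by urgency. */
enum launder_req {
	LAUNDER_IDLE = 0,	/* nothing requested */
	LAUNDER_BACKGROUND,	/* balance inactive and laundry queues */
	LAUNDER_SHORTFALL	/* inactive scan missed its target */
};

/*
 * Latch an incoming request on top of the one currently recorded.
 * A more urgent request replaces a less urgent one; otherwise the
 * current request is kept.
 */
static enum launder_req
launder_latch(enum launder_req current, enum launder_req incoming)
{
	assert(incoming != LAUNDER_IDLE);
	return (incoming > current ? incoming : current);
}
```

Under this rule a shortfall request arriving during background laundering wins immediately, while a background request arriving during shortfall laundering is ignored.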
2016-11-09 18:48:37 +00:00
|
|
|
static int vm_pageout_clean(vm_page_t m, int *numpagedout);
|
2015-04-07 02:18:52 +00:00
|
|
|
static int vm_pageout_cluster(vm_page_t m);
|
Rework the test which raises the OOM condition. Right now, the code
checks for the swap space consumption plus checks that the amount of
the free pages exceeds some limit, in case the pagedaemon did not cope
with the page shortage in one of the late passes. This is wrong
because it does not account for the presence of the reclaimable pages
in the queues which are not selectable for reclaim immediately. E.g.,
on swap-less systems, a large active queue easily triggered OOM.
Instead, only raise OOM when pagedaemon is unable to produce a free
page in several back-to-back passes. Track the failed passes per
pagedaemon thread.
The number of passes to trigger OOM was selected empirically and
tested both on small (32M-64M i386 VM) and large (32G amd64)
configurations. If the specifics of the load require tuning, sysctl
vm.pageout_oom_seq sets the number of back-to-back passes which must
fail before OOM is raised. Each pass takes half a second. The lower the
value, the more sensitive the pagedaemon is to the page shortage.
In future, some heuristic to calculate the value of the tunable might
be designed based on the system configuration and load. But before it
can be done, the i/o system must be fixed to reliably time-out
pagedaemon writes, even if waiting for the memory to proceed. Then,
code can account for the in-flight page-outs and postpone OOM until
all of them have finished, which should reduce the need for tuning. Right
now, ignoring the in-flight writes and the counter allows breaking
deadlocks due to the write path doing sleepable memory allocations.
Reported by: Dmitry Sivachenko, bde, many others
Tested by: pho, bde, tuexen (arm)
Reviewed by: alc
Discussed with: bde, imp
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
2015-11-16 06:26:26 +00:00
|
|
|
static void vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
|
|
|
|
int starting_page_shortage);
|
2003-09-19 05:03:45 +00:00
|
|
|
|
2014-08-28 19:50:08 +00:00
|
|
|
SYSINIT(pagedaemon_init, SI_SUB_KTHREAD_PAGE, SI_ORDER_FIRST, vm_pageout_init,
|
|
|
|
NULL);
|
|
|
|
|
1995-08-28 09:19:25 +00:00
|
|
|
struct proc *pageproc;
|
|
|
|
|
|
|
|
static struct kproc_desc page_kp = {
|
|
|
|
"pagedaemon",
|
|
|
|
vm_pageout,
|
|
|
|
&pageproc
|
|
|
|
};
|
2014-08-28 19:50:08 +00:00
|
|
|
SYSINIT(pagedaemon, SI_SUB_KTHREAD_PAGE, SI_ORDER_SECOND, kproc_start,
|
2008-03-16 10:58:09 +00:00
|
|
|
&page_kp);
|
1995-08-28 09:19:25 +00:00
|
|
|
|
2014-10-03 20:34:55 +00:00
|
|
|
SDT_PROVIDER_DEFINE(vm);
|
|
|
|
SDT_PROBE_DEFINE(vm, , , vm__lowmem_scan);
|
|
|
|
|
2016-11-09 18:48:37 +00:00
|
|
|
/* Pagedaemon activity rates, in subdivisions of one second. */
|
|
|
|
#define VM_LAUNDER_RATE 10
|
2018-02-23 22:51:51 +00:00
|
|
|
#define VM_INACT_SCAN_RATE 10
|
1995-08-28 09:19:25 +00:00
|
|
|
|
2015-11-16 06:26:26 +00:00
|
|
|
static int vm_pageout_oom_seq = 12;
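The back-to-back counting that the commit message describes can be sketched as follows. The structure and function names are hypothetical and the logic is a simplified illustration, not the kernel's implementation. If each pass takes about half a second, as the message states, a threshold of 12 corresponds to roughly six seconds of consecutive failures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state; the names are illustrative only. */
struct oom_tracker {
	int failed_passes;	/* consecutive passes that freed nothing */
	int oom_seq;		/* threshold, cf. vm.pageout_oom_seq */
};

/*
 * Record the outcome of one pagedaemon pass.  Returns true once the
 * number of back-to-back failed passes has reached the threshold, i.e.
 * when the OOM condition would be raised.  Any pass that frees at least
 * one page resets the count.
 */
static bool
oom_track_pass(struct oom_tracker *t, int pages_freed)
{
	assert(t->oom_seq > 0);
	if (pages_freed > 0) {
		t->failed_passes = 0;
		return (false);
	}
	t->failed_passes++;
	return (t->failed_passes >= t->oom_seq);
}
```

Counting consecutive failures rather than checking a single free-page threshold means one successful pass is enough to postpone OOM, which matches the intent stated in the message.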
|
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.
In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.
A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.
This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.
In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
2016-11-09 18:48:37 +00:00
|
|
|
|
2013-08-13 21:56:16 +00:00
|
|
|
static int vm_pageout_update_period;
|
2013-01-28 12:08:29 +00:00
|
|
|
static int disable_swap_pageouts;
|
2013-08-19 23:54:24 +00:00
|
|
|
static int lowmem_period = 10;
|
2017-01-03 00:05:44 +00:00
|
|
|
static int swapdev_enabled;
|
1997-12-05 05:41:06 +00:00
|
|
|
|
Add vm.panic_on_oom sysctl, which enables those who would rather panic than
kill a process, when the system runs out of memory. Defaults to off.
Usually, this is most useful when the OOM condition is due to mismanagement
of memory, on a system where the applications in question don't respond well
to being killed.
In theory, if the system is properly managed, it shouldn't be possible to
hit this condition. If it does, the panic can be more desirable for some
users (since it can be a good means of finding the root cause) rather than
killing the largest process and continuing on its merry way.
As kib@ mentions in the differential, there is also protect(1), which uses
procctl(PROC_SPROTECT) to ensure that some processes are immune. However,
a panic approach is still useful in some environments. This is primarily
intended as a development/debugging tool.
Differential Revision: D1627
Reviewed by: kib
MFC after: 1 week
2015-01-24 17:32:45 +00:00
|
|
|
static int vm_panic_on_oom = 0;
|
|
|
|
|
|
|
|
SYSCTL_INT(_vm, OID_AUTO, panic_on_oom,
|
|
|
|
CTLFLAG_RWTUN, &vm_panic_on_oom, 0,
|
|
|
|
"panic on out of memory instead of killing the largest process");
|
|
|
|
|
2013-08-13 21:56:16 +00:00
|
|
|
SYSCTL_INT(_vm, OID_AUTO, pageout_update_period,
|
2017-11-08 19:55:17 +00:00
|
|
|
CTLFLAG_RWTUN, &vm_pageout_update_period, 0,
|
2013-08-13 21:56:16 +00:00
|
|
|
"Maximum active LRU update period");
|
|
|
|
|
2017-11-08 19:55:17 +00:00
|
|
|
SYSCTL_INT(_vm, OID_AUTO, lowmem_period, CTLFLAG_RWTUN, &lowmem_period, 0,
|
2013-08-19 23:54:24 +00:00
|
|
|
"Low memory callback period");
|
|
|
|
|
1997-12-06 02:23:36 +00:00
|
|
|
SYSCTL_INT(_vm, OID_AUTO, disable_swapspace_pageouts,
|
2017-11-08 19:55:17 +00:00
|
|
|
CTLFLAG_RWTUN, &disable_swap_pageouts, 0, "Disallow swapout of dirty pages");
|
1997-12-04 19:00:56 +00:00
|
|
|
|
2001-12-20 22:42:27 +00:00
|
|
|
static int pageout_lock_miss;
|
|
|
|
SYSCTL_INT(_vm, OID_AUTO, pageout_lock_miss,
|
|
|
|
CTLFLAG_RD, &pageout_lock_miss, 0, "vget() lock misses during pageout");
|
|
|
|
|
2015-11-16 06:26:26 +00:00
|
|
|
SYSCTL_INT(_vm, OID_AUTO, pageout_oom_seq,
|
2017-11-08 19:55:17 +00:00
|
|
|
CTLFLAG_RWTUN, &vm_pageout_oom_seq, 0,
|
2015-11-16 06:26:26 +00:00
|
|
|
"back-to-back calls to oom detector to start OOM");
|
|
|
|
|
2016-11-09 18:48:37 +00:00
|
|
|
static int act_scan_laundry_weight = 3;
|
2017-11-08 19:55:17 +00:00
|
|
|
SYSCTL_INT(_vm, OID_AUTO, act_scan_laundry_weight, CTLFLAG_RWTUN,
|
2016-11-09 18:48:37 +00:00
|
|
|
&act_scan_laundry_weight, 0,
|
|
|
|
"weight given to clean vs. dirty pages in active queue scans");
|
|
|
|
|
|
|
|
static u_int vm_background_launder_rate = 4096;
|
2017-11-08 19:55:17 +00:00
|
|
|
SYSCTL_UINT(_vm, OID_AUTO, background_launder_rate, CTLFLAG_RWTUN,
|
2016-11-09 18:48:37 +00:00
|
|
|
&vm_background_launder_rate, 0,
|
|
|
|
"background laundering rate, in kilobytes per second");
|
|
|
|
|
|
|
|
static u_int vm_background_launder_max = 20 * 1024;
|
2017-11-08 19:55:17 +00:00
|
|
|
SYSCTL_UINT(_vm, OID_AUTO, background_launder_max, CTLFLAG_RWTUN,
|
2016-11-09 18:48:37 +00:00
|
|
|
&vm_background_launder_max, 0, "background laundering cap, in kilobytes");
|
|
|
|
|
2017-06-24 17:10:33 +00:00
|
|
|
int vm_pageout_page_count = 32;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes. User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.
The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2). Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks. In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
short of killing the wiring process. The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.
The choice to count virtual user-wired pages rather than physical
pages was done for simplicity. There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.
The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded. For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.
Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM. Users that wish to exceed the limit must tune
vm.max_user_wired.
Reviewed by: kib, ngie (mlock() test changes)
Tested by: pho (earlier version)
MFC after: 45 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19908
2019-05-13 16:38:48 +00:00
|
|
|
u_long vm_page_max_user_wired;
|
|
|
|
SYSCTL_ULONG(_vm, OID_AUTO, max_user_wired, CTLFLAG_RW,
|
|
|
|
&vm_page_max_user_wired, 0,
|
|
|
|
"system-wide limit to user-wired page count");
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2016-11-09 18:48:37 +00:00
|
|
|
static u_int isqrt(u_int num);
|
|
|
|
static int vm_pageout_launder(struct vm_domain *vmd, int launder,
|
|
|
|
bool in_shortfall);
|
|
|
|
static void vm_pageout_laundry_worker(void *arg);
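Only the prototype of isqrt() appears in this excerpt. As an illustration of what an integer square root helper of this shape might look like, here is a minimal sketch of the common bit-by-bit method. The name isqrt_sketch is hypothetical and the body is not claimed to match the one in this file; it assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <limits.h>

static_assert(sizeof(unsigned int) * CHAR_BIT == 32,
    "sketch assumes a 32-bit unsigned int");

/*
 * Integer square root: returns the largest r such that r * r <= num,
 * computed one result bit at a time without multiplication or floating
 * point.
 */
static unsigned int
isqrt_sketch(unsigned int num)
{
	unsigned int bit, root, tmp;

	/* Start at the highest power of four that does not exceed num. */
	bit = 1u << 30;
	while (bit > num)
		bit >>= 2;

	root = 0;
	while (bit != 0) {
		tmp = root + bit;
		root >>= 1;
		if (num >= tmp) {
			num -= tmp;
			root += bit;
		}
		bit >>= 2;
	}
	return (root);
}
```

The result is rounded down, so isqrt_sketch(15) is 3 and isqrt_sketch(16) is 4.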
|
2018-04-24 21:15:54 +00:00
|
|
|
|
|
|
|
struct scan_state {
|
|
|
|
struct vm_batchqueue bq;
|
|
|
|
struct vm_pagequeue *pq;
|
|
|
|
vm_page_t marker;
|
|
|
|
int maxscan;
|
|
|
|
int scanned;
|
|
|
|
};
|
|
|
|
|
|
|
|
static void
|
|
|
|
vm_pageout_init_scan(struct scan_state *ss, struct vm_pagequeue *pq,
|
|
|
|
vm_page_t marker, vm_page_t after, int maxscan)
|
|
|
|
{
|
|
|
|
|
|
|
|
vm_pagequeue_assert_locked(pq);
|
2019-09-16 15:04:45 +00:00
|
|
|
KASSERT((marker->aflags & PGA_ENQUEUED) == 0,
|
2018-04-24 21:15:54 +00:00
|
|
|
("marker %p already enqueued", marker));
|
|
|
|
|
|
|
|
if (after == NULL)
|
|
|
|
TAILQ_INSERT_HEAD(&pq->pq_pl, marker, plinks.q);
|
|
|
|
else
|
|
|
|
TAILQ_INSERT_AFTER(&pq->pq_pl, after, marker, plinks.q);
|
|
|
|
vm_page_aflag_set(marker, PGA_ENQUEUED);
|
|
|
|
|
|
|
|
vm_batchqueue_init(&ss->bq);
|
|
|
|
ss->pq = pq;
|
|
|
|
ss->marker = marker;
|
|
|
|
ss->maxscan = maxscan;
|
|
|
|
ss->scanned = 0;
|
|
|
|
vm_pagequeue_unlock(pq);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
vm_pageout_end_scan(struct scan_state *ss)
|
|
|
|
{
|
|
|
|
struct vm_pagequeue *pq;
|
|
|
|
|
|
|
|
pq = ss->pq;
|
|
|
|
vm_pagequeue_assert_locked(pq);
|
2019-09-16 15:04:45 +00:00
|
|
|
KASSERT((ss->marker->aflags & PGA_ENQUEUED) != 0,
|
2018-04-24 21:15:54 +00:00
|
|
|
("marker %p not enqueued", ss->marker));
|
|
|
|
|
|
|
|
TAILQ_REMOVE(&pq->pq_pl, ss->marker, plinks.q);
|
|
|
|
vm_page_aflag_clear(ss->marker, PGA_ENQUEUED);
|
2018-08-23 21:03:45 +00:00
|
|
|
pq->pq_pdpages += ss->scanned;
|
2018-04-24 21:15:54 +00:00
|
|
|
}
|
1995-10-07 19:02:56 +00:00
|
|
|
|
2010-05-06 04:57:33 +00:00
|
|
|
/*
|
2018-04-24 21:15:54 +00:00
|
|
|
* Add a small number of queued pages to a batch queue for later processing
|
|
|
|
* without the corresponding queue lock held. The caller must have enqueued a
|
|
|
|
* marker page at the desired start point for the scan. Pages will be
|
|
|
|
* physically dequeued if the caller so requests. Otherwise, the returned
|
|
|
|
* batch may contain marker pages, and it is up to the caller to handle them.
|
2010-05-06 04:57:33 +00:00
|
|
|
*
|
2018-05-13 13:00:59 +00:00
|
|
|
* When processing the batch queue, vm_page_queue() must be used to
|
|
|
|
* determine whether the page has been logically dequeued by another thread.
|
|
|
|
* Once this check is performed, the page lock guarantees that the page will
|
|
|
|
* not be disassociated from the queue.
|
2010-05-06 04:57:33 +00:00
|
|
|
*/
|
2018-04-24 21:15:54 +00:00
|
|
|
static __always_inline void
|
|
|
|
vm_pageout_collect_batch(struct scan_state *ss, const bool dequeue)
|
2010-05-06 04:57:33 +00:00
|
|
|
{
|
2012-11-13 02:50:39 +00:00
|
|
|
struct vm_pagequeue *pq;
|
2019-07-03 18:46:39 +00:00
|
|
|
vm_page_t m, marker, n;
|
2010-05-06 04:57:33 +00:00
|
|
|
|
2018-04-24 21:15:54 +00:00
|
|
|
marker = ss->marker;
|
|
|
|
pq = ss->pq;
|
2010-05-06 04:57:33 +00:00
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
KASSERT((marker->aflags & PGA_ENQUEUED) != 0,
|
2018-04-24 21:15:54 +00:00
|
|
|
("marker %p not enqueued", ss->marker));
|
2010-05-06 04:57:33 +00:00
|
|
|
|
2012-11-13 02:50:39 +00:00
|
|
|
vm_pagequeue_lock(pq);
|
2018-04-24 21:15:54 +00:00
|
|
|
for (m = TAILQ_NEXT(marker, plinks.q); m != NULL &&
|
|
|
|
ss->scanned < ss->maxscan && ss->bq.bq_cnt < VM_BATCHQUEUE_SIZE;
|
2019-07-03 18:46:39 +00:00
|
|
|
m = n, ss->scanned++) {
|
|
|
|
n = TAILQ_NEXT(m, plinks.q);
|
2018-04-24 21:15:54 +00:00
|
|
|
if ((m->flags & PG_MARKER) == 0) {
|
2019-09-16 15:04:45 +00:00
|
|
|
KASSERT((m->aflags & PGA_ENQUEUED) != 0,
|
2018-04-24 21:15:54 +00:00
|
|
|
("page %p not enqueued", m));
|
|
|
|
KASSERT((m->flags & PG_FICTITIOUS) == 0,
|
|
|
|
("Fictitious page %p cannot be in page queue", m));
|
|
|
|
KASSERT((m->oflags & VPO_UNMANAGED) == 0,
|
|
|
|
("Unmanaged page %p cannot be in page queue", m));
|
|
|
|
} else if (dequeue)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
(void)vm_batchqueue_insert(&ss->bq, m);
|
|
|
|
if (dequeue) {
|
|
|
|
TAILQ_REMOVE(&pq->pq_pl, m, plinks.q);
|
|
|
|
vm_page_aflag_clear(m, PGA_ENQUEUED);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
TAILQ_REMOVE(&pq->pq_pl, marker, plinks.q);
|
|
|
|
if (__predict_true(m != NULL))
|
|
|
|
TAILQ_INSERT_BEFORE(m, marker, plinks.q);
|
|
|
|
else
|
|
|
|
TAILQ_INSERT_TAIL(&pq->pq_pl, marker, plinks.q);
|
|
|
|
if (dequeue)
|
|
|
|
vm_pagequeue_cnt_add(pq, -ss->bq.bq_cnt);
|
|
|
|
vm_pagequeue_unlock(pq);
|
|
|
|
}
|
|
|
|
|
Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
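The counter scheme described in the commit message above can be sketched in userspace C11. This is a minimal illustration, not the kernel's implementation: the flag values, the `struct page_ref` type, and all `page_ref_*` function names are hypothetical, chosen only to show how a flag bit in an atomically updated counter can refuse new references while the existing count stays intact.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: two flag bits above a plain wiring count. */
#define REF_OBJREF	0x80000000u	/* the owning object's reference */
#define REF_BLOCKED	0x40000000u	/* new mapped wirings are refused */
#define REF_COUNT(c)	((c) & ~(REF_OBJREF | REF_BLOCKED))

struct page_ref {
	_Atomic uint32_t ref_count;
};

/* Take a wiring unless a concurrent unmap has blocked new references. */
static bool
page_ref_wire_mapped(struct page_ref *p)
{
	uint32_t old = atomic_load(&p->ref_count);

	do {
		if ((old & REF_BLOCKED) != 0)
			return (false);	/* caller must fall back */
	} while (!atomic_compare_exchange_weak(&p->ref_count, &old, old + 1));
	return (true);
}

/* Set or clear the blocking flag around an unmap operation. */
static void
page_ref_block(struct page_ref *p)
{
	atomic_fetch_or(&p->ref_count, REF_BLOCKED);
}

static void
page_ref_unblock(struct page_ref *p)
{
	atomic_fetch_and(&p->ref_count, ~REF_BLOCKED);
}

/*
 * Drop a wiring.  Returns true only if the counter reached zero, that is,
 * no wirings, no object reference, and no flag bits remain.
 */
static bool
page_ref_unwire(struct page_ref *p)
{
	uint32_t old = atomic_fetch_sub(&p->ref_count, 1);

	assert(REF_COUNT(old) > 0);	/* catch underflow */
	return (old == 1);
}
```

In this sketch a caller that sees `page_ref_wire_mapped()` fail would retry through a slower path, which loosely mirrors the fallback to the fault handler mentioned above.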
2019-09-09 21:32:42 +00:00
|
|
|
/*
|
|
|
|
* Return the next page to be scanned, or NULL if the scan is complete.
|
|
|
|
*/
|
2018-04-24 21:15:54 +00:00
|
|
|
static __always_inline vm_page_t
|
|
|
|
vm_pageout_next(struct scan_state *ss, const bool dequeue)
|
|
|
|
{
|
2010-05-06 04:57:33 +00:00
|
|
|
|
2018-04-24 21:15:54 +00:00
|
|
|
if (ss->bq.bq_cnt == 0)
|
|
|
|
vm_pageout_collect_batch(ss, dequeue);
|
|
|
|
return (vm_batchqueue_pop(&ss->bq));
|
2010-05-06 04:57:33 +00:00
|
|
|
}
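
The batching pattern used by vm_pageout_collect_batch() and vm_pageout_next() can be illustrated outside the kernel. The following is a simplified sketch under stated assumptions: the shared queue is modeled as a plain array, the marker as an index, and the lock as a counter of acquisitions. `BATCH_SIZE`, `struct batch_scan`, and the `batch_scan_*` functions are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 7	/* hypothetical; stands in for VM_BATCHQUEUE_SIZE */

struct batch_scan {
	const int *items;		/* shared queue, modeled as an array */
	size_t nitems;
	size_t cursor;			/* plays the role of the marker */
	int batch[BATCH_SIZE];		/* locally owned batch */
	size_t head, cnt;		/* pending entries: batch[head..head+cnt) */
	unsigned lock_acquisitions;	/* how often the "lock" was taken */
};

/* Refill the empty batch in one pass; the queue lock is held only here. */
static void
batch_scan_collect(struct batch_scan *bs)
{
	assert(bs->cnt == 0);
	bs->lock_acquisitions++;	/* queue lock would be acquired here */
	bs->head = 0;
	while (bs->cursor < bs->nitems && bs->cnt < BATCH_SIZE)
		bs->batch[bs->cnt++] = bs->items[bs->cursor++];
	/* queue lock would be released here */
}

/* Store the next item in *out and return 1, or return 0 when done. */
static int
batch_scan_next(struct batch_scan *bs, int *out)
{
	if (bs->cnt == 0)
		batch_scan_collect(bs);
	if (bs->cnt == 0)
		return (0);
	*out = bs->batch[bs->head++];
	bs->cnt--;
	return (1);
}
```

Scanning 20 items with a batch size of 7 takes the lock four times in this model (three refills plus one final empty refill) rather than once per item, which is the point of collecting a batch before processing it.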
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2016-08-04 16:20:12 +00:00
|
|
|
* Scan for pages at adjacent offsets within the given page's object that are
|
|
|
|
* eligible for laundering, form a cluster of these pages and the given page,
|
|
|
|
* and launder that cluster.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1995-11-20 12:20:02 +00:00
|
|
|
static int
|
2015-04-07 02:18:52 +00:00
|
|
|
vm_pageout_cluster(vm_page_t m)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2001-07-04 19:00:13 +00:00
|
|
|
vm_object_t object;
|
2016-08-04 16:20:12 +00:00
|
|
|
vm_page_t mc[2 * vm_pageout_page_count], p, pb, ps;
|
|
|
|
vm_pindex_t pindex;
|
|
|
|
int ib, is, page_base, pageout_count;
|
1994-05-25 09:21:21 +00:00
|
|
|
|
2011-01-03 00:41:56 +00:00
|
|
|
object = m->object;
|
2013-02-21 21:54:53 +00:00
|
|
|
VM_OBJECT_ASSERT_WLOCKED(object);
|
2016-08-04 16:20:12 +00:00
|
|
|
pindex = m->pindex;
|
2001-07-04 16:20:28 +00:00
|
|
|
|
2013-08-09 11:11:11 +00:00
|
|
|
vm_page_assert_unbusied(m);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2010-06-21 23:27:24 +00:00
|
|
|
mc[vm_pageout_page_count] = pb = ps = m;
|
1994-05-25 09:21:21 +00:00
|
|
|
pageout_count = 1;
|
1996-05-31 00:38:04 +00:00
|
|
|
page_base = vm_pageout_page_count;
|
1999-09-17 04:56:40 +00:00
|
|
|
ib = 1;
|
|
|
|
is = 1;
|
|
|
|
|
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. Its marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
1995-07-13 08:48:48 +00:00
|
|
|
/*
|
2016-08-04 16:20:12 +00:00
|
|
|
* We can cluster only if the page is not clean, busy, or wired, and
|
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.
In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.
A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.
This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.
In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
2016-11-09 18:48:37 +00:00
|
|
|
* the page is in the laundry queue.
|
1999-09-17 04:56:40 +00:00
|
|
|
*
|
|
|
|
* During heavy mmap/modification loads the pageout
|
|
|
|
* daemon can really fragment the underlying file
|
2016-08-04 16:20:12 +00:00
|
|
|
* due to flushing pages out of order and not trying to
|
|
|
|
* align the clusters (which leaves sporadic out-of-order
|
1999-09-17 04:56:40 +00:00
|
|
|
* holes). To solve this problem we do the reverse scan
|
|
|
|
* first and attempt to align our cluster, then do a
|
|
|
|
* forward scan if room remains.
|
1995-07-13 08:48:48 +00:00
|
|
|
*/
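
The scan order described in the comment above (reverse scan first, stopping at an alignment boundary, then a forward scan if room remains) can be sketched as a standalone function. This is a simplified model, not the kernel routine: page eligibility is reduced to a boolean array, and `struct cluster` and `cluster_build()` are hypothetical names. The sketch performs one reverse pass and one forward pass only; it does not model any later return to the reverse scan that the `more:` label suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cluster {
	size_t start;	/* index of the lowest page in the cluster */
	size_t count;	/* number of contiguous pages in the cluster */
};

/*
 * Build a cluster around pindex.  eligible[i] says whether page i may be
 * laundered; maxc is the maximum cluster size and the alignment unit.
 */
static struct cluster
cluster_build(const bool *eligible, size_t size, size_t pindex, size_t maxc)
{
	size_t ib = 1, is = 1, count = 1, start;

	assert(pindex < size && maxc > 0);

	/* Reverse scan: extend toward lower indices first. */
	while (count < maxc) {
		if (ib > pindex || !eligible[pindex - ib])
			break;
		count++;
		ib++;
		/* The lowest page is now pindex - (ib - 1); stop if aligned. */
		if ((pindex - (ib - 1)) % maxc == 0)
			break;
	}
	start = pindex - (ib - 1);

	/* Forward scan: use whatever room remains. */
	while (count < maxc && pindex + is < size && eligible[pindex + is]) {
		count++;
		is++;
	}
	return ((struct cluster){ start, count });
}
```

With every page eligible, a maximum cluster size of 8, and a starting index of 13, the reverse pass stops at index 8 (the alignment boundary) and the forward pass fills the cluster up to index 15, so the write covers one aligned run instead of straddling two.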
|
1999-09-17 04:56:40 +00:00
|
|
|
more:
|
2016-08-04 16:20:12 +00:00
|
|
|
while (ib != 0 && pageout_count < vm_pageout_page_count) {
|
1999-09-17 04:56:40 +00:00
|
|
|
if (ib > pindex) {
|
|
|
|
ib = 0;
|
|
|
|
break;
|
|
|
|
}
|
2019-09-09 21:32:42 +00:00
|
|
|
if ((p = vm_page_prev(pb)) == NULL || vm_page_busied(p) ||
|
|
|
|
vm_page_wired(p)) {
|
1999-09-17 04:56:40 +00:00
|
|
|
ib = 0;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
vm_page_test_dirty(p);
|
2018-05-04 17:17:30 +00:00
|
|
|
if (p->dirty == 0) {
|
2015-08-25 01:01:25 +00:00
|
|
|
ib = 0;
|
|
|
|
break;
|
|
|
|
}
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_lock(p);
|
2019-09-09 21:32:42 +00:00
|
|
|
if (!vm_page_in_laundry(p) || !vm_page_try_remove_write(p)) {
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_unlock(p);
|
1999-09-17 04:56:40 +00:00
|
|
|
ib = 0;
|
|
|
|
break;
|
1995-04-09 06:03:56 +00:00
|
|
|
}
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_unlock(p);
|
2010-06-21 23:27:24 +00:00
|
|
|
mc[--page_base] = pb = p;
|
1999-09-17 04:56:40 +00:00
|
|
|
++pageout_count;
|
|
|
|
++ib;
|
2016-08-04 16:20:12 +00:00
|
|
|
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
/*
|
2016-08-04 16:20:12 +00:00
|
|
|
* We are at an alignment boundary. Stop here, and switch
|
|
|
|
* directions. Do not clear ib.
|
1995-01-09 16:06:02 +00:00
|
|
|
*/
|
1999-09-17 04:56:40 +00:00
|
|
|
if ((pindex - (ib - 1)) % vm_pageout_page_count == 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
while (pageout_count < vm_pageout_page_count &&
|
|
|
|
pindex + is < object->size) {
|
Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
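The counter scheme described in this commit message can be illustrated with a small userland sketch. It is not the kernel's implementation: the `sk_` names, the bit positions, and the use of C11 `<stdatomic.h>` are all assumptions made for illustration. It shows one flag bit standing for the object's reference, a second flag bit refusing new mapped wirings, and a wire operation that fails while that second bit is set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: two flag bits above a plain wiring count. */
#define SK_REF_OBJECT   0x80000000u	/* the owning object holds a reference */
#define SK_REF_BLOCKED  0x40000000u	/* new mapped wirings are refused */
#define SK_REF_COUNT(c) ((c) & ~(SK_REF_OBJECT | SK_REF_BLOCKED))

struct sk_page {
	_Atomic uint32_t ref_count;
};

/* Refuse new mapped wirings, as an unmapper would before removing mappings. */
static void
sk_block(struct sk_page *m)
{
	atomic_fetch_or(&m->ref_count, SK_REF_BLOCKED);
}

/* Allow mapped wirings again once unmapping has finished. */
static void
sk_unblock(struct sk_page *m)
{
	atomic_fetch_and(&m->ref_count, ~SK_REF_BLOCKED);
}

/* Take a wiring unless the blocked bit is set; no lock is needed. */
static bool
sk_wire_mapped(struct sk_page *m)
{
	uint32_t old = atomic_load(&m->ref_count);

	do {
		if ((old & SK_REF_BLOCKED) != 0)
			return (false);	/* caller would fall back to a fault */
	} while (!atomic_compare_exchange_weak(&m->ref_count, &old, old + 1));
	return (true);
}

/* Drop a wiring; true only if no wiring and no flag bit remains afterwards. */
static bool
sk_unwire(struct sk_page *m)
{
	uint32_t old = atomic_fetch_sub(&m->ref_count, 1);

	assert(SK_REF_COUNT(old) > 0);
	return (old == 1);
}
```

Because the blocked bit lives in the same word as the count, a single compare-and-swap both checks the bit and takes the reference, which is what lets the count be guaranteed not to increase while the bit is set.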
2019-09-09 21:32:42 +00:00
|
|
|
if ((p = vm_page_next(ps)) == NULL || vm_page_busied(p) ||
|
|
|
|
vm_page_wired(p))
|
1999-09-17 04:56:40 +00:00
|
|
|
break;
|
|
|
|
vm_page_test_dirty(p);
|
2018-05-04 17:17:30 +00:00
|
|
|
if (p->dirty == 0)
|
2015-08-25 01:01:25 +00:00
|
|
|
break;
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_lock(p);
|
|
|
|
if (!vm_page_in_laundry(p) || !vm_page_try_remove_write(p)) {
|
|
|
|
vm_page_unlock(p);
|
1999-09-17 04:56:40 +00:00
|
|
|
break;
|
2019-09-16 15:04:45 +00:00
|
|
|
}
|
|
|
|
vm_page_unlock(p);
|
2010-06-21 23:27:24 +00:00
|
|
|
mc[page_base + pageout_count] = ps = p;
|
1999-09-17 04:56:40 +00:00
|
|
|
++pageout_count;
|
|
|
|
++is;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
1999-09-17 04:56:40 +00:00
|
|
|
/*
|
|
|
|
* If we exhausted our forward scan, continue with the reverse scan
|
2016-08-04 16:20:12 +00:00
|
|
|
* when possible, even past an alignment boundary. This catches
|
|
|
|
* boundary conditions.
|
1999-09-17 04:56:40 +00:00
|
|
|
*/
|
2016-08-04 16:20:12 +00:00
|
|
|
if (ib != 0 && pageout_count < vm_pageout_page_count)
|
1999-09-17 04:56:40 +00:00
|
|
|
goto more;
|
|
|
|
|
2016-11-23 17:53:07 +00:00
|
|
|
return (vm_pageout_flush(&mc[page_base], pageout_count,
|
|
|
|
VM_PAGER_PUT_NOREUSE, 0, NULL, NULL));
|
1995-11-05 20:46:03 +00:00
|
|
|
}
|
|
|
|
|
1999-01-21 08:29:12 +00:00
|
|
|
/*
|
|
|
|
* vm_pageout_flush() - launder the given pages
|
|
|
|
*
|
|
|
|
* The given pages are laundered. Note that we set up for the start of
|
|
|
|
* I/O (i.e., busy the page), mark it read-only, and bump the object
|
|
|
|
* reference count all in here rather than in the parent. If we want
|
|
|
|
* the parent to do more sophisticated things we may have to change
|
|
|
|
* the ordering.
|
2010-11-18 21:09:02 +00:00
|
|
|
*
|
|
|
|
* Returned runlen is the count of pages between mreq and first
|
|
|
|
* page after mreq with status VM_PAGER_AGAIN.
|
2012-03-17 23:00:32 +00:00
|
|
|
* *eio is set to TRUE if pager returned VM_PAGER_ERROR or VM_PAGER_FAIL
|
|
|
|
* for any page in runlen set.
|
1999-01-21 08:29:12 +00:00
|
|
|
*/
|
1995-11-05 20:46:03 +00:00
|
|
|
int
|
2012-03-17 23:00:32 +00:00
|
|
|
vm_pageout_flush(vm_page_t *mc, int count, int flags, int mreq, int *prunlen,
|
|
|
|
boolean_t *eio)
|
1995-11-05 20:46:03 +00:00
|
|
|
{
|
2003-10-24 06:43:04 +00:00
|
|
|
vm_object_t object = mc[0]->object;
|
1995-11-05 20:46:03 +00:00
|
|
|
int pageout_status[count];
|
1998-02-05 03:32:49 +00:00
|
|
|
int numpagedout = 0;
|
2010-11-18 21:09:02 +00:00
|
|
|
int i, runlen;
|
1995-11-05 20:46:03 +00:00
|
|
|
|
2013-02-21 21:54:53 +00:00
|
|
|
VM_OBJECT_ASSERT_WLOCKED(object);
|
2010-04-30 22:31:37 +00:00
|
|
|
|
1999-01-21 08:29:12 +00:00
|
|
|
/*
|
Synchronize page laundering with pmap_extract_and_hold().
Before r207410, the hold count of a page in a page queue was protected
by the queue lock, and, before laundering a page, the page daemon
removed managed writeable mappings of the page before releasing the
queue lock. This ensured that other threads could not concurrently
create transient writeable mappings using pmap_extract_and_hold() on a
user map, as is done for example by vmapbuf(). With that revision,
however, a race can allow the creation of such a mapping, meaning that
the page might be modified as it is being laundered, potentially
resulting in it being marked clean when its contents do not match
those given to the pager. Close the race by using the page lock to
synchronize the hold count check in vm_pageout_cluster() with the
removal of writeable managed mappings.
Reported by: alc
Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12084
2017-08-28 22:10:15 +00:00
|
|
|
* Initiate I/O. Mark the pages busy and verify that they're valid
|
|
|
|
* and read-only.
|
1999-01-21 08:29:12 +00:00
|
|
|
*
|
|
|
|
* We do not have to fixup the clean/dirty bits here... we can
|
|
|
|
* allow the pager to do it after the I/O completes.
|
2000-12-11 07:52:47 +00:00
|
|
|
*
|
|
|
|
* NOTE! mc[i]->dirty may be partial or fragmented due to an
|
|
|
|
* edge case with file fragments.
|
1999-01-21 08:29:12 +00:00
|
|
|
*/
|
This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifested themselves worse on NFS-type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code. This code might have been committed separately, but
almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitous buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups associated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistency problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify its status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and its descendants to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
|
|
|
for (i = 0; i < count; i++) {
|
2003-10-18 21:09:21 +00:00
|
|
|
KASSERT(mc[i]->valid == VM_PAGE_BITS_ALL,
|
|
|
|
("vm_pageout_flush: partially invalid page %p index %d/%d",
|
|
|
|
mc[i], i, count));
|
2019-09-16 15:04:45 +00:00
|
|
|
KASSERT((mc[i]->aflags & PGA_WRITEABLE) == 0,
|
2017-08-28 22:10:15 +00:00
|
|
|
("vm_pageout_flush: writeable page %p", mc[i]));
|
2013-08-09 11:11:11 +00:00
|
|
|
vm_page_sbusy(mc[i]);
|
1998-03-07 21:37:31 +00:00
|
|
|
}
|
1998-08-06 08:33:19 +00:00
|
|
|
vm_object_pip_add(object, count);
|
1995-11-05 20:46:03 +00:00
|
|
|
|
2007-06-13 06:10:10 +00:00
|
|
|
vm_pager_put_pages(object, mc, count, flags, pageout_status);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2010-11-18 21:09:02 +00:00
|
|
|
runlen = count - mreq;
|
2012-03-17 23:00:32 +00:00
|
|
|
if (eio != NULL)
|
|
|
|
*eio = FALSE;
|
1995-11-05 20:46:03 +00:00
|
|
|
for (i = 0; i < count; i++) {
|
|
|
|
vm_page_t mt = mc[i];
|
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. Their marginal usefulness in a
threads design can be worked around in other ways. Both #13 and #14
were done to simplify the code and improve readability and
maintainability. (As were most of these changes.)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
1995-07-13 08:48:48 +00:00
|
|
|
|
2007-09-15 18:30:28 +00:00
|
|
|
KASSERT(pageout_status[i] == VM_PAGER_PEND ||
|
2012-06-16 18:56:19 +00:00
|
|
|
!pmap_page_is_write_mapped(mt),
|
2004-02-21 23:32:00 +00:00
|
|
|
("vm_pageout_flush: page %p is not write protected", mt));
|
1994-05-25 09:21:21 +00:00
|
|
|
switch (pageout_status[i]) {
|
|
|
|
case VM_PAGER_OK:
|
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.
In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.
A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.
This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.
In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
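The two pacing goals in this commit message (a background rate in KB/s, and a shortfall target to be met within a fixed period) both reduce to a per-pass page quota. The sketch below shows one way such quotas could be computed. It is an assumption-laden illustration, not the laundry thread's actual arithmetic: the `sk_` helpers, their parameters, and the idea of a fixed number of passes per second are all hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: how many pages a single background pass may
 * launder so that the overall rate stays near rate_kb_per_sec.
 */
static unsigned long
sk_background_quota(unsigned long rate_kb_per_sec, unsigned long page_size,
    unsigned long passes_per_sec)
{
	assert(page_size > 0 && passes_per_sec > 0);
	/* KB/s -> bytes/s -> pages/s -> pages per pass (rounded down). */
	return (rate_kb_per_sec * 1024UL / page_size / passes_per_sec);
}

/*
 * Hypothetical helper: spread a shortfall of pages_needed pages over
 * the passes that remain before the deadline.
 */
static unsigned long
sk_shortfall_quota(unsigned long pages_needed, unsigned long passes_left)
{
	assert(passes_left > 0);
	/* Round up so the final pass does not leave a remainder behind. */
	return ((pages_needed + passes_left - 1) / passes_left);
}
```

For instance, with an assumed 4096 KB/s rate, 4096-byte pages, and 10 passes per second, the background quota works out to 102 pages per pass.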
2016-11-09 18:48:37 +00:00
|
|
|
vm_page_lock(mt);
|
|
|
|
if (vm_page_in_laundry(mt))
|
|
|
|
vm_page_deactivate_noreuse(mt);
|
|
|
|
vm_page_unlock(mt);
|
|
|
|
/* FALLTHROUGH */
|
1994-05-25 09:21:21 +00:00
|
|
|
case VM_PAGER_PEND:
|
1998-02-05 03:32:49 +00:00
|
|
|
numpagedout++;
|
1994-05-25 09:21:21 +00:00
|
|
|
break;
|
|
|
|
case VM_PAGER_BAD:
|
|
|
|
/*
|
2016-11-09 18:48:37 +00:00
|
|
|
* The page is outside the object's range. We pretend
|
|
|
|
* that the page out worked and clean the page, so the
|
|
|
|
* changes will be lost if the page is reclaimed by
|
|
|
|
* the page daemon.
|
1994-05-25 09:21:21 +00:00
|
|
|
*/
|
1999-09-17 04:56:40 +00:00
|
|
|
vm_page_undirty(mt);
|
2016-11-09 18:48:37 +00:00
|
|
|
vm_page_lock(mt);
|
|
|
|
if (vm_page_in_laundry(mt))
|
|
|
|
vm_page_deactivate_noreuse(mt);
|
|
|
|
vm_page_unlock(mt);
|
1994-05-25 09:21:21 +00:00
|
|
|
break;
|
|
|
|
case VM_PAGER_ERROR:
|
|
|
|
case VM_PAGER_FAIL:
|
|
|
|
/*
|
2017-01-03 00:05:44 +00:00
|
|
|
* If the page couldn't be paged out to swap because the
|
|
|
|
* pager wasn't able to find space, place the page in
|
|
|
|
* the PQ_UNSWAPPABLE holding queue. This is an
|
|
|
|
* optimization that prevents the page daemon from
|
|
|
|
* wasting CPU cycles on pages that cannot be reclaimed
|
|
|
|
* because no swap device is configured.
|
|
|
|
*
|
|
|
|
* Otherwise, reactivate the page so that it doesn't
|
|
|
|
* clog the laundry and inactive queues. (We will try
|
|
|
|
* paging it out again later.)
|
1994-05-25 09:21:21 +00:00
|
|
|
*/
|
2010-05-08 20:34:01 +00:00
|
|
|
vm_page_lock(mt);
|
2017-01-03 00:05:44 +00:00
|
|
|
if (object->type == OBJT_SWAP &&
|
|
|
|
pageout_status[i] == VM_PAGER_FAIL) {
|
|
|
|
vm_page_unswappable(mt);
|
|
|
|
numpagedout++;
|
|
|
|
} else
|
|
|
|
vm_page_activate(mt);
|
2010-05-08 20:34:01 +00:00
|
|
|
vm_page_unlock(mt);
|
2012-03-17 23:00:32 +00:00
|
|
|
if (eio != NULL && i >= mreq && i - mreq < runlen)
|
|
|
|
*eio = TRUE;
|
1994-05-25 09:21:21 +00:00
|
|
|
break;
|
|
|
|
case VM_PAGER_AGAIN:
|
2010-11-18 21:09:02 +00:00
|
|
|
if (i >= mreq && i - mreq < runlen)
|
|
|
|
runlen = i - mreq;
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
1994-05-25 09:21:21 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
/*
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
* If the operation is still going, leave the page busy to
|
|
|
|
* block all other accesses. Also, leave the paging in
|
|
|
|
* progress indicator set so that we don't attempt an object
|
|
|
|
* collapse.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1994-05-25 09:21:21 +00:00
|
|
|
if (pageout_status[i] != VM_PAGER_PEND) {
|
1995-03-01 23:30:04 +00:00
|
|
|
vm_object_pip_wakeup(object);
|
2013-08-09 11:11:11 +00:00
|
|
|
vm_page_sunbusy(mt);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
1994-05-25 09:21:21 +00:00
|
|
|
}
|
2010-11-18 21:09:02 +00:00
|
|
|
if (prunlen != NULL)
|
|
|
|
*prunlen = runlen;
|
2010-05-08 20:34:01 +00:00
|
|
|
return (numpagedout);
|
1994-05-25 09:21:21 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2017-01-03 00:05:44 +00:00
|
|
|
static void
|
|
|
|
vm_pageout_swapon(void *arg __unused, struct swdevt *sp __unused)
|
|
|
|
{
|
|
|
|
|
|
|
|
atomic_store_rel_int(&swapdev_enabled, 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
vm_pageout_swapoff(void *arg __unused, struct swdevt *sp __unused)
|
|
|
|
{
|
|
|
|
|
|
|
|
if (swap_pager_nswapdev() == 1)
|
|
|
|
atomic_store_rel_int(&swapdev_enabled, 0);
|
|
|
|
}
|
|
|
|
|
2015-04-07 02:18:52 +00:00
|
|
|
/*
|
|
|
|
* Attempt to acquire all of the necessary locks to launder a page and
|
|
|
|
* then call through the clustering layer to PUTPAGES. Wait a short
|
|
|
|
* time for a vnode lock.
|
|
|
|
*
|
|
|
|
* Requires the page and object lock on entry, releases both before return.
|
|
|
|
* Returns 0 on success and an errno otherwise.
|
|
|
|
*/
|
|
|
|
static int
|
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.
In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.
A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.
This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.
In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
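The request handling described in the message above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the enum, both function names, and in particular the threshold test (wakeups * laundry >= inactive) are assumptions chosen only to show the shape of the policy, where a shortfall request always wins and a background request may result in no work at all.

```c
#include <stdbool.h>

/* Hypothetical request states; the commit's variable is vm_laundry_request. */
enum laundry_request {
	LAUNDRY_IDLE,
	LAUNDRY_BACKGROUND,
	LAUNDRY_SHORTFALL
};

/*
 * Assumed threshold, for illustration only: background laundering is
 * worthwhile once the number of page daemon wakeups since the last
 * laundering is large relative to inactive/laundry.
 */
static bool
background_launder_wanted(unsigned long wakeups, unsigned long laundry,
    unsigned long inactive)
{
	if (laundry == 0)
		return (false);
	return (wakeups * laundry >= inactive);
}

/* Returns the number of pages the laundry thread should try to launder. */
static unsigned long
launder_target(enum laundry_request req, unsigned long wakeups,
    unsigned long laundry, unsigned long inactive,
    unsigned long shortfall_pages, unsigned long background_target)
{
	switch (req) {
	case LAUNDRY_SHORTFALL:
		/* A shortfall request preempts any background laundering. */
		return (shortfall_pages);
	case LAUNDRY_BACKGROUND:
		if (background_launder_wanted(wakeups, laundry, inactive))
			return (background_target);
		return (0);	/* go back to sleep without doing any work */
	default:
		return (0);
	}
}
```

With a small laundry queue and few wakeups the background case returns 0, which corresponds to the thread going back to sleep; pacing the work to vm_background_launder_rate is left out of this sketch.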
2016-11-09 18:48:37 +00:00
|
|
|
vm_pageout_clean(vm_page_t m, int *numpagedout)
|
2015-04-07 02:18:52 +00:00
|
|
|
{
|
|
|
|
struct vnode *vp;
|
|
|
|
struct mount *mp;
|
|
|
|
vm_object_t object;
|
|
|
|
vm_pindex_t pindex;
|
|
|
|
int error, lockmode;
|
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_assert_locked(m);
|
2015-04-07 02:18:52 +00:00
|
|
|
object = m->object;
|
|
|
|
VM_OBJECT_ASSERT_WLOCKED(object);
|
|
|
|
error = 0;
|
|
|
|
vp = NULL;
|
|
|
|
mp = NULL;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The object is already known NOT to be dead. It
|
|
|
|
* is possible for the vget() to block the whole
|
|
|
|
* pageout daemon, but the new low-memory handling
|
|
|
|
* code should prevent it.
|
|
|
|
*
|
|
|
|
* We can't wait forever for the vnode lock, we might
|
|
|
|
* deadlock due to a vn_read() getting stuck in
|
|
|
|
* vm_wait while holding this vnode. We skip the
|
|
|
|
* vnode if we can't get it in a reasonable amount
|
|
|
|
* of time.
|
|
|
|
*/
|
|
|
|
if (object->type == OBJT_VNODE) {
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_unlock(m);
|
2015-04-07 02:18:52 +00:00
|
|
|
vp = object->handle;
|
|
|
|
if (vp->v_type == VREG &&
|
|
|
|
vn_start_write(vp, &mp, V_NOWAIT) != 0) {
|
|
|
|
mp = NULL;
|
|
|
|
error = EDEADLK;
|
|
|
|
goto unlock_all;
|
|
|
|
}
|
|
|
|
KASSERT(mp != NULL,
|
|
|
|
("vp %p with NULL v_mount", vp));
|
|
|
|
vm_object_reference_locked(object);
|
|
|
|
pindex = m->pindex;
|
|
|
|
VM_OBJECT_WUNLOCK(object);
|
|
|
|
lockmode = MNT_SHARED_WRITES(vp->v_mount) ?
|
|
|
|
LK_SHARED : LK_EXCLUSIVE;
|
|
|
|
if (vget(vp, lockmode | LK_TIMELOCK, curthread)) {
|
|
|
|
vp = NULL;
|
|
|
|
error = EDEADLK;
|
|
|
|
goto unlock_mp;
|
|
|
|
}
|
|
|
|
VM_OBJECT_WLOCK(object);
|
2017-11-29 19:47:09 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Ensure that the object and vnode were not disassociated
|
|
|
|
* while locks were dropped.
|
|
|
|
*/
|
|
|
|
if (vp->v_object != object) {
|
|
|
|
error = ENOENT;
|
|
|
|
goto unlock_all;
|
|
|
|
}
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_lock(m);
|
2017-11-29 19:47:09 +00:00
|
|
|
|
2015-04-07 02:18:52 +00:00
|
|
|
/*
|
|
|
|
* While the object and page were unlocked, the page
|
|
|
|
* may have been:
|
|
|
|
* (1) moved to a different queue,
|
|
|
|
* (2) reallocated to a different object,
|
|
|
|
* (3) reallocated to a different offset, or
|
|
|
|
* (4) cleaned.
|
|
|
|
*/
|
2016-11-09 18:48:37 +00:00
|
|
|
if (!vm_page_in_laundry(m) || m->object != object ||
|
2015-04-07 02:18:52 +00:00
|
|
|
m->pindex != pindex || m->dirty == 0) {
|
|
|
|
vm_page_unlock(m);
|
|
|
|
error = ENXIO;
|
|
|
|
goto unlock_all;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
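The counter layout described in the message above (a reference count sharing one word with two flag bits) can be illustrated with C11 atomics. This is a sketch under stated assumptions, not the kernel's implementation: struct page, the PAGE_REF_* constants, and all four function names are hypothetical, and the kernel uses its own atomic(9) primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits stored in the same word as the count. */
#define	PAGE_REF_OBJECT		0x80000000u	/* the object's reference */
#define	PAGE_REF_BLOCKED	0x40000000u	/* new mapped wirings refused */
#define	PAGE_REF_COUNT(c)	((c) & ~(PAGE_REF_OBJECT | PAGE_REF_BLOCKED))

struct page {
	_Atomic uint32_t ref_count;
};

/* Take a reference unless new references are currently blocked. */
static bool
page_wire_mapped(struct page *p)
{
	uint32_t old;

	old = atomic_load(&p->ref_count);
	do {
		if ((old & PAGE_REF_BLOCKED) != 0)
			return (false);	/* caller falls back to a slow path */
	} while (!atomic_compare_exchange_weak(&p->ref_count, &old, old + 1));
	return (true);
}

/* Block and unblock new references, e.g. around removing mappings. */
static void
page_block_wirings(struct page *p)
{
	atomic_fetch_or(&p->ref_count, PAGE_REF_BLOCKED);
}

static void
page_unblock_wirings(struct page *p)
{
	atomic_fetch_and(&p->ref_count, ~PAGE_REF_BLOCKED);
}

/* Drop a reference; returns true if it was the last one of any kind. */
static bool
page_release(struct page *p)
{
	return (atomic_fetch_sub(&p->ref_count, 1) == 1);
}
```

Because the flags live in the counter word, a single compare-and-swap both checks the blocked bit and takes the reference, so no separate lock is needed to close the race between the two.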
2019-09-09 21:32:42 +00:00
|
|
|
* The page may have been busied while the object and page
|
|
|
|
* locks were released.
|
2015-04-07 02:18:52 +00:00
|
|
|
*/
|
2019-09-09 21:32:42 +00:00
|
|
|
if (vm_page_busied(m)) {
|
2015-04-07 02:18:52 +00:00
|
|
|
vm_page_unlock(m);
|
|
|
|
error = EBUSY;
|
|
|
|
goto unlock_all;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-09-09 21:32:42 +00:00
|
|
|
/*
|
|
|
|
* Remove all writeable mappings, failing if the page is wired.
|
|
|
|
*/
|
|
|
|
if (!vm_page_try_remove_write(m)) {
|
|
|
|
vm_page_unlock(m);
|
|
|
|
error = EBUSY;
|
|
|
|
goto unlock_all;
|
|
|
|
}
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_unlock(m);
|
2019-09-09 21:32:42 +00:00
|
|
|
|
2015-04-07 02:18:52 +00:00
|
|
|
/*
|
|
|
|
* If a page is dirty, then it is either being washed
|
|
|
|
* (but not yet cleaned) or it is still in the
|
|
|
|
* laundry. If it is still in the laundry, then we
|
|
|
|
* start the cleaning operation.
|
|
|
|
*/
|
2016-11-09 18:48:37 +00:00
|
|
|
if ((*numpagedout = vm_pageout_cluster(m)) == 0)
|
2015-04-07 02:18:52 +00:00
|
|
|
error = EIO;
|
|
|
|
|
|
|
|
unlock_all:
|
|
|
|
VM_OBJECT_WUNLOCK(object);
|
|
|
|
|
|
|
|
unlock_mp:
|
|
|
|
vm_page_lock_assert(m, MA_NOTOWNED);
|
|
|
|
if (mp != NULL) {
|
|
|
|
if (vp != NULL)
|
|
|
|
vput(vp);
|
|
|
|
vm_object_deallocate(object);
|
|
|
|
vn_finished_write(mp);
|
|
|
|
}
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2016-11-09 18:48:37 +00:00
|
|
|
/*
|
|
|
|
* Attempt to launder the specified number of pages.
|
|
|
|
*
|
|
|
|
* Returns the number of pages successfully laundered.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
vm_pageout_launder(struct vm_domain *vmd, int launder, bool in_shortfall)
|
|
|
|
{
|
2018-04-24 21:15:54 +00:00
|
|
|
struct scan_state ss;
|
2016-11-09 18:48:37 +00:00
|
|
|
struct vm_pagequeue *pq;
|
2019-09-16 15:04:45 +00:00
|
|
|
struct mtx *mtx;
|
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.
In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.
A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.
This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.
In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
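The request handling described in this commit message can be sketched in a few lines of C. This is a minimal model, not the kernel's code: the type and function names (laundry_state, laundry_post, laundry_target) are hypothetical, and the background trigger (wakeups multiplied by the laundry queue size compared against the inactive queue size) is an assumed stand-in for the actual heuristic, which the message only describes qualitatively.

```c
#include <assert.h>
#include <stdbool.h>

/* Request kinds, ordered by priority. Names are illustrative only. */
enum laundry_request {
	LAUNDRY_IDLE = 0,
	LAUNDRY_BACKGROUND,
	LAUNDRY_SHORTFALL
};

/* Hypothetical snapshot of the state the laundry thread consults. */
struct laundry_state {
	enum laundry_request request;	/* latched request */
	unsigned wakeups;		/* page daemon wakeups since last laundering */
	unsigned laundry_cnt;		/* pages in the laundry queue */
	unsigned inactive_cnt;		/* pages in the inactive queue */
};

/* Latch a request. A shortfall request overrides a pending background one. */
static void
laundry_post(struct laundry_state *st, enum laundry_request req)
{
	if (req > st->request)
		st->request = req;
}

/*
 * Assumed trigger: background laundering runs once the wakeup count,
 * scaled by the laundry queue size, reaches the inactive queue size.
 */
static bool
background_should_run(const struct laundry_state *st)
{
	if (st->laundry_cnt == 0)
		return (false);
	return ((unsigned long long)st->wakeups * st->laundry_cnt >=
	    st->inactive_cnt);
}

/* Consume the latched request and return the number of pages to launder. */
static unsigned
laundry_target(struct laundry_state *st, unsigned shortfall_pages,
    unsigned background_target)
{
	enum laundry_request req = st->request;

	st->request = LAUNDRY_IDLE;
	switch (req) {
	case LAUNDRY_SHORTFALL:
		st->wakeups = 0;
		return (shortfall_pages);
	case LAUNDRY_BACKGROUND:
		if (background_should_run(st)) {
			st->wakeups = 0;
			return (background_target);
		}
		return (0);	/* go back to sleep without doing any work */
	default:
		return (0);
	}
}
```

In this model a shortfall request posted while a background request is pending replaces it, which mirrors the preemption rule stated in the message. Rate limiting (vm_background_launder_rate) and the 0.5s shortfall deadline are left out.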
2016-11-09 18:48:37 +00:00
|
|
|
vm_object_t object;
|
2018-04-24 21:15:54 +00:00
|
|
|
vm_page_t m, marker;
|
2019-09-16 15:04:45 +00:00
|
|
|
int act_delta, error, numpagedout, queue, starting_target;
|
2016-11-09 18:48:37 +00:00
|
|
|
int vnodes_skipped;
|
2019-02-17 16:35:19 +00:00
|
|
|
bool pageout_ok;
|
2016-11-09 18:48:37 +00:00
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
mtx = NULL;
|
2018-04-24 21:15:54 +00:00
|
|
|
object = NULL;
|
2016-11-09 18:48:37 +00:00
|
|
|
starting_target = launder;
|
|
|
|
vnodes_skipped = 0;
|
|
|
|
|
|
|
|
/*
|
2017-01-03 00:05:44 +00:00
|
|
|
* Scan the laundry queues for pages eligible to be laundered. We stop
|
2016-11-09 18:48:37 +00:00
|
|
|
* once the target number of dirty pages has been laundered, or once
|
|
|
|
* we've reached the end of the queue. A single iteration of this loop
|
|
|
|
* may cause more than one page to be laundered because of clustering.
|
|
|
|
*
|
2017-01-03 00:05:44 +00:00
|
|
|
* As an optimization, we avoid laundering from PQ_UNSWAPPABLE when no
|
|
|
|
* swap devices are configured.
|
2016-11-09 18:48:37 +00:00
|
|
|
*/
|
2017-01-03 00:05:44 +00:00
|
|
|
if (atomic_load_acq_int(&swapdev_enabled))
|
2018-04-19 14:09:44 +00:00
|
|
|
queue = PQ_UNSWAPPABLE;
|
2017-01-03 00:05:44 +00:00
|
|
|
else
|
2018-04-19 14:09:44 +00:00
|
|
|
queue = PQ_LAUNDRY;
|
2016-11-09 18:48:37 +00:00
|
|
|
|
2017-01-03 00:05:44 +00:00
|
|
|
scan:
|
2018-04-19 14:09:44 +00:00
|
|
|
marker = &vmd->vmd_markers[queue];
|
2018-04-24 21:15:54 +00:00
|
|
|
pq = &vmd->vmd_pagequeues[queue];
|
2016-11-09 18:48:37 +00:00
|
|
|
vm_pagequeue_lock(pq);
|
2018-04-24 21:15:54 +00:00
|
|
|
vm_pageout_init_scan(&ss, pq, marker, NULL, pq->pq_cnt);
|
|
|
|
while (launder > 0 && (m = vm_pageout_next(&ss, false)) != NULL) {
|
|
|
|
if (__predict_false((m->flags & PG_MARKER) != 0))
|
2016-11-09 18:48:37 +00:00
|
|
|
continue;
|
2018-04-24 21:15:54 +00:00
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_change_lock(m, &mtx);
|
|
|
|
|
|
|
|
recheck:
|
2018-04-24 21:15:54 +00:00
|
|
|
/*
|
2019-09-16 15:04:45 +00:00
|
|
|
* The page may have been disassociated from the queue
|
|
|
|
* or even freed while locks were dropped. We thus must be
|
|
|
|
* careful whenever modifying page state. Once the object lock
|
|
|
|
* has been acquired, we have a stable reference to the page.
|
2018-04-24 21:15:54 +00:00
|
|
|
*/
|
2019-09-16 15:04:45 +00:00
|
|
|
if (vm_page_queue(m) != queue)
|
2018-04-24 21:15:54 +00:00
|
|
|
continue;
|
2019-09-16 15:04:45 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* A requeue was requested, so this page gets a second
|
|
|
|
* chance.
|
|
|
|
*/
|
|
|
|
if ((m->aflags & PGA_REQUEUE) != 0) {
|
2019-09-03 14:29:58 +00:00
|
|
|
vm_page_pqbatch_submit(m, queue);
|
2016-11-09 18:48:37 +00:00
			continue;
		}
2018-04-24 21:15:54 +00:00

2019-09-16 15:04:45 +00:00
		/*
		 * Wired pages may not be freed. Complete their removal
		 * from the queue now to avoid needless revisits during
		 * future scans. This check is racy and must be reverified once
		 * we hold the object lock and have verified that the page
		 * is not busy.
		 */
		if (vm_page_wired(m)) {
			vm_page_dequeue_deferred(m);
			continue;
		}

2018-04-24 21:15:54 +00:00
		if (object != m->object) {
2019-02-17 16:35:19 +00:00
			if (object != NULL)
2018-04-24 21:15:54 +00:00
				VM_OBJECT_WUNLOCK(object);
2019-09-16 15:04:45 +00:00

			/*
			 * A page's object pointer may be set to NULL before
			 * the object lock is acquired.
			 */
Change synchronization rules for vm_page reference counting.

There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.

Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
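The counter scheme described in the commit message above (a wiring count plus two flag bits, one for the object's reference and one that blocks new references taken through a mapping) can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the structure, the bit positions, and the function names are hypothetical and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout; names and bit positions are illustrative only. */
#define	REF_OBJECT	0x80000000u	/* the owning object holds a reference */
#define	REF_BLOCKED	0x40000000u	/* mappings are being removed */
#define	REF_COUNT(c)	((c) & ~(REF_OBJECT | REF_BLOCKED))

struct page_sketch {
	_Atomic uint32_t ref_count;
};

/*
 * Take a wiring on behalf of a mapping lookup.  Fails if new
 * references are currently blocked, in which case a caller would
 * fall back to a slower path.
 */
static bool
page_wire_mapped(struct page_sketch *p)
{
	uint32_t old = atomic_load(&p->ref_count);

	do {
		if ((old & REF_BLOCKED) != 0)
			return (false);
	} while (!atomic_compare_exchange_weak(&p->ref_count, &old, old + 1));
	return (true);
}

/*
 * Drop a wiring.  Returns true if no wirings and no object
 * reference remain, i.e. the caller released the last reference.
 */
static bool
page_unwire(struct page_sketch *p)
{
	uint32_t old = atomic_fetch_sub(&p->ref_count, 1);

	assert(REF_COUNT(old) > 0);
	return (((old - 1) & ~REF_BLOCKED) == 0);
}

/*
 * Atomically verify that the page has no wirings and block new
 * mapped wirings.  Fails if a wiring already exists.
 */
static bool
page_try_block(struct page_sketch *p)
{
	uint32_t old = atomic_load(&p->ref_count);

	do {
		if (REF_COUNT(old) != 0)
			return (false);
	} while (!atomic_compare_exchange_weak(&p->ref_count, &old,
	    old | REF_BLOCKED));
	return (true);
}
```

The point of folding the flags into the same word as the count is that the "is it wired?" check and the "block new wirings" step happen in one compare-and-swap, so no separate lock is needed to close the window between them.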
			object = (vm_object_t)atomic_load_ptr(&m->object);
2019-09-16 15:04:45 +00:00
			if (object != NULL && !VM_OBJECT_TRYWLOCK(object)) {
				mtx_unlock(mtx);
				/* Depends on type-stability. */
				VM_OBJECT_WLOCK(object);
				mtx_lock(mtx);
				goto recheck;
2019-02-17 16:35:19 +00:00
			}
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.

The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try to balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.

In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.

A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.

This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.

In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
2016-11-09 18:48:37 +00:00
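The two laundering modes described in the commit message above can be sketched numerically. The message only says that the wakeup count must be "large enough relative to the ratio" of the queue sizes; the specific threshold used below (laundry size scaled by the integer square root of the wakeup count, compared against the inactive size) is an assumption chosen for illustration, and all function names are hypothetical. The pacing helper shows how a shortfall target could be spread over the 0.5s scan period mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Integer square root (floor), by simple iteration. */
static uint64_t
isqrt_sketch(uint64_t n)
{
	uint64_t r = 0;

	while ((r + 1) * (r + 1) <= n)
		r++;
	return (r);
}

/*
 * Hypothetical background trigger: the more page daemon wakeups
 * since the last laundering, the smaller the laundry queue needs
 * to be relative to the inactive queue before work is started.
 * The exact formula is an assumption, not taken from the source.
 */
static bool
background_launder_wanted(uint64_t nlaundry, uint64_t ninactive,
    uint64_t wakeups_since_last)
{
	if (nlaundry == 0 || wakeups_since_last == 0)
		return (false);
	return (nlaundry * isqrt_sketch(wakeups_since_last) >= ninactive);
}

/*
 * Number of pages to launder in one pass lasting pass_ms so that
 * target pages are laundered within period_ms (rounded up).
 */
static uint64_t
pages_per_pass(uint64_t target, uint64_t period_ms, uint64_t pass_ms)
{
	return ((target * pass_ms + period_ms - 1) / period_ms);
}
```

For shortfall laundering, `period_ms` would be 500 to match the 0.5s scan period; for background laundering the same helper could be fed a per-second page budget derived from a KB/s rate and the page size.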
		}
2019-09-16 15:04:45 +00:00
		if (__predict_false(m->object == NULL))
			/*
			 * The page has been removed from its object.
			 */
			continue;
		KASSERT(m->object == object, ("page %p does not belong to %p",
		    m, object));
2016-11-09 18:48:37 +00:00

2018-04-24 21:15:54 +00:00
		if (vm_page_busied(m))
			continue;
2016-11-09 18:48:37 +00:00
2019-09-09 21:32:42 +00:00
		/*
2019-09-16 15:04:45 +00:00
		 * Re-check for wirings now that we hold the object lock and
		 * have verified that the page is unbusied. If the page is
		 * mapped, it may still be wired by pmap lookups. The call to
2019-09-09 21:32:42 +00:00
		 * vm_page_try_remove_all() below atomically checks for such
		 * wirings and removes mappings. If the page is unmapped, the
		 * wire count is guaranteed not to increase.
		 */
		if (__predict_false(vm_page_wired(m))) {
2019-09-16 15:04:45 +00:00
			vm_page_dequeue_deferred(m);
2019-09-09 21:32:42 +00:00
			continue;
		}

2016-11-09 18:48:37 +00:00
		/*
		 * Invalid pages can be easily freed. They cannot be
		 * mapped; vm_page_free() asserts this.
		 */
		if (m->valid == 0)
			goto free_page;

		/*
		 * If the page has been referenced and the object is not dead,
		 * reactivate or requeue the page depending on whether the
		 * object is mapped.
2018-07-15 19:25:15 +00:00
		 *
		 * Test PGA_REFERENCED after calling pmap_ts_referenced() so
		 * that a reference from a concurrently destroyed mapping is
		 * observed here and now.
2016-11-09 18:48:37 +00:00
		 */
2019-09-16 15:04:45 +00:00
		if (object->ref_count != 0)
			act_delta = pmap_ts_referenced(m);
		else {
			KASSERT(!pmap_page_is_mapped(m),
			    ("page %p is mapped", m));
			act_delta = 0;
		}
		if ((m->aflags & PGA_REFERENCED) != 0) {
			vm_page_aflag_clear(m, PGA_REFERENCED);
			act_delta++;
		}
		if (act_delta != 0) {
			if (object->ref_count != 0) {
				VM_CNT_INC(v_reactivated);
				vm_page_activate(m);
2016-11-09 18:48:37 +00:00

2019-09-16 15:04:45 +00:00
				/*
				 * Increase the activation count if the page
				 * was referenced while in the laundry queue.
				 * This makes it less likely that the page will
				 * be returned prematurely to the inactive
				 * queue.
				 */
				m->act_count += act_delta + ACT_ADVANCE;
2019-09-16 15:03:12 +00:00

2019-09-16 15:04:45 +00:00
				/*
				 * If this was a background laundering, count
				 * activated pages towards our target. The
				 * purpose of background laundering is to ensure
				 * that pages are eventually cycled through the
				 * laundry queue, and an activation is a valid
				 * way out.
				 */
				if (!in_shortfall)
					launder--;
				continue;
			} else if ((object->flags & OBJ_DEAD) == 0) {
				vm_page_requeue(m);
				continue;
2018-04-24 21:15:54 +00:00
			}
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.
In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.
A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.
This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.
In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
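The latching rule described in the message above (a shortfall request overrides a pending background request, but not the reverse) can be sketched roughly as follows. This is only an illustration under stated assumptions: the enum values and the helpers laundry_post() and laundry_take() are hypothetical names, not the kernel's identifiers, and the real code also deals with locking and thread wakeups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request levels, ordered from least to most urgent. */
enum laundry_request {
	LAUNDRY_IDLE = 0,
	LAUNDRY_BACKGROUND,
	LAUNDRY_SHORTFALL
};

/*
 * Latch a request.  A more urgent request replaces the pending one;
 * an equal or less urgent request is ignored.  Returns true if the
 * pending state changed, i.e. the laundry thread would need a wakeup.
 */
static bool
laundry_post(enum laundry_request *pending, enum laundry_request req)
{
	assert(pending != NULL);
	if (req <= *pending)
		return (false);
	*pending = req;
	return (true);
}

/* Consume the latched request and reset the latch to idle. */
static enum laundry_request
laundry_take(enum laundry_request *pending)
{
	enum laundry_request req;

	assert(pending != NULL);
	req = *pending;
	*pending = LAUNDRY_IDLE;
	return (req);
}
```

Ordering the levels numerically keeps the "never downgrade" rule to a single comparison; a background request posted while a shortfall is pending is simply dropped.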
2016-11-09 18:48:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the page appears to be clean at the machine-independent
|
|
|
|
* layer, then remove all of its mappings from the pmap in
|
|
|
|
* anticipation of freeing it. If, however, any of the page's
|
|
|
|
* mappings allow write access, then the page may still be
|
|
|
|
* modified until the last of those mappings are removed.
|
|
|
|
*/
|
|
|
|
if (object->ref_count != 0) {
|
|
|
|
vm_page_test_dirty(m);
|
Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
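The counter layout described in the message above (a wire count plus one flag bit for the object's reference and one flag bit that blocks new mapped references) might look roughly like the sketch below. It uses C11 atomics rather than the kernel's atomic(9) primitives, and every name in it (REF_OBJECT, REF_BLOCKED, struct page_ref, page_wire_mapped(), and so on) is hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define REF_OBJECT	0x80000000u	/* the owning object's reference */
#define REF_BLOCKED	0x40000000u	/* new mapped references refused */
#define REF_FLAGS	(REF_OBJECT | REF_BLOCKED)

struct page_ref {
	_Atomic uint32_t count;		/* wire count plus flag bits */
};

/* Refuse new mapped references, e.g. while mappings are removed. */
static void
page_block(struct page_ref *p)
{
	atomic_fetch_or(&p->count, REF_BLOCKED);
}

static void
page_unblock(struct page_ref *p)
{
	atomic_fetch_and(&p->count, ~REF_BLOCKED);
}

/* Take a wiring through a mapping; fail if the page is blocked. */
static bool
page_wire_mapped(struct page_ref *p)
{
	uint32_t old;

	old = atomic_load(&p->count);
	do {
		if ((old & REF_BLOCKED) != 0)
			return (false);
	} while (!atomic_compare_exchange_weak(&p->count, &old, old + 1));
	return (true);
}

/* Drop a wiring; returns true if no reference of any kind remains. */
static bool
page_unwire(struct page_ref *p)
{
	uint32_t old;

	old = atomic_fetch_sub(&p->count, 1);
	assert((old & ~REF_FLAGS) != 0);	/* must have held a wiring */
	return (old == 1);
}
```

Because the blocked flag is tested inside the compare-and-swap loop, a wiring attempt that races with page_block() either completes before the flag is set or observes it and fails; it cannot slip in afterwards.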
2019-09-09 21:32:42 +00:00
|
|
|
if (m->dirty == 0 && !vm_page_try_remove_all(m)) {
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_dequeue_deferred(m);
|
2019-09-09 21:32:42 +00:00
|
|
|
continue;
|
|
|
|
}
|
2016-11-09 18:48:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Clean pages are freed, and dirty pages are paged out unless
|
|
|
|
* they belong to a dead object. Requeueing dirty pages from
|
|
|
|
* dead objects is pointless, as they are being paged out and
|
|
|
|
* freed by the thread that destroyed the object.
|
|
|
|
*/
|
|
|
|
if (m->dirty == 0) {
|
|
|
|
free_page:
|
|
|
|
vm_page_free(m);
|
- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exception.
This is based on benno@'s 4-year-old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html
Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
|
|
|
VM_CNT_INC(v_dfree);
|
2016-11-09 18:48:37 +00:00
|
|
|
} else if ((object->flags & OBJ_DEAD) == 0) {
|
|
|
|
if (object->type != OBJT_SWAP &&
|
|
|
|
object->type != OBJT_DEFAULT)
|
|
|
|
pageout_ok = true;
|
|
|
|
else if (disable_swap_pageouts)
|
|
|
|
pageout_ok = false;
|
|
|
|
else
|
|
|
|
pageout_ok = true;
|
|
|
|
if (!pageout_ok) {
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_requeue(m);
|
2018-04-24 21:15:54 +00:00
|
|
|
continue;
|
2016-11-09 18:48:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Form a cluster with adjacent, dirty pages from the
|
|
|
|
* same object, and page out that entire cluster.
|
|
|
|
*
|
|
|
|
* The adjacent, dirty pages must also be in the
|
|
|
|
* laundry. However, their mappings are not checked
|
|
|
|
* for new references. Consequently, a recently
|
|
|
|
* referenced page may be paged out. However, that
|
|
|
|
* page will not be prematurely reclaimed. After page
|
|
|
|
* out, the page will be placed in the inactive queue,
|
|
|
|
* where any new references will be detected and the
|
|
|
|
* page reactivated.
|
|
|
|
*/
|
|
|
|
error = vm_pageout_clean(m, &numpagedout);
|
|
|
|
if (error == 0) {
|
|
|
|
launder -= numpagedout;
|
2018-04-24 21:15:54 +00:00
|
|
|
ss.scanned += numpagedout;
|
2016-11-09 18:48:37 +00:00
|
|
|
} else if (error == EDEADLK) {
|
|
|
|
pageout_lock_miss++;
|
|
|
|
vnodes_skipped++;
|
|
|
|
}
|
2019-09-16 15:04:45 +00:00
|
|
|
mtx = NULL;
|
2019-02-17 16:35:19 +00:00
|
|
|
object = NULL;
|
2016-11-09 18:48:37 +00:00
|
|
|
}
|
2019-09-16 15:04:45 +00:00
|
|
|
}
|
|
|
|
if (mtx != NULL) {
|
|
|
|
mtx_unlock(mtx);
|
|
|
|
mtx = NULL;
|
2019-02-21 15:44:32 +00:00
|
|
|
}
|
|
|
|
if (object != NULL) {
|
2016-11-09 18:48:37 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
2019-02-21 15:44:32 +00:00
|
|
|
object = NULL;
|
|
|
|
}
|
2018-04-24 21:15:54 +00:00
|
|
|
vm_pagequeue_lock(pq);
|
|
|
|
vm_pageout_end_scan(&ss);
|
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specificially, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.
In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.
A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.
This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.
In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
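
The shortfall pacing described above can be illustrated with a small userland sketch. It is not the kernel's code: the helper names are hypothetical, the number of passes is passed in by the caller rather than derived from kernel constants, and it assumes every pass launders exactly the number of pages it was asked for. It only shows how dividing the remaining target by the remaining passes spreads the work evenly and still reaches the full target despite integer division.

```c
#include <assert.h>

/*
 * Hypothetical helper: compute how many pages to launder in the current
 * pass, given the pages still owed (*target) and the passes still
 * available (*cycles_left).  Mirrors the "target / shortfall_cycle--"
 * idiom: divide by the current count, then consume one pass.
 */
static int
shortfall_pass_sketch(int *target, int *cycles_left)
{
	int launder;

	if (*cycles_left <= 0 || *target <= 0)
		return (0);
	launder = *target / (*cycles_left)--;
	/* Assumption: the pass launders exactly what was requested. */
	*target -= launder;
	return (launder);
}

/* Hypothetical driver: total pages laundered over all passes. */
static int
shortfall_total_sketch(int target, int cycles)
{
	int total;

	total = 0;
	while (cycles > 0 && target > 0)
		total += shortfall_pass_sketch(&target, &cycles);
	return (total);
}
```

Because the final pass divides by one, any remainder left by earlier integer divisions is picked up at the end, so the passes together cover the whole target.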
2016-11-09 18:48:37 +00:00
	vm_pagequeue_unlock(pq);

2018-04-19 14:09:44 +00:00
	if (launder > 0 && queue == PQ_UNSWAPPABLE) {
		queue = PQ_LAUNDRY;
2017-01-03 00:05:44 +00:00
		goto scan;
	}

2016-11-09 18:48:37 +00:00
	/*
	 * Wakeup the sync daemon if we skipped a vnode in a writeable object
	 * and we didn't launder enough pages.
	 */
	if (vnodes_skipped > 0 && launder > 0)
		(void)speedup_syncer();

	return (starting_target - launder);
}

/*
 * Compute the integer square root.
 */
static u_int
isqrt(u_int num)
{
	u_int bit, root, tmp;

2019-05-03 02:55:54 +00:00
	bit = num != 0 ? (1u << ((fls(num) - 1) & ~1)) : 0;
2016-11-09 18:48:37 +00:00
	root = 0;
	while (bit != 0) {
		tmp = root + bit;
		root >>= 1;
		if (num >= tmp) {
			num -= tmp;
			root += bit;
		}
		bit >>= 2;
	}
	return (root);
}
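
For readers who want to try the integer square root outside the kernel, the following is a userland sketch of the same loop. The names isqrt_sketch() and fls_sketch() are hypothetical; fls_sketch() is a portable stand-in for fls(), which is assumed here to return the 1-based index of the highest set bit (0 for an input of 0). The starting value of bit is the largest power of four not exceeding num, and each iteration settles one bit of the root.

```c
#include <assert.h>

/*
 * Hypothetical portable replacement for fls(): 1-based position of the
 * most significant set bit, or 0 when x is 0.
 */
static int
fls_sketch(unsigned int x)
{
	int n;

	for (n = 0; x != 0; x >>= 1)
		n++;
	return (n);
}

/* Digit-by-digit integer square root, same loop as isqrt() above. */
static unsigned int
isqrt_sketch(unsigned int num)
{
	unsigned int bit, root, tmp;

	/* Largest power of four that is <= num (0 when num is 0). */
	bit = num != 0 ? (1u << ((fls_sketch(num) - 1) & ~1)) : 0;
	root = 0;
	while (bit != 0) {
		tmp = root + bit;
		root >>= 1;
		if (num >= tmp) {
			num -= tmp;
			root += bit;
		}
		bit >>= 2;
	}
	return (root);
}
```

The result is the floor of the square root, so non-square inputs round down.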

/*
 * Perform the work of the laundry thread: periodically wake up and determine
 * whether any pages need to be laundered. If so, determine the number of pages
 * that need to be laundered, and launder them.
 */
static void
vm_pageout_laundry_worker(void *arg)
{
2018-02-06 22:10:07 +00:00
	struct vm_domain *vmd;
2016-11-09 18:48:37 +00:00
	struct vm_pagequeue *pq;
2018-03-29 14:27:40 +00:00
	uint64_t nclean, ndirty, nfreed;
2018-02-06 22:10:07 +00:00
	int domain, last_target, launder, shortfall, shortfall_cycle, target;
2016-11-09 18:48:37 +00:00
	bool in_shortfall;

2018-02-06 22:10:07 +00:00
	domain = (uintptr_t)arg;
	vmd = VM_DOMAIN(domain);
	pq = &vmd->vmd_pagequeues[PQ_LAUNDRY];
	KASSERT(vmd->vmd_segs != 0, ("domain without segments"));
2016-11-09 18:48:37 +00:00

	shortfall = 0;
	in_shortfall = false;
	shortfall_cycle = 0;
2018-11-06 02:52:54 +00:00
	last_target = target = 0;
2018-03-29 14:27:40 +00:00
	nfreed = 0;
2016-11-09 18:48:37 +00:00
2017-01-03 00:05:44 +00:00
	/*
	 * Calls to these handlers are serialized by the swap syscall lock.
	 */
2018-02-06 22:10:07 +00:00
	(void)EVENTHANDLER_REGISTER(swapon, vm_pageout_swapon, vmd,
2017-01-03 00:05:44 +00:00
	    EVENTHANDLER_PRI_ANY);
2018-02-06 22:10:07 +00:00
	(void)EVENTHANDLER_REGISTER(swapoff, vm_pageout_swapoff, vmd,
2017-01-03 00:05:44 +00:00
	    EVENTHANDLER_PRI_ANY);

2016-11-09 18:48:37 +00:00
	/*
	 * The pageout laundry worker is never done, so loop forever.
	 */
	for (;;) {
		KASSERT(target >= 0, ("negative target %d", target));
		KASSERT(shortfall_cycle >= 0,
		    ("negative cycle %d", shortfall_cycle));
		launder = 0;

		/*
		 * First determine whether we need to launder pages to meet a
		 * shortage of free pages.
		 */
		if (shortfall > 0) {
			in_shortfall = true;
			shortfall_cycle = VM_LAUNDER_RATE / VM_INACT_SCAN_RATE;
			target = shortfall;
		} else if (!in_shortfall)
			goto trybackground;
2018-02-06 22:10:07 +00:00
		else if (shortfall_cycle == 0 || vm_laundry_target(vmd) <= 0) {
2016-11-09 18:48:37 +00:00
			/*
			 * We recently entered shortfall and began laundering
			 * pages. If we have completed that laundering run
			 * (and we are no longer in shortfall) or we have met
			 * our laundry target through other activity, then we
			 * can stop laundering pages.
			 */
			in_shortfall = false;
			target = 0;
			goto trybackground;
		}
		launder = target / shortfall_cycle--;
		goto dolaundry;

		/*
		 * There's no immediate need to launder any pages; see if we
		 * meet the conditions to perform background laundering:
		 *
		 * 1. The ratio of dirty to clean inactive pages exceeds the
2018-03-29 14:27:40 +00:00
		 *    background laundering threshold, or
2016-11-09 18:48:37 +00:00
|
|
|
* 2. we haven't yet reached the target of the current
|
|
|
|
* background laundering run.
|
|
|
|
*
|
|
|
|
* The background laundering threshold is not a constant.
|
|
|
|
* Instead, it is a slowly growing function of the number of
|
2018-03-29 14:27:40 +00:00
|
|
|
* clean pages freed by the page daemon since the last
|
|
|
|
* background laundering. Thus, as the ratio of dirty to
|
|
|
|
* clean inactive pages grows, the amount of memory pressure
|
2018-04-02 15:07:41 +00:00
|
|
|
* required to trigger laundering decreases. We ensure
|
|
|
|
* that the threshold is non-zero after an inactive queue
|
|
|
|
* scan, even if that scan failed to free a single clean page.
|
2016-11-09 18:48:37 +00:00
|
|
|
*/
|
|
|
|
trybackground:
|
2018-02-06 22:10:07 +00:00
|
|
|
nclean = vmd->vmd_free_count +
|
|
|
|
vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt;
|
|
|
|
ndirty = vmd->vmd_pagequeues[PQ_LAUNDRY].pq_cnt;
|
2018-04-02 15:07:41 +00:00
|
|
|
if (target == 0 && ndirty * isqrt(howmany(nfreed + 1,
|
|
|
|
vmd->vmd_free_target - vmd->vmd_free_min)) >= nclean) {
|
2018-02-06 22:10:07 +00:00
|
|
|
target = vmd->vmd_background_launder_target;
|
2016-11-09 18:48:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We have a non-zero background laundering target. If we've
|
|
|
|
* laundered up to our maximum without observing a page daemon
|
2017-12-11 15:33:24 +00:00
|
|
|
* request, just stop. This is a safety belt that ensures we
|
2016-11-09 18:48:37 +00:00
|
|
|
* don't launder an excessive amount if memory pressure is low
|
|
|
|
* and the ratio of dirty to clean pages is large. Otherwise,
|
|
|
|
* proceed at the background laundering rate.
|
|
|
|
*/
|
|
|
|
if (target > 0) {
|
2018-03-29 14:27:40 +00:00
|
|
|
if (nfreed > 0) {
|
|
|
|
nfreed = 0;
|
2016-11-09 18:48:37 +00:00
|
|
|
last_target = target;
|
|
|
|
} else if (last_target - target >=
|
|
|
|
vm_background_launder_max * PAGE_SIZE / 1024) {
|
|
|
|
target = 0;
|
|
|
|
}
|
|
|
|
launder = vm_background_launder_rate * PAGE_SIZE / 1024;
|
|
|
|
launder /= VM_LAUNDER_RATE;
|
|
|
|
if (launder > target)
|
|
|
|
launder = target;
|
|
|
|
}
|
|
|
|
|
|
|
|
dolaundry:
|
|
|
|
if (launder > 0) {
|
|
|
|
/*
|
|
|
|
* Because of I/O clustering, the number of laundered
|
|
|
|
* pages could exceed "target" by the maximum size of
|
|
|
|
* a cluster minus one.
|
|
|
|
*/
|
2018-02-06 22:10:07 +00:00
|
|
|
target -= min(vm_pageout_launder(vmd, launder,
|
2016-11-09 18:48:37 +00:00
|
|
|
in_shortfall), target);
|
|
|
|
pause("laundp", hz / VM_LAUNDER_RATE);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we're not currently laundering pages and the page daemon
|
|
|
|
* hasn't posted a new request, sleep until the page daemon
|
|
|
|
* kicks us.
|
|
|
|
*/
|
|
|
|
vm_pagequeue_lock(pq);
|
2018-02-06 22:10:07 +00:00
|
|
|
if (target == 0 && vmd->vmd_laundry_request == VM_LAUNDRY_IDLE)
|
|
|
|
(void)mtx_sleep(&vmd->vmd_laundry_request,
|
2016-11-09 18:48:37 +00:00
|
|
|
vm_pagequeue_lockptr(pq), PVM, "launds", 0);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the pagedaemon has indicated that it's in shortfall, start
|
|
|
|
* a shortfall laundering unless we're already in the middle of
|
|
|
|
* one. This may preempt a background laundering.
|
|
|
|
*/
|
2018-02-06 22:10:07 +00:00
|
|
|
if (vmd->vmd_laundry_request == VM_LAUNDRY_SHORTFALL &&
|
2016-11-09 18:48:37 +00:00
|
|
|
(!in_shortfall || shortfall_cycle == 0)) {
|
2018-02-06 22:10:07 +00:00
|
|
|
shortfall = vm_laundry_target(vmd) +
|
|
|
|
vmd->vmd_pageout_deficit;
|
2016-11-09 18:48:37 +00:00
|
|
|
target = 0;
|
|
|
|
} else
|
|
|
|
shortfall = 0;
|
|
|
|
|
|
|
|
if (target == 0)
|
2018-02-06 22:10:07 +00:00
|
|
|
vmd->vmd_laundry_request = VM_LAUNDRY_IDLE;
|
2018-03-29 14:27:40 +00:00
|
|
|
nfreed += vmd->vmd_clean_pages_freed;
|
|
|
|
vmd->vmd_clean_pages_freed = 0;
|
2016-11-09 18:48:37 +00:00
|
|
|
vm_pagequeue_unlock(pq);
|
|
|
|
}
|
|
|
|
}
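The background laundering trigger in the loop above can be hard to read through the interleaved annotations, so here is a minimal, standalone sketch of the same test. It assumes plain unsigned counters in place of the `vmd` fields; `HOWMANY`, `isqrt_sketch()` and `background_launder_wanted()` are hypothetical names introduced only for illustration, and the square-root helper is a simple stand-in rather than the kernel's `isqrt()`. The `target == 0` precondition from the original test is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Round-up division, in the spirit of the kernel's howmany() macro. */
#define HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))

/* Simple integer square root; a stand-in for the kernel's isqrt(). */
static unsigned int
isqrt_sketch(unsigned int n)
{
	unsigned long long r = 0;

	while ((r + 1) * (r + 1) <= n)
		r++;
	return ((unsigned int)r);
}

/*
 * Hypothetical helper: report whether a background laundering run
 * should start.  "nfreed" is the number of clean pages freed since the
 * last background laundering; the more pages have been freed, the
 * smaller the dirty-to-clean ratio needed to trigger laundering.
 */
static bool
background_launder_wanted(unsigned int ndirty, unsigned int nclean,
    unsigned int nfreed, unsigned int free_target, unsigned int free_min)
{
	unsigned int pressure;

	assert(free_target > free_min);
	/* The "+ 1" keeps the multiplier non-zero even if nothing was freed. */
	pressure = isqrt_sketch(HOWMANY(nfreed + 1, free_target - free_min));
	return ((unsigned long long)ndirty * pressure >= nclean);
}
```

With 100 dirty pages, 1000 clean pages and a `free_target - free_min` gap of 100, this sketch does not trigger when no pages have been freed (the multiplier is 1), but does trigger once 9999 clean pages have been freed (the multiplier reaches 10).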
|
|
|
|
|
2018-05-24 14:16:22 +00:00
|
|
|
/*
|
|
|
|
* Compute the number of pages we want to try to move from the
|
|
|
|
* active queue to either the inactive or laundry queue.
|
|
|
|
*
|
2018-05-24 20:26:37 +00:00
|
|
|
* When scanning active pages during a shortage, we make clean pages
|
|
|
|
* count more heavily towards the page shortage than dirty pages.
|
|
|
|
* This is because dirty pages must be laundered before they can be
|
|
|
|
* reused and thus have less utility when attempting to quickly
|
|
|
|
* alleviate a free page shortage. However, this weighting also
|
|
|
|
* causes the scan to deactivate dirty pages more aggressively,
|
|
|
|
* improving the effectiveness of clustering.
|
2018-05-24 14:16:22 +00:00
|
|
|
*/
|
|
|
|
static int
|
2018-05-24 20:26:37 +00:00
|
|
|
vm_pageout_active_target(struct vm_domain *vmd)
|
2018-05-24 14:16:22 +00:00
|
|
|
{
|
|
|
|
int shortage;
|
|
|
|
|
|
|
|
shortage = vmd->vmd_inactive_target + vm_paging_target(vmd) -
|
|
|
|
(vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt +
|
|
|
|
vmd->vmd_pagequeues[PQ_LAUNDRY].pq_cnt / act_scan_laundry_weight);
|
|
|
|
shortage *= act_scan_laundry_weight;
|
|
|
|
return (shortage);
|
|
|
|
}
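As a worked illustration of the arithmetic in vm_pageout_active_target(), the sketch below repeats the computation with plain integer arguments instead of `vmd` fields. The function name `active_target_sketch()` and its parameter names are hypothetical, and the weight of 3 used in the example afterwards is an assumed value chosen for illustration, not one taken from this section.

```c
#include <assert.h>

/*
 * Hypothetical standalone version of the active-queue target
 * computation.  Laundry pages are discounted by "laundry_weight" when
 * measuring how many reclaimable pages already exist, and the resulting
 * shortage is then scaled back up by the same weight.
 */
static int
active_target_sketch(int inactive_target, int paging_target,
    int inactive_cnt, int laundry_cnt, int laundry_weight)
{
	int shortage;

	assert(laundry_weight > 0);
	shortage = inactive_target + paging_target -
	    (inactive_cnt + laundry_cnt / laundry_weight);
	return (shortage * laundry_weight);
}
```

For example, with an inactive target of 1000, a paging target of 200, 900 inactive pages, 300 laundry pages and an assumed weight of 3, the laundry queue is credited as 100 pages, the raw shortage is 200, and the returned target is 600. A negative result means there is no shortage of inactive pages.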
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Scan the active queue. If there is no shortage of inactive pages, scan a
|
|
|
|
* small portion of the queue in order to maintain quasi-LRU.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
vm_pageout_scan_active(struct vm_domain *vmd, int page_shortage)
|
|
|
|
{
|
|
|
|
struct scan_state ss;
|
2019-09-16 15:04:45 +00:00
|
|
|
struct mtx *mtx;
|
Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
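The commit message above describes a per-page counter that carries two flag bits next to the reference count. The following is a rough user-space sketch of that idea using C11 atomics. It is not the kernel's implementation: the bit positions, `struct page_ref`, and every function name here are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: two flag bits above a plain reference count. */
#define	REF_OBJECT	0x80000000u	/* the owning object's reference */
#define	REF_BLOCKED	0x40000000u	/* new mapped references refused */
#define	REF_COUNT(c)	((c) & ~(REF_OBJECT | REF_BLOCKED))

struct page_ref {
	_Atomic uint32_t ref;
};

/* Take a reference unless the blocked bit is set; never sleeps. */
static bool
ref_acquire_mapped(struct page_ref *p)
{
	uint32_t old;

	old = atomic_load(&p->ref);
	do {
		if ((old & REF_BLOCKED) != 0)
			return (false);	/* caller falls back to a slow path */
	} while (!atomic_compare_exchange_weak(&p->ref, &old, old + 1));
	return (true);
}

/* Refuse or allow new mapped references while mappings are removed. */
static void
ref_block(struct page_ref *p)
{
	atomic_fetch_or(&p->ref, REF_BLOCKED);
}

static void
ref_unblock(struct page_ref *p)
{
	atomic_fetch_and(&p->ref, ~REF_BLOCKED);
}

/*
 * Drop one reference.  Returns true only when the counter reaches
 * zero with no flag bits set, i.e. the caller released the last
 * reference of any kind and may free the page.
 */
static bool
ref_release(struct page_ref *p)
{
	uint32_t old;

	old = atomic_fetch_sub(&p->ref, 1);
	assert(REF_COUNT(old) > 0);
	return (old == 1);
}
```

Because the object's reference is a bit in the same word, a single atomic operation can tell whether the last counted reference went away while the object still owns the page.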
|
|
|
vm_object_t object;
|
2018-05-24 14:16:22 +00:00
|
|
|
vm_page_t m, marker;
|
|
|
|
struct vm_pagequeue *pq;
|
|
|
|
long min_scan;
|
2019-09-16 15:04:45 +00:00
|
|
|
int act_delta, max_scan, scan_tick;
|
2018-05-24 14:16:22 +00:00
|
|
|
|
|
|
|
marker = &vmd->vmd_markers[PQ_ACTIVE];
|
|
|
|
pq = &vmd->vmd_pagequeues[PQ_ACTIVE];
|
|
|
|
vm_pagequeue_lock(pq);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we're just idle polling attempt to visit every
|
|
|
|
* active page within 'update_period' seconds.
|
|
|
|
*/
|
|
|
|
scan_tick = ticks;
|
|
|
|
if (vm_pageout_update_period != 0) {
|
|
|
|
min_scan = pq->pq_cnt;
|
|
|
|
min_scan *= scan_tick - vmd->vmd_last_active_scan;
|
|
|
|
min_scan /= hz * vm_pageout_update_period;
|
|
|
|
} else
|
|
|
|
min_scan = 0;
|
|
|
|
if (min_scan > 0 || (page_shortage > 0 && pq->pq_cnt > 0))
|
|
|
|
vmd->vmd_last_active_scan = scan_tick;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Scan the active queue for pages that can be deactivated. Update
|
|
|
|
* the per-page activity counter and use it to identify deactivation
|
|
|
|
* candidates. Held pages may be deactivated.
|
|
|
|
*
|
|
|
|
* To avoid requeuing each page that remains in the active queue, we
|
2018-05-24 20:26:37 +00:00
|
|
|
* implement the CLOCK algorithm. To keep the implementation of the
|
|
|
|
* enqueue operation consistent for all page queues, we use two hands,
|
|
|
|
* represented by marker pages. Scans begin at the first hand, which
|
|
|
|
* precedes the second hand in the queue. When the two hands meet,
|
|
|
|
* they are moved back to the head and tail of the queue, respectively,
|
|
|
|
* and scanning resumes.
|
2018-05-24 14:16:22 +00:00
|
|
|
*/
|
|
|
|
max_scan = page_shortage > 0 ? pq->pq_cnt : min_scan;
|
2019-09-16 15:04:45 +00:00
|
|
|
mtx = NULL;
|
2018-05-24 14:16:22 +00:00
|
|
|
act_scan:
|
|
|
|
vm_pageout_init_scan(&ss, pq, marker, &vmd->vmd_clock[0], max_scan);
|
|
|
|
while ((m = vm_pageout_next(&ss, false)) != NULL) {
|
|
|
|
if (__predict_false(m == &vmd->vmd_clock[1])) {
|
|
|
|
vm_pagequeue_lock(pq);
|
|
|
|
TAILQ_REMOVE(&pq->pq_pl, &vmd->vmd_clock[0], plinks.q);
|
|
|
|
TAILQ_REMOVE(&pq->pq_pl, &vmd->vmd_clock[1], plinks.q);
|
|
|
|
TAILQ_INSERT_HEAD(&pq->pq_pl, &vmd->vmd_clock[0],
|
|
|
|
plinks.q);
|
|
|
|
TAILQ_INSERT_TAIL(&pq->pq_pl, &vmd->vmd_clock[1],
|
|
|
|
plinks.q);
|
|
|
|
max_scan -= ss.scanned;
|
|
|
|
vm_pageout_end_scan(&ss);
|
|
|
|
goto act_scan;
|
|
|
|
}
|
|
|
|
if (__predict_false((m->flags & PG_MARKER) != 0))
|
|
|
|
continue;
|
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_change_lock(m, &mtx);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The page may have been disassociated from the queue
|
|
|
|
* or even freed while locks were dropped. We thus must be
|
|
|
|
* careful whenever modifying page state. Once the object lock
|
|
|
|
* has been acquired, we have a stable reference to the page.
|
|
|
|
*/
|
|
|
|
if (vm_page_queue(m) != PQ_ACTIVE)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Wired pages are dequeued lazily.
|
|
|
|
*/
|
|
|
|
if (vm_page_wired(m)) {
|
|
|
|
vm_page_dequeue_deferred(m);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* A page's object pointer may be set to NULL before
|
|
|
|
* the object lock is acquired.
|
|
|
|
*/
|
2019-09-09 21:32:42 +00:00
|
|
|
object = (vm_object_t)atomic_load_ptr(&m->object);
|
|
|
|
if (__predict_false(object == NULL))
|
|
|
|
/*
|
|
|
|
* The page has been removed from its object.
|
|
|
|
*/
|
|
|
|
continue;
|
|
|
|
|
2018-05-24 14:16:22 +00:00
|
|
|
/*
|
|
|
|
* Check to see "how much" the page has been used.
|
2018-07-15 19:25:15 +00:00
|
|
|
*
|
|
|
|
* Test PGA_REFERENCED after calling pmap_ts_referenced() so
|
|
|
|
* that a reference from a concurrently destroyed mapping is
|
|
|
|
* observed here and now.
|
|
|
|
*
|
2019-09-16 15:04:45 +00:00
|
|
|
* Perform an unsynchronized object ref count check. While
|
|
|
|
* the page lock ensures that the page is not reallocated to
|
|
|
|
* another object, in particular, one with unmanaged mappings
|
|
|
|
* that cannot support pmap_ts_referenced(), two races are,
|
2018-05-24 14:16:22 +00:00
|
|
|
* nonetheless, possible:
|
|
|
|
* 1) The count was transitioning to zero, but we saw a non-
|
2019-09-16 15:04:45 +00:00
|
|
|
* zero value. pmap_ts_referenced() will return zero
|
|
|
|
* because the page is not mapped.
|
|
|
|
* 2) The count was transitioning to one, but we saw zero.
|
|
|
|
* This race delays the detection of a new reference. At
|
|
|
|
* worst, we will deactivate and reactivate the page.
|
2018-05-24 14:16:22 +00:00
|
|
|
*/
|
2019-09-16 15:04:45 +00:00
|
|
|
if (object->ref_count != 0)
|
|
|
|
act_delta = pmap_ts_referenced(m);
|
|
|
|
else
|
|
|
|
act_delta = 0;
|
|
|
|
if ((m->aflags & PGA_REFERENCED) != 0) {
|
|
|
|
vm_page_aflag_clear(m, PGA_REFERENCED);
|
|
|
|
act_delta++;
|
|
|
|
}
|
2019-09-16 15:03:12 +00:00
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
/*
|
|
|
|
* Advance or decay the act_count based on recent usage.
|
|
|
|
*/
|
|
|
|
if (act_delta != 0) {
|
|
|
|
m->act_count += ACT_ADVANCE + act_delta;
|
|
|
|
if (m->act_count > ACT_MAX)
|
|
|
|
m->act_count = ACT_MAX;
|
|
|
|
} else
|
|
|
|
m->act_count -= min(m->act_count, ACT_DECLINE);
|
2018-05-24 14:16:22 +00:00
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
if (m->act_count == 0) {
|
2018-05-24 14:16:22 +00:00
|
|
|
/*
|
2019-09-16 15:04:45 +00:00
|
|
|
* When not short for inactive pages, let dirty pages go
|
|
|
|
* through the inactive queue before moving to the
|
|
|
|
* laundry queues. This gives them some extra time to
|
|
|
|
* be reactivated, potentially avoiding an expensive
|
|
|
|
* pageout. However, during a page shortage, the
|
|
|
|
* inactive queue is necessarily small, and so dirty
|
|
|
|
* pages would only spend a trivial amount of time in
|
|
|
|
* the inactive queue. Therefore, we might as well
|
|
|
|
* place them directly in the laundry queue to reduce
|
|
|
|
* queuing overhead.
|
2018-05-24 14:16:22 +00:00
|
|
|
*/
|
2019-09-16 15:04:45 +00:00
|
|
|
if (page_shortage <= 0) {
|
|
|
|
vm_page_swapqueue(m, PQ_ACTIVE, PQ_INACTIVE);
|
2019-09-03 14:29:58 +00:00
|
|
|
} else {
|
2018-05-24 14:16:22 +00:00
|
|
|
/*
|
|
|
|
* Calling vm_page_test_dirty() here would
|
|
|
|
* require acquisition of the object's write
|
|
|
|
* lock. However, during a page shortage,
|
2019-09-16 15:04:45 +00:00
|
|
|
* directing dirty pages into the laundry
|
|
|
|
* queue is only an optimization and not a
|
2018-05-24 14:16:22 +00:00
|
|
|
* requirement. Therefore, we simply rely on
|
2019-09-16 15:04:45 +00:00
|
|
|
* the opportunistic updates to the page's
|
|
|
|
* dirty field by the pmap.
|
2018-05-24 14:16:22 +00:00
|
|
|
*/
|
2019-09-16 15:04:45 +00:00
|
|
|
if (m->dirty == 0) {
|
|
|
|
vm_page_swapqueue(m, PQ_ACTIVE,
|
|
|
|
PQ_INACTIVE);
|
|
|
|
page_shortage -=
|
|
|
|
act_scan_laundry_weight;
|
2018-05-24 14:16:22 +00:00
|
|
|
} else {
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_swapqueue(m, PQ_ACTIVE,
|
|
|
|
PQ_LAUNDRY);
|
|
|
|
page_shortage--;
|
2018-05-24 14:16:22 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2019-09-16 15:04:45 +00:00
|
|
|
if (mtx != NULL) {
|
|
|
|
mtx_unlock(mtx);
|
|
|
|
mtx = NULL;
|
|
|
|
}
|
2018-05-24 14:16:22 +00:00
|
|
|
vm_pagequeue_lock(pq);
|
|
|
|
TAILQ_REMOVE(&pq->pq_pl, &vmd->vmd_clock[0], plinks.q);
|
|
|
|
TAILQ_INSERT_AFTER(&pq->pq_pl, marker, &vmd->vmd_clock[0], plinks.q);
|
|
|
|
vm_pageout_end_scan(&ss);
|
|
|
|
vm_pagequeue_unlock(pq);
|
|
|
|
}
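The act_count update performed inside vm_pageout_scan_active() can be isolated as a pure function. This is a sketch for experimentation only: `age_act_count()` is a hypothetical helper, and the constant values are chosen for illustration and should be checked against the kernel headers before relying on them.

```c
#include <assert.h>

/* Illustrative values; verify against the kernel's definitions. */
enum {
	ACT_DECLINE = 1,
	ACT_ADVANCE = 3,
	ACT_MAX = 64
};

/*
 * Advance the activity counter when references were observed since
 * the last scan (act_delta != 0), saturating at ACT_MAX.  Otherwise
 * decay it, never dropping below zero.  A result of zero marks the
 * page as a deactivation candidate.
 */
static int
age_act_count(int act_count, int act_delta)
{
	assert(act_count >= 0 && act_delta >= 0);
	if (act_delta != 0) {
		act_count += ACT_ADVANCE + act_delta;
		if (act_count > ACT_MAX)
			act_count = ACT_MAX;
	} else
		act_count -= act_count < ACT_DECLINE ? act_count :
		    ACT_DECLINE;
	return (act_count);
}
```

Under these values a page with a count of 5 needs five consecutive unreferenced scans to reach zero, while a single referenced scan more than undoes three of them.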
|
|
|
|
|
2018-04-24 21:15:54 +00:00
|
|
|
static int
|
|
|
|
vm_pageout_reinsert_inactive_page(struct scan_state *ss, vm_page_t m)
|
|
|
|
{
|
|
|
|
struct vm_domain *vmd;
|
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
if (m->queue != PQ_INACTIVE || (m->aflags & PGA_ENQUEUED) != 0)
|
|
|
|
return (0);
|
|
|
|
vm_page_aflag_set(m, PGA_ENQUEUED);
|
|
|
|
if ((m->aflags & PGA_REQUEUE_HEAD) != 0) {
|
|
|
|
vmd = vm_pagequeue_domain(m);
|
|
|
|
TAILQ_INSERT_BEFORE(&vmd->vmd_inacthead, m, plinks.q);
|
|
|
|
vm_page_aflag_clear(m, PGA_REQUEUE | PGA_REQUEUE_HEAD);
|
|
|
|
} else if ((m->aflags & PGA_REQUEUE) != 0) {
|
|
|
|
TAILQ_INSERT_TAIL(&ss->pq->pq_pl, m, plinks.q);
|
|
|
|
vm_page_aflag_clear(m, PGA_REQUEUE | PGA_REQUEUE_HEAD);
|
|
|
|
} else
|
|
|
|
TAILQ_INSERT_BEFORE(ss->marker, m, plinks.q);
|
|
|
|
return (1);
|
2018-04-24 21:15:54 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Re-add stuck pages to the inactive queue. We will examine them again
|
|
|
|
* during the next scan. If the queue state of a page has changed since
|
|
|
|
* it was physically removed from the page queue in
|
|
|
|
* vm_pageout_collect_batch(), don't do anything with that page.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
vm_pageout_reinsert_inactive(struct scan_state *ss, struct vm_batchqueue *bq,
|
|
|
|
vm_page_t m)
|
|
|
|
{
|
|
|
|
struct vm_pagequeue *pq;
|
|
|
|
int delta;
|
|
|
|
|
|
|
|
delta = 0;
|
|
|
|
pq = ss->pq;
|
|
|
|
|
|
|
|
if (m != NULL) {
|
|
|
|
if (vm_batchqueue_insert(bq, m))
|
|
|
|
return;
|
|
|
|
vm_pagequeue_lock(pq);
|
|
|
|
delta += vm_pageout_reinsert_inactive_page(ss, m);
|
|
|
|
} else
|
|
|
|
vm_pagequeue_lock(pq);
|
|
|
|
while ((m = vm_batchqueue_pop(bq)) != NULL)
|
|
|
|
delta += vm_pageout_reinsert_inactive_page(ss, m);
|
|
|
|
vm_pagequeue_cnt_add(pq, delta);
|
|
|
|
vm_pagequeue_unlock(pq);
|
|
|
|
vm_batchqueue_init(bq);
|
|
|
|
}
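The control flow of vm_pageout_reinsert_inactive() is a defer-then-flush pattern: items are collected in a small fixed-size batch, and the queue lock is taken only when the batch overflows or a final flush is requested with a NULL item. The sketch below models that flow with plain integers. `struct batch`, `struct sink`, `BATCH_SIZE`, and the function names are all hypothetical, and the lock is represented by a counter rather than a real mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	BATCH_SIZE	4	/* hypothetical capacity */

struct batch {
	int items[BATCH_SIZE];
	int cnt;
};

struct sink {
	int flushed;		/* items moved to the shared queue */
	int lock_acquisitions;	/* stands in for vm_pagequeue_lock() */
};

/* Returns false when the batch is full, like vm_batchqueue_insert(). */
static bool
batch_insert(struct batch *b, int item)
{
	if (b->cnt == BATCH_SIZE)
		return (false);
	b->items[b->cnt++] = item;
	return (true);
}

/*
 * Defer `item` if the batch has room.  Otherwise, or when item is
 * NULL, take the lock once, handle the overflowing item directly,
 * drain the whole batch, and reset it.
 */
static void
defer_or_flush(struct sink *s, struct batch *b, const int *item)
{
	if (item != NULL) {
		if (batch_insert(b, *item))
			return;
		s->lock_acquisitions++;
		s->flushed++;		/* the item that did not fit */
	} else
		s->lock_acquisitions++;
	s->flushed += b->cnt;		/* drain under the same lock */
	b->cnt = 0;			/* like vm_batchqueue_init() */
}
```

The point of the pattern is that the lock is acquired once per BATCH_SIZE + 1 items instead of once per item.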
|
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
/*
|
2018-06-04 16:46:36 +00:00
|
|
|
* Attempt to reclaim the requested number of pages from the inactive queue.
|
|
|
|
* Returns true if the shortage was addressed.
|
1994-05-25 09:21:21 +00:00
|
|
|
*/
|
2018-05-24 14:16:22 +00:00
|
|
|
static int
|
2018-06-02 00:01:07 +00:00
|
|
|
vm_pageout_scan_inactive(struct vm_domain *vmd, int shortage,
|
2018-05-24 14:16:22 +00:00
|
|
|
int *addl_shortage)
|
1994-05-25 09:21:21 +00:00
|
|
|
{
|
2018-04-24 21:15:54 +00:00
|
|
|
struct scan_state ss;
|
|
|
|
struct vm_batchqueue rq;
|
2019-09-16 15:04:45 +00:00
|
|
|
struct mtx *mtx;
|
2018-04-24 21:15:54 +00:00
|
|
|
vm_page_t m, marker;
|
2012-11-13 02:50:39 +00:00
|
|
|
struct vm_pagequeue *pq;
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
vm_object_t object;
|
2019-09-16 15:04:45 +00:00
|
|
|
int act_delta, addl_page_shortage, deficit, page_shortage;
|
2018-05-24 14:16:22 +00:00
|
|
|
int starting_page_shortage;
|
1995-01-09 16:06:02 +00:00
|
|
|
|
2012-07-26 09:06:48 +00:00
|
|
|
/*
|
2018-05-18 16:59:58 +00:00
|
|
|
* The addl_page_shortage is an estimate of the number of temporarily
|
2012-07-26 09:06:48 +00:00
|
|
|
* stuck pages in the inactive queue. In other words, the
|
Split the pagequeues per NUMA domain, and split the pagedaemon process
into threads, each processing the queues in a single domain. The structure
of the pagedaemons and queues is kept intact; most of the changes come
from the need for code to find the owning page queue for a given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach a quorum before starting the OOM, since one
thread's inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages is OOM
started by a single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
* number of pages from the inactive count that should be
|
2012-07-26 09:06:48 +00:00
|
|
|
* discounted in setting the target for the active queue scan.
|
|
|
|
*/
|
Correctly update the count of stuck pages, "addl_page_shortage", in
vm_pageout_scan(). There were missing increments in two less common cases.
Don't conflate the count of stuck pages and the pageout deficit provided by
vm_page_alloc{,_contig}(). (A proposed fix to the OOM code depends on this.)
Handle held pages consistently in the inactive queue scan. In the more
common case, we did not move the page to the tail of the queue. Whereas, in
the less common case, we did. There's no particular reason to move the page
in the less common case, so remove it.
Perform the calculation of the page shortage for the active queue scan a
little earlier, before the active queue lock is acquired. The correctness
of this calculation doesn't depend on the active queue lock being held.
Eliminate a redundant variable, "pcount". Use the more descriptive
variable, "maxscan", in its place.
Apply a few nearby style fixes, e.g., eliminate stray whitespace and excess
parentheses.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
2014-01-12 19:04:20 +00:00
|
|
|
addl_page_shortage = 0;
|
|
|
|
|
1999-01-21 08:29:12 +00:00
|
|
|
/*
|
2018-06-02 00:01:07 +00:00
|
|
|
* vmd_pageout_deficit counts the number of pages requested in
|
|
|
|
* allocations that failed because of a free page shortage. We assume
|
|
|
|
* that the allocations will be reattempted and thus include the deficit
|
|
|
|
* in our scan target.
|
1999-01-21 08:29:12 +00:00
|
|
|
*/
|
2018-06-02 00:01:07 +00:00
|
|
|
deficit = atomic_readandclear_int(&vmd->vmd_pageout_deficit);
|
|
|
|
starting_page_shortage = page_shortage = shortage + deficit;
|
1996-05-29 05:15:33 +00:00
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
mtx = NULL;
|
2018-04-24 21:15:54 +00:00
|
|
|
object = NULL;
|
|
|
|
vm_batchqueue_init(&rq);
|
|
|
|
|
2012-11-13 02:50:39 +00:00
|
|
|
/*
|
2016-07-28 22:30:48 +00:00
|
|
|
* Start scanning the inactive queue for pages that we can free. The
|
|
|
|
* scan will stop when we reach the target or we have scanned the
|
|
|
|
* entire queue. (Note that m->act_count is not used to make
|
|
|
|
* decisions for the inactive queue, only for the active queue.)
|
2012-11-13 02:50:39 +00:00
|
|
|
*/
|
2018-04-19 14:09:44 +00:00
|
|
|
marker = &vmd->vmd_markers[PQ_INACTIVE];
|
2018-04-24 21:15:54 +00:00
|
|
|
pq = &vmd->vmd_pagequeues[PQ_INACTIVE];
|
2012-11-13 02:50:39 +00:00
|
|
|
vm_pagequeue_lock(pq);
|
2018-04-24 21:15:54 +00:00
|
|
|
vm_pageout_init_scan(&ss, pq, marker, NULL, pq->pq_cnt);
|
|
|
|
while (page_shortage > 0 && (m = vm_pageout_next(&ss, true)) != NULL) {
|
|
|
|
KASSERT((m->flags & PG_MARKER) == 0,
|
|
|
|
("marker page %p was dequeued", m));
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_change_lock(m, &mtx);
|
|
|
|
|
|
|
|
recheck:
|
Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory
situations prior to now.
The new code is based on the concept that I/O must be able to function in
a low memory situation. All major modules related to I/O (except
networking) have been adjusted to allow allocation out of the system
reserve memory pool. These modules now detect a low memory situation but
rather than block they instead continue to operate, then return resources
to the memory pool instead of cache them or leave them wired.
Code has been added to stall in a low-memory situation prior to a vnode
being locked.
Thus situations where a process blocks in a low-memory condition while
holding a locked vnode have been reduced to near nothing. Not only will
I/O continue to operate, but many prior deadlock conditions simply no
longer exist.
Implement a number of VFS/BIO fixes
(found by Ian): in biodone(), bogus-page replacement code, the loop
was not properly incrementing loop variables prior to a continue
statement. We do not believe this code can be hit anyway but we
aren't taking any chances. We'll turn the whole section into a
panic (as it already is in brelse()) after the release is rolled.
In biodone(), the foff calculation was incorrectly
clamped to the iosize, causing the wrong foff to be calculated
for pages in the case of an I/O error or biodone() called without
initiating I/O. The problem always caused a panic before. Now it
doesn't. The problem is mainly an issue with NFS.
Fixed casts for ~PAGE_MASK. This code worked properly before only
because the calculations use signed arithmetic. Better to properly
extend PAGE_MASK first before inverting it for the 64 bit masking
op.
In brelse(), the bogus_page fixup code was improperly throwing
away the original contents of 'm' when it did the j-loop to
fix the bogus pages. The result was that it would potentially
invalidate parts of the *WRONG* page(!), leading to corruption.
There may still be cases where a background bitmap write is
being duplicated, causing potential corruption. We have identified
a potentially serious bug related to this but the fix is still TBD.
So instead this patch contains a KASSERT to detect the problem
and panic the machine rather than continue to corrupt the filesystem.
The problem does not occur very often.. it is very hard to
reproduce, and it may or may not be the cause of the corruption
people have reported.
Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
2000-11-18 23:06:26 +00:00
|
|
|
/*
|
2019-09-16 15:04:45 +00:00
|
|
|
* The page may have been disassociated from the queue
|
|
|
|
* or even freed while locks were dropped. We thus must be
|
|
|
|
* careful whenever modifying page state. Once the object lock
|
|
|
|
* has been acquired, we have a stable reference to the page.
|
2000-11-18 23:06:26 +00:00
|
|
|
*/
|
2019-09-16 15:04:45 +00:00
|
|
|
if (vm_page_queue(m) != PQ_INACTIVE) {
|
|
|
|
addl_page_shortage++;
|
2000-11-18 23:06:26 +00:00
|
|
|
continue;
|
2019-09-16 15:04:45 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The page was re-enqueued after the page queue lock was
|
|
|
|
* dropped, or a requeue was requested. This page gets a second
|
|
|
|
* chance.
|
|
|
|
*/
|
|
|
|
if ((m->aflags & (PGA_ENQUEUED | PGA_REQUEUE |
|
|
|
|
PGA_REQUEUE_HEAD)) != 0)
|
|
|
|
goto reinsert;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Wired pages may not be freed. Complete their removal
|
|
|
|
* from the queue now to avoid needless revisits during
|
|
|
|
* future scans. This check is racy and must be reverified once
|
|
|
|
* we hold the object lock and have verified that the page
|
|
|
|
* is not busy.
|
|
|
|
*/
|
|
|
|
if (vm_page_wired(m)) {
|
|
|
|
vm_page_dequeue_deferred(m);
|
2018-04-24 21:15:54 +00:00
|
|
|
continue;
|
1994-05-25 09:21:21 +00:00
|
|
|
}
|
2018-04-24 21:15:54 +00:00
|
|
|
|
|
|
|
if (object != m->object) {
|
2019-02-17 16:35:19 +00:00
|
|
|
if (object != NULL)
|
2018-04-24 21:15:54 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
2019-09-16 15:04:45 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* A page's object pointer may be set to NULL before
|
|
|
|
* the object lock is acquired.
|
|
|
|
*/
|
Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
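The counter scheme described in this commit message can be illustrated with a small userland sketch. It is not the kernel's implementation: the flag names, bit values, struct, and function names below are hypothetical, and C11 atomics stand in for the kernel's own atomic primitives. It only shows how a single word might carry a wiring count, an object-reference flag, and a flag that blocks new mapped wirings.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; names and values are illustrative only. */
#define	SKETCH_REF_OBJECT	0x80000000u	/* the owning object's reference */
#define	SKETCH_REF_BLOCKED	0x40000000u	/* new mapped wirings are refused */
#define	SKETCH_REF_COUNT(c)	((c) & ~(SKETCH_REF_OBJECT | SKETCH_REF_BLOCKED))

struct page_sketch {
	_Atomic uint32_t ref_count;
};

/* Take a wiring via a mapping; fail if new references are blocked. */
static bool
sketch_wire_mapped(struct page_sketch *p)
{
	uint32_t old = atomic_load(&p->ref_count);

	do {
		if ((old & SKETCH_REF_BLOCKED) != 0)
			return (false);
	} while (!atomic_compare_exchange_weak(&p->ref_count, &old, old + 1));
	return (true);
}

/* Block new mapped wirings; succeed only if no wirings exist right now. */
static bool
sketch_try_block(struct page_sketch *p)
{
	uint32_t old = atomic_load(&p->ref_count);

	do {
		if (SKETCH_REF_COUNT(old) != 0)
			return (false);
	} while (!atomic_compare_exchange_weak(&p->ref_count, &old,
	    old | SKETCH_REF_BLOCKED));
	return (true);
}

/* Allow mapped wirings again. */
static void
sketch_unblock(struct page_sketch *p)
{
	atomic_fetch_and(&p->ref_count, ~SKETCH_REF_BLOCKED);
}

/*
 * Drop one wiring.  Returns true when the word drops to zero, i.e. no
 * wirings and no object reference remain.
 */
static bool
sketch_unwire(struct page_sketch *p)
{
	uint32_t old = atomic_fetch_sub(&p->ref_count, 1);

	assert(SKETCH_REF_COUNT(old) > 0);
	return (old == 1);
}
```

In this sketch, once sketch_try_block() succeeds, sketch_wire_mapped() fails until sketch_unblock() is called; that failure corresponds to the fallback to the fault handler mentioned in the message.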
2019-09-09 21:32:42 +00:00
|
|
|
object = (vm_object_t)atomic_load_ptr(&m->object);
|
2019-09-16 15:04:45 +00:00
|
|
|
if (object != NULL && !VM_OBJECT_TRYWLOCK(object)) {
|
|
|
|
mtx_unlock(mtx);
|
|
|
|
/* Depends on type-stability. */
|
|
|
|
VM_OBJECT_WLOCK(object);
|
|
|
|
mtx_lock(mtx);
|
|
|
|
goto recheck;
|
2019-02-17 16:35:19 +00:00
|
|
|
}
|
2018-04-24 21:15:54 +00:00
|
|
|
}
|
2019-09-16 15:04:45 +00:00
|
|
|
if (__predict_false(m->object == NULL))
|
|
|
|
/*
|
|
|
|
* The page has been removed from its object.
|
|
|
|
*/
|
|
|
|
continue;
|
|
|
|
KASSERT(m->object == object, ("page %p does not belong to %p",
|
|
|
|
m, object));
|
2018-04-24 21:15:54 +00:00
|
|
|
|
2013-08-09 11:11:11 +00:00
|
|
|
if (vm_page_busied(m)) {
|
2015-09-01 06:21:12 +00:00
|
|
|
/*
|
|
|
|
* Don't mess with busy pages. Leave them at
|
|
|
|
* the front of the queue. Most likely, they
|
|
|
|
* are being paged out and will leave the
|
|
|
|
* queue shortly after the scan finishes. So,
|
|
|
|
* they ought to be discounted from the
|
|
|
|
* inactive count.
|
|
|
|
*/
|
1996-05-29 05:15:33 +00:00
|
|
|
addl_page_shortage++;
|
2018-04-24 21:15:54 +00:00
|
|
|
goto reinsert;
|
1994-05-25 09:21:21 +00:00
|
|
|
}
|
2012-07-07 19:39:08 +00:00
|
|
|
|
2019-09-09 21:32:42 +00:00
|
|
|
/*
|
2019-09-16 15:04:45 +00:00
|
|
|
* Re-check for wirings now that we hold the object lock and
|
|
|
|
* have verified that the page is unbusied. If the page is
|
|
|
|
* mapped, it may still be wired by pmap lookups. The call to
|
2019-09-09 21:32:42 +00:00
|
|
|
* vm_page_try_remove_all() below atomically checks for such
|
|
|
|
* wirings and removes mappings. If the page is unmapped, the
|
|
|
|
* wire count is guaranteed not to increase.
|
|
|
|
*/
|
|
|
|
if (__predict_false(vm_page_wired(m))) {
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_dequeue_deferred(m);
|
2019-09-09 21:32:42 +00:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2015-06-14 20:23:41 +00:00
|
|
|
/*
|
2015-10-15 19:07:38 +00:00
|
|
|
* Invalid pages can be easily freed. They cannot be
|
|
|
|
* mapped, vm_page_free() asserts this.
|
2015-06-14 20:23:41 +00:00
|
|
|
*/
|
2015-10-15 19:07:38 +00:00
|
|
|
if (m->valid == 0)
|
|
|
|
goto free_page;
|
2015-06-14 20:23:41 +00:00
|
|
|
|
1997-10-06 02:48:16 +00:00
|
|
|
/*
|
2015-09-05 17:34:49 +00:00
|
|
|
* If the page has been referenced and the object is not dead,
|
|
|
|
* reactivate or requeue the page depending on whether the
|
|
|
|
* object is mapped.
|
2018-07-15 19:25:15 +00:00
|
|
|
*
|
|
|
|
* Test PGA_REFERENCED after calling pmap_ts_referenced() so
|
|
|
|
* that a reference from a concurrently destroyed mapping is
|
|
|
|
* observed here and now.
|
1997-10-06 02:48:16 +00:00
|
|
|
*/
|
2019-09-16 15:04:45 +00:00
|
|
|
if (object->ref_count != 0)
|
|
|
|
act_delta = pmap_ts_referenced(m);
|
|
|
|
else {
|
|
|
|
KASSERT(!pmap_page_is_mapped(m),
|
|
|
|
("page %p is mapped", m));
|
|
|
|
act_delta = 0;
|
|
|
|
}
|
|
|
|
if ((m->aflags & PGA_REFERENCED) != 0) {
|
|
|
|
vm_page_aflag_clear(m, PGA_REFERENCED);
|
|
|
|
act_delta++;
|
|
|
|
}
|
|
|
|
if (act_delta != 0) {
|
|
|
|
if (object->ref_count != 0) {
|
|
|
|
VM_CNT_INC(v_reactivated);
|
|
|
|
vm_page_activate(m);
|
2019-09-16 15:03:12 +00:00
|
|
|
|
2019-09-16 15:04:45 +00:00
|
|
|
/*
|
|
|
|
* Increase the activation count if the page
|
|
|
|
* was referenced while in the inactive queue.
|
|
|
|
* This makes it less likely that the page will
|
|
|
|
* be returned prematurely to the inactive
|
|
|
|
* queue.
|
|
|
|
*/
|
|
|
|
m->act_count += act_delta + ACT_ADVANCE;
|
|
|
|
continue;
|
|
|
|
} else if ((object->flags & OBJ_DEAD) == 0) {
|
|
|
|
vm_page_aflag_set(m, PGA_REQUEUE);
|
|
|
|
goto reinsert;
|
2019-09-16 15:03:12 +00:00
|
|
|
}
|
1996-07-30 03:08:57 +00:00
|
|
|
}
|
|
|
|
|
1997-10-06 02:48:16 +00:00
|
|
|
/*
|
2012-11-01 16:20:02 +00:00
|
|
|
* If the page appears to be clean at the machine-independent
|
|
|
|
* layer, then remove all of its mappings from the pmap in
|
2016-07-27 03:49:00 +00:00
|
|
|
* anticipation of freeing it. If, however, any of the page's
|
|
|
|
* mappings allow write access, then the page may still be
|
|
|
|
* modified until the last of those mappings are removed.
|
1997-10-06 02:48:16 +00:00
|
|
|
*/
|
2015-06-21 01:22:35 +00:00
|
|
|
if (object->ref_count != 0) {
|
|
|
|
vm_page_test_dirty(m);
|
2019-09-09 21:32:42 +00:00
|
|
|
if (m->dirty == 0 && !vm_page_try_remove_all(m)) {
|
2019-09-16 15:04:45 +00:00
|
|
|
vm_page_dequeue_deferred(m);
|
2019-09-09 21:32:42 +00:00
|
|
|
continue;
|
|
|
|
}
|
2015-06-21 01:22:35 +00:00
|
|
|
}
|
2004-03-04 09:36:46 +00:00
|
|
|
|
Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue. A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them. The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time. In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low. Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system. Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages. This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
The new laundry thread sleeps while waiting for a request from the page
daemon thread(s). A request is raised by setting the variable
vm_laundry_request and waking the laundry thread. We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering"). When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering. If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back
to sleep without doing any work. When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.
In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target. In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.
A laundry request can be latched while another is currently being
serviced. In particular, a shortfall request will immediately preempt a
background laundering.
This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning
of vm_cnt.v_reactivated now better reflects its name. It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.
In collaboration with: markj
Reviewed by: kib
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8302
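The latching rule in this commit message (a shortfall request preempts a background one) can be sketched as a small pure function. This is a simplification under stated assumptions, not the committed code: the enum, its values, and the function name are hypothetical, and it assumes that a scan which met its target asks only for background laundering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request levels, ordered from least to most urgent. */
enum sketch_laundry_request {
	SKETCH_LAUNDRY_IDLE = 0,
	SKETCH_LAUNDRY_BACKGROUND,
	SKETCH_LAUNDRY_SHORTFALL
};

/*
 * Combine the request already pending with the one implied by the latest
 * inactive queue scan.  A more urgent request replaces a less urgent one;
 * a less urgent request never downgrades what is pending.
 */
static enum sketch_laundry_request
sketch_latch_request(enum sketch_laundry_request pending, bool scan_met_target)
{
	enum sketch_laundry_request wanted;

	assert(pending <= SKETCH_LAUNDRY_SHORTFALL);
	wanted = scan_met_target ? SKETCH_LAUNDRY_BACKGROUND :
	    SKETCH_LAUNDRY_SHORTFALL;
	return (wanted > pending ? wanted : pending);
}
```

Ordering the enum by urgency keeps the latch a single comparison, at the cost of assuming that urgency is the only thing that distinguishes the requests.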
2016-11-09 18:48:37 +00:00
|
|
|
/*
|
|
|
|
* Clean pages can be freed, but dirty pages must be sent back
|
|
|
|
* to the laundry, unless they belong to a dead object.
|
|
|
|
* Requeueing dirty pages from dead objects is pointless, as
|
|
|
|
* they are being paged out and freed by the thread that
|
|
|
|
* destroyed the object.
|
|
|
|
*/
|
2015-06-14 20:23:41 +00:00
|
|
|
if (m->dirty == 0) {
|
2015-10-15 19:07:38 +00:00
|
|
|
free_page:
|
2018-04-24 21:15:54 +00:00
|
|
|
/*
|
|
|
|
* Because we dequeued the page and have already
|
|
|
|
* checked for concurrent dequeue and enqueue
|
|
|
|
* requests, we can safely disassociate the page
|
|
|
|
* from the inactive queue.
|
|
|
|
*/
|
2019-09-16 15:04:45 +00:00
|
|
|
KASSERT((m->aflags & PGA_QUEUE_STATE_MASK) == 0,
|
|
|
|
("page %p has queue state", m));
|
|
|
|
m->queue = PQ_NONE;
|
2015-06-14 05:23:39 +00:00
|
|
|
vm_page_free(m);
|
2018-04-24 21:15:54 +00:00
|
|
|
page_shortage--;
|
2016-11-09 18:48:37 +00:00
|
|
|
} else if ((object->flags & OBJ_DEAD) == 0)
|
|
|
|
vm_page_launder(m);
|
2018-04-24 21:15:54 +00:00
|
|
|
continue;
|
|
|
|
reinsert:
|
|
|
|
vm_pageout_reinsert_inactive(&ss, &rq, m);
|
|
|
|
}
|
2019-09-16 15:04:45 +00:00
|
|
|
if (mtx != NULL)
|
|
|
|
mtx_unlock(mtx);
|
2019-02-17 16:35:19 +00:00
|
|
|
if (object != NULL)
|
2013-02-20 12:03:20 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
2018-04-24 21:15:54 +00:00
|
|
|
vm_pageout_reinsert_inactive(&ss, &rq, NULL);
|
|
|
|
vm_pageout_reinsert_inactive(&ss, &ss.bq, NULL);
|
|
|
|
vm_pagequeue_lock(pq);
|
|
|
|
vm_pageout_end_scan(&ss);
|
2012-11-13 02:50:39 +00:00
|
|
|
vm_pagequeue_unlock(pq);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2018-04-24 21:15:54 +00:00
|
|
|
VM_CNT_ADD(v_dfree, starting_page_shortage - page_shortage);
|
|
|
|
|
2016-11-09 18:48:37 +00:00
|
|
|
/*
|
|
|
|
* Wake up the laundry thread so that it can perform any needed
|
|
|
|
* laundering. If we didn't meet our target, we're in shortfall and
|
2017-01-03 00:05:44 +00:00
|
|
|
* need to launder more aggressively. If PQ_LAUNDRY is empty and no
|
|
|
|
* swap devices are configured, the laundry thread has no work to do, so
|
|
|
|
* don't bother waking it up.
|
2017-12-11 15:33:24 +00:00
|
|
|
*
|
|
|
|
* The laundry thread uses the number of inactive queue scans elapsed
|
|
|
|
* since the last laundering to determine whether to launder again, so
|
|
|
|
* keep count.
|
2016-11-09 18:48:37 +00:00
|
|
|
*/
|
2017-12-11 15:33:24 +00:00
|
|
|
if (starting_page_shortage > 0) {
|
2018-02-06 22:10:07 +00:00
|
|
|
pq = &vmd->vmd_pagequeues[PQ_LAUNDRY];
|
2016-11-09 18:48:37 +00:00
|
|
|
vm_pagequeue_lock(pq);
|
2018-02-06 22:10:07 +00:00
|
|
|
if (vmd->vmd_laundry_request == VM_LAUNDRY_IDLE &&
|
2017-12-11 15:33:24 +00:00
|
|
|
(pq->pq_cnt > 0 || atomic_load_acq_int(&swapdev_enabled))) {
|
2017-01-03 00:05:44 +00:00
|
|
|
if (page_shortage > 0) {
|
2018-02-06 22:10:07 +00:00
|
|
|
vmd->vmd_laundry_request = VM_LAUNDRY_SHORTFALL;
|
- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.
This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html
Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
|
|
|
VM_CNT_INC(v_pdshortfalls);
|
2018-02-06 22:10:07 +00:00
|
|
|
} else if (vmd->vmd_laundry_request !=
|
|
|
|
VM_LAUNDRY_SHORTFALL)
|
|
|
|
vmd->vmd_laundry_request =
|
|
|
|
VM_LAUNDRY_BACKGROUND;
|
|
|
|
wakeup(&vmd->vmd_laundry_request);
|
2017-01-03 00:05:44 +00:00
|
|
|
}
|
2018-03-29 14:27:40 +00:00
|
|
|
vmd->vmd_clean_pages_freed +=
|
|
|
|
starting_page_shortage - page_shortage;
|
2016-11-09 18:48:37 +00:00
|
|
|
vm_pagequeue_unlock(pq);
|
|
|
|
}
|
|
|
|
|
2014-08-26 16:40:20 +00:00
|
|
|
/*
|
2016-07-28 22:30:48 +00:00
|
|
|
* Wake up the swapout daemon if we didn't free the targeted number of
|
|
|
|
* pages.
|
2014-08-26 16:40:20 +00:00
|
|
|
*/
|
2017-10-20 09:10:49 +00:00
|
|
|
if (page_shortage > 0)
|
|
|
|
vm_swapout_run();
|
2014-08-26 16:40:20 +00:00
|
|
|
|
Rework the test which raises the OOM condition. Right now, the code
checks for swap space consumption and also checks that the amount of
free pages exceeds some limit, in case the pagedaemon did not cope
with the page shortage in one of the late passes. This is wrong
because it does not account for the presence of reclaimable pages
in the queues which are not selectable for reclaim immediately. E.g.,
on swap-less systems, a large active queue easily triggered OOM.
Instead, only raise OOM when the pagedaemon is unable to produce a free
page in several back-to-back passes. Track the failed passes per
pagedaemon thread.
The number of passes to trigger OOM was selected empirically and
tested both on small (32M-64M i386 VM) and large (32G amd64)
configurations. If the specifics of the load require tuning, the sysctl
vm.pageout_oom_seq sets the number of back-to-back passes which must
fail before OOM is raised. Each pass takes half a second. The lower the
value, the more sensitive the pagedaemon is to the page shortage.
In the future, some heuristic to calculate the value of the tunable might
be designed based on the system configuration and load. But before that
can be done, the i/o system must be fixed to reliably time out
pagedaemon writes, even if waiting for memory to proceed. Then, the
code can account for the in-flight page-outs and postpone OOM until
all of them have finished, which should reduce the need for tuning. Right
now, ignoring the in-flight writes and relying on the counter makes it
possible to break deadlocks due to the write path doing sleepable memory
allocations.
Reported by: Dmitry Sivachenko, bde, many others
Tested by: pho, bde, tuexen (arm)
Reviewed by: alc
Discussed with: bde, imp
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
2015-11-16 06:26:26 +00:00
|
|
|
/*
|
|
|
|
* If the inactive queue scan fails repeatedly to meet its
|
|
|
|
* target, kill the largest process.
|
|
|
|
*/
|
|
|
|
vm_pageout_mightbe_oom(vmd, page_shortage, starting_page_shortage);
|
|
|
|
|
1999-01-21 08:29:12 +00:00
|
|
|
/*
|
2018-05-24 14:16:22 +00:00
|
|
|
* Reclaim pages by swapping out idle processes, if configured to do so.
|
2013-08-13 21:56:16 +00:00
|
|
|
*/
|
2018-06-02 00:01:07 +00:00
|
|
|
vm_swapout_run_idle();
|
1999-01-21 08:29:12 +00:00
|
|
|
|
|
|
|
/*
|
2018-05-24 14:16:22 +00:00
|
|
|
* See the description of addl_page_shortage above.
|
1999-01-21 08:29:12 +00:00
|
|
|
*/
|
2018-05-24 14:16:22 +00:00
|
|
|
*addl_shortage = addl_page_shortage + deficit;
|
1996-05-18 03:38:05 +00:00
|
|
|
|
2016-10-05 16:15:26 +00:00
|
|
|
return (page_shortage <= 0);
|
2008-09-29 19:45:12 +00:00
|
|
|
}
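The laundry wakeup logic at the end of the scan above can be modeled outside the kernel. The following is a minimal sketch, not the kernel code: the enum, the function name `next_laundry_request`, and its parameters are hypothetical stand-ins for `vmd_laundry_request`, `pq->pq_cnt`, and `swapdev_enabled`, and locking and the `wakeup()` call are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the VM_LAUNDRY_* request states. */
enum laundry_request { LAUNDRY_IDLE, LAUNDRY_BACKGROUND, LAUNDRY_SHORTFALL };

/*
 * Return the request state after an inactive queue scan.  A new request
 * is raised only if the scan had a target, no request is pending, and
 * the laundry thread could have work to do (dirty pages are queued or a
 * swap device is configured).
 */
static enum laundry_request
next_laundry_request(enum laundry_request cur, int starting_page_shortage,
    int page_shortage, int laundry_cnt, bool swap_enabled)
{
	assert(laundry_cnt >= 0);
	if (starting_page_shortage <= 0)
		return (cur);		/* the scan had nothing to reclaim */
	if (cur != LAUNDRY_IDLE)
		return (cur);		/* a request is already pending */
	if (laundry_cnt == 0 && !swap_enabled)
		return (cur);		/* no work for the laundry thread */
	/* A missed target means shortfall; otherwise balance the queues. */
	return (page_shortage > 0 ? LAUNDRY_SHORTFALL : LAUNDRY_BACKGROUND);
}
```

In this model a scan that misses its target with three pages on the laundry queue yields `LAUNDRY_SHORTFALL`, while the same scan with an empty laundry queue and no swap device leaves the state at `LAUNDRY_IDLE`.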
|
|
|
|
|
Split the pagequeues per NUMA domain, and split the pagedaemon process
into threads, each processing the queues of a single domain. The structure
of the pagedaemons and queues is kept intact; most of the changes come
from the need for code to find the owning page queue for a given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary; the multithreaded daemon could be allowed for
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach a quorum before starting the OOM, since one
thread's inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages is OOM
started, by a single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
static int vm_pageout_oom_vote;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The pagedaemon threads randomly select one to perform the
|
|
|
|
* OOM. Trying to kill processes before all pagedaemons
|
|
|
|
* failed to reach free target is premature.
|
|
|
|
*/
|
|
|
|
static void
|
2015-11-16 06:26:26 +00:00
|
|
|
vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
|
|
|
|
int starting_page_shortage)
|
2013-08-07 16:36:38 +00:00
|
|
|
{
|
|
|
|
int old_vote;
|
|
|
|
|
2015-11-16 06:26:26 +00:00
|
|
|
if (starting_page_shortage <= 0 || starting_page_shortage !=
|
|
|
|
page_shortage)
|
|
|
|
vmd->vmd_oom_seq = 0;
|
|
|
|
else
|
|
|
|
vmd->vmd_oom_seq++;
|
|
|
|
if (vmd->vmd_oom_seq < vm_pageout_oom_seq) {
|
2013-08-07 16:36:38 +00:00
|
|
|
if (vmd->vmd_oom) {
|
|
|
|
vmd->vmd_oom = FALSE;
|
|
|
|
atomic_subtract_int(&vm_pageout_oom_vote, 1);
|
|
|
|
}
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2015-11-16 06:26:26 +00:00
|
|
|
/*
|
|
|
|
* Do not follow the call sequence until OOM condition is
|
|
|
|
* cleared.
|
|
|
|
*/
|
|
|
|
vmd->vmd_oom_seq = 0;
|
|
|
|
|
2013-08-07 16:36:38 +00:00
|
|
|
if (vmd->vmd_oom)
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmd->vmd_oom = TRUE;
|
|
|
|
old_vote = atomic_fetchadd_int(&vm_pageout_oom_vote, 1);
|
|
|
|
if (old_vote != vm_ndomains - 1)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The current pagedaemon thread is the last in the quorum to
|
|
|
|
* start OOM. Initiate the selection and signaling of the
|
|
|
|
* victim.
|
|
|
|
*/
|
|
|
|
vm_pageout_oom(VM_OOM_MEM);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* After one round of OOM terror, recall our vote. On the
|
|
|
|
* next pass, current pagedaemon would vote again if the low
|
|
|
|
* memory condition is still there, due to vmd_oom being
|
|
|
|
* false.
|
|
|
|
*/
|
|
|
|
vmd->vmd_oom = FALSE;
|
|
|
|
atomic_subtract_int(&vm_pageout_oom_vote, 1);
|
|
|
|
}
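The combination of the per-domain failure counter and the cross-domain vote in vm_pageout_mightbe_oom() can be exercised in a single-threaded model. This is a sketch under stated assumptions: `struct domain_model`, `mightbe_oom_model`, and the plain `int` vote counter are hypothetical, the atomic operations of the original are replaced by ordinary arithmetic, and the function returns `true` where the original would call vm_pageout_oom(VM_OOM_MEM).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-domain state mirroring vmd_oom_seq and vmd_oom. */
struct domain_model {
	int	oom_seq;	/* consecutive scans that freed nothing */
	bool	oom;		/* this domain currently votes for OOM */
};

/*
 * Return true when the calling domain is the last of ndomains to vote,
 * i.e. when the OOM killer would be started.
 */
static bool
mightbe_oom_model(struct domain_model *d, int *votes, int ndomains,
    int oom_seq_limit, int page_shortage, int starting_page_shortage)
{
	int old_vote;

	assert(ndomains > 0);
	/* Any progress, or no shortage at all, resets the failure count. */
	if (starting_page_shortage <= 0 ||
	    starting_page_shortage != page_shortage)
		d->oom_seq = 0;
	else
		d->oom_seq++;
	if (d->oom_seq < oom_seq_limit) {
		if (d->oom) {
			d->oom = false;	/* withdraw an earlier vote */
			(*votes)--;
		}
		return (false);
	}
	d->oom_seq = 0;
	if (d->oom)
		return (false);		/* already voted */
	d->oom = true;
	old_vote = (*votes)++;
	if (old_vote != ndomains - 1)
		return (false);		/* quorum not reached yet */
	/* Last voter: recall the vote and report that OOM should run. */
	d->oom = false;
	(*votes)--;
	return (true);
}
```

With two domains and a limit of two failed scans, the first domain to fail twice only records a vote. The second domain to fail twice completes the quorum and gets `true`, and a domain that later makes progress withdraws its vote.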
|
2008-09-29 19:45:12 +00:00
|
|
|
|
2015-11-16 06:02:11 +00:00
|
|
|
/*
|
|
|
|
* The OOM killer is the page daemon's action of last resort when
|
|
|
|
* memory allocation requests have been stalled for a prolonged period
|
|
|
|
* of time because it cannot reclaim memory. This function computes
|
|
|
|
* the approximate number of physical pages that could be reclaimed if
|
|
|
|
* the specified address space is destroyed.
|
|
|
|
*
|
|
|
|
* Private, anonymous memory owned by the address space is the
|
|
|
|
* principal resource that we expect to recover after an OOM kill.
|
|
|
|
* Since the physical pages mapped by the address space's COW entries
|
|
|
|
* are typically shared pages, they are unlikely to be released and so
|
|
|
|
* they are not counted.
|
|
|
|
*
|
|
|
|
* To get to the point where the page daemon runs the OOM killer, its
|
|
|
|
* efforts to write-back vnode-backed pages may have stalled. This
|
|
|
|
* could be caused by a memory allocation deadlock in the write path
|
|
|
|
* that might be resolved by an OOM kill. Therefore, physical pages
|
|
|
|
* belonging to vnode-backed objects are counted, because they might
|
|
|
|
* be freed without being written out first if the address space holds
|
|
|
|
* the last reference to an unlinked vnode.
|
|
|
|
*
|
|
|
|
* Similarly, physical pages belonging to OBJT_PHYS objects are
|
|
|
|
* counted because the address space might hold the last reference to
|
|
|
|
* the object.
|
|
|
|
*/
|
|
|
|
static long
|
|
|
|
vm_pageout_oom_pagecount(struct vmspace *vmspace)
|
|
|
|
{
|
|
|
|
vm_map_t map;
|
|
|
|
vm_map_entry_t entry;
|
|
|
|
vm_object_t obj;
|
|
|
|
long res;
|
|
|
|
|
|
|
|
map = &vmspace->vm_map;
|
|
|
|
KASSERT(!map->system_map, ("system map"));
|
|
|
|
sx_assert(&map->lock, SA_LOCKED);
|
|
|
|
res = 0;
|
|
|
|
for (entry = map->header.next; entry != &map->header;
|
|
|
|
entry = entry->next) {
|
|
|
|
if ((entry->eflags & MAP_ENTRY_IS_SUB_MAP) != 0)
|
|
|
|
continue;
|
|
|
|
obj = entry->object.vm_object;
|
|
|
|
if (obj == NULL)
|
|
|
|
continue;
|
|
|
|
if ((entry->eflags & MAP_ENTRY_NEEDS_COPY) != 0 &&
|
|
|
|
obj->ref_count != 1)
|
|
|
|
continue;
|
|
|
|
switch (obj->type) {
|
|
|
|
case OBJT_DEFAULT:
|
|
|
|
case OBJT_SWAP:
|
|
|
|
case OBJT_PHYS:
|
|
|
|
case OBJT_VNODE:
|
|
|
|
res += obj->resident_page_count;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return (res);
|
|
|
|
}
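The counting rule in vm_pageout_oom_pagecount() can be illustrated with plain data structures. The sketch below is not the kernel code: the types `obj_model` and `entry_model`, the `OBJ_*` enumerators, and the array-based walk are hypothetical replacements for the vm_map entry list and `vm_object`, and no locking is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object types; OBJ_DEVICE stands for any uncounted type. */
enum obj_type { OBJ_DEFAULT, OBJ_SWAP, OBJ_PHYS, OBJ_VNODE, OBJ_DEVICE };

struct obj_model {
	enum obj_type	type;
	int		ref_count;
	long		resident_page_count;
};

struct entry_model {
	bool			is_sub_map;	/* MAP_ENTRY_IS_SUB_MAP */
	bool			needs_copy;	/* MAP_ENTRY_NEEDS_COPY */
	const struct obj_model	*obj;		/* may be NULL */
};

/* Estimate the pages reclaimable if the address space were destroyed. */
static long
oom_pagecount_model(const struct entry_model *entries, size_t n)
{
	long res = 0;

	assert(entries != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct entry_model *e = &entries[i];

		if (e->is_sub_map || e->obj == NULL)
			continue;
		/* Shared COW objects are unlikely to be released. */
		if (e->needs_copy && e->obj->ref_count != 1)
			continue;
		switch (e->obj->type) {
		case OBJ_DEFAULT:
		case OBJ_SWAP:
		case OBJ_PHYS:
		case OBJ_VNODE:
			res += e->obj->resident_page_count;
			break;
		default:
			break;
		}
	}
	return (res);
}
```

An address space with a private 100-page anonymous object and a shared 50-page COW object would be scored at 100 pages in this model, because the shared object is skipped.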
|
|
|
|
|
2019-08-16 09:43:49 +00:00
|
|
|
static int vm_oom_ratelim_last;
|
|
|
|
static int vm_oom_pf_secs = 10;
|
|
|
|
SYSCTL_INT(_vm, OID_AUTO, oom_pf_secs, CTLFLAG_RWTUN, &vm_oom_pf_secs, 0,
|
|
|
|
"");
|
|
|
|
static struct mtx vm_oom_ratelim_mtx;
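These variables back the rate limit that vm_pageout_oom() applies to OOM requests raised by page fault time-outs. A minimal userland sketch of that check follows, with assumptions labeled: `struct oom_ratelim` and `oom_ratelim_allow` are hypothetical names, the tick counter and `hz` are passed in as arguments, and the mutex that protects the state in the kernel is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical holder for the tick value of the last permitted OOM. */
struct oom_ratelim {
	int	last;
};

/*
 * Decide whether an OOM request may proceed.  Requests that come from
 * page faults are refused while fewer than hz * secs ticks have passed
 * since the last permitted request; all other requests always proceed.
 * The difference is taken in unsigned arithmetic so that it stays
 * well defined when the tick counter wraps.
 */
static bool
oom_ratelim_allow(struct oom_ratelim *rl, int now, bool from_page_fault,
    int hz, int secs)
{
	assert(hz > 0 && secs >= 0);
	if (from_page_fault &&
	    (unsigned)now - (unsigned)rl->last < (unsigned)(hz * secs))
		return (false);
	rl->last = now;
	return (true);
}
```

With `hz` at 100 and a 10 second window, a page fault request 100 ticks after the last permitted one is refused, while one arriving 1000 ticks later is allowed and restarts the window.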
|
|
|
|
|
2008-09-29 19:45:12 +00:00
|
|
|
void
|
|
|
|
vm_pageout_oom(int shortage)
|
|
|
|
{
|
|
|
|
struct proc *p, *bigproc;
|
|
|
|
vm_offset_t size, bigsize;
|
|
|
|
struct thread *td;
|
2009-04-19 20:53:47 +00:00
|
|
|
struct vmspace *vm;
|
2019-08-16 09:43:49 +00:00
|
|
|
int now;
|
2017-06-05 18:07:56 +00:00
|
|
|
bool breakout;
|
2008-09-29 19:45:12 +00:00
|
|
|
|
2019-08-16 09:43:49 +00:00
|
|
|
/*
|
|
|
|
* For OOM requests originating from vm_fault(), there is a high
|
|
|
|
* chance that a single large process faults simultaneously in
|
|
|
|
* several threads. Also, on an active system running many
|
|
|
|
* processes of middle-size, like buildworld, all of them
|
|
|
|
* could fault almost simultaneously as well.
|
|
|
|
*
|
|
|
|
* To avoid killing too many processes, rate-limit OOMs
|
|
|
|
* initiated by vm_fault() time-outs on the waits for free
|
|
|
|
* pages.
|
|
|
|
*/
|
|
|
|
mtx_lock(&vm_oom_ratelim_mtx);
|
|
|
|
now = ticks;
|
|
|
|
if (shortage == VM_OOM_MEM_PF &&
|
|
|
|
(u_int)(now - vm_oom_ratelim_last) < hz * vm_oom_pf_secs) {
|
|
|
|
mtx_unlock(&vm_oom_ratelim_mtx);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
vm_oom_ratelim_last = now;
|
|
|
|
mtx_unlock(&vm_oom_ratelim_mtx);
|
|
|
|
|
2008-09-29 19:45:12 +00:00
|
|
|
/*
|
2001-05-17 22:49:03 +00:00
|
|
|
* We keep the process bigproc locked once we find it to keep anyone
|
|
|
|
* from messing with it; however, there is a possibility of
|
2016-11-08 23:59:41 +00:00
|
|
|
* deadlock if process B is bigproc and one of its child processes
|
2001-05-17 22:49:03 +00:00
|
|
|
* attempts to propagate a signal to B while we are waiting for A's
|
|
|
|
* lock while walking this list. To avoid this, we don't block on
|
|
|
|
* the process lock but just skip a process if it is already locked.
|
1994-10-22 02:18:03 +00:00
|
|
|
*/
|
2008-09-29 19:45:12 +00:00
|
|
|
bigproc = NULL;
|
|
|
|
bigsize = 0;
|
|
|
|
sx_slock(&allproc_lock);
|
|
|
|
FOREACH_PROC_IN_SYSTEM(p) {
|
2015-01-24 15:33:42 +00:00
|
|
|
PROC_LOCK(p);
|
|
|
|
|
2008-09-29 19:45:12 +00:00
|
|
|
/*
|
2010-04-06 10:43:01 +00:00
|
|
|
* If this is a system, protected or killed process, skip it.
|
2008-09-29 19:45:12 +00:00
|
|
|
*/
|
2015-01-24 15:33:42 +00:00
|
|
|
if (p->p_state != PRS_NORMAL || (p->p_flag & (P_INEXEC |
|
|
|
|
P_PROTECTED | P_SYSTEM | P_WEXIT)) != 0 ||
|
|
|
|
p->p_pid == 1 || P_KILLED(p) ||
|
|
|
|
(p->p_pid < 48 && swap_pager_avail != 0)) {
|
2008-09-29 19:45:12 +00:00
|
|
|
PROC_UNLOCK(p);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* If the process is in a non-running type state,
|
|
|
|
* don't touch it. Check all the threads individually.
|
|
|
|
*/
|
2017-06-05 18:07:56 +00:00
|
|
|
breakout = false;
|
2008-09-29 19:45:12 +00:00
|
|
|
FOREACH_THREAD_IN_PROC(p, td) {
|
|
|
|
thread_lock(td);
|
|
|
|
if (!TD_ON_RUNQ(td) &&
|
|
|
|
!TD_IS_RUNNING(td) &&
|
2011-04-06 16:27:04 +00:00
|
|
|
!TD_IS_SLEEPING(td) &&
|
2015-11-16 05:52:04 +00:00
|
|
|
!TD_IS_SUSPENDED(td) &&
|
|
|
|
!TD_IS_SWAPPED(td)) {
|
Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
|
|
|
thread_unlock(td);
|
2017-06-05 18:07:56 +00:00
|
|
|
breakout = true;
|
2008-09-29 19:45:12 +00:00
|
|
|
break;
|
Part 1 of KSE-III
The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
2002-06-29 17:26:22 +00:00
|
|
|
}
|
2008-09-29 19:45:12 +00:00
|
|
|
thread_unlock(td);
|
1994-10-22 02:18:03 +00:00
|
|
|
}
|
2008-09-29 19:45:12 +00:00
|
|
|
if (breakout) {
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
continue;
|
1994-10-22 02:18:03 +00:00
|
|
|
}
|
2008-09-29 19:45:12 +00:00
|
|
|
/*
|
|
|
|
* get the process size
|
|
|
|
*/
|
2009-04-19 20:53:47 +00:00
|
|
|
vm = vmspace_acquire_ref(p);
|
|
|
|
if (vm == NULL) {
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
continue;
|
|
|
|
}
|
Fix a LOR between vnode locks and allproc_lock.
There is an order between covered vnode lock and allproc_lock, which
is established by calling mountcheckdirs() while owning the covered
vnode lock. mountcheckdirs() iterates over the processes, protected by
allproc_lock. This order is needed and seems to be not avoidable.
On the other hand, various VM daemons also need to iterate over all
processes, and they lock and unlock user maps. Since unlock of the
user map may trigger processing of the deferred map entries, it causes
vnode locking to occur. Or, when vmspace is freed, dropping references
on the vnode-backed object also lock vnodes. We get reverted order
comparing with the mount/unmount order.
For VM daemons, there is no need to own allproc_lock while we operate
on vmspaces. If the process is held, it serves as the marker for
allproc list, which allows to continue the iteration.
Add _PHOLD_LITE() macro, similar to _PHOLD(), but not causing swap-in
of the kernel stacks. It is used instead of _PHOLD() in vm code,
since e.g. calling faultin() in OOM conditions only exaggerates the
problem.
Modernize comment describing PHOLD.
Reported by: lists@yamagi.org
Tested by: pho (previous version)
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 week
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6679
2016-06-22 20:15:37 +00:00
|
|
|
_PHOLD_LITE(p);
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
sx_sunlock(&allproc_lock);
|
2009-04-19 20:53:47 +00:00
|
|
|
if (!vm_map_trylock_read(&vm->vm_map)) {
|
2015-01-24 15:33:42 +00:00
|
|
|
vmspace_free(vm);
|
|
|
|
sx_slock(&allproc_lock);
|
|
|
|
PRELE(p);
|
2008-09-29 19:45:12 +00:00
|
|
|
continue;
|
|
|
|
}
|
2009-04-28 11:45:36 +00:00
|
|
|
size = vmspace_swap_count(vm);
|
2019-08-16 09:43:49 +00:00
|
|
|
if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
|
2015-11-16 06:02:11 +00:00
|
|
|
size += vm_pageout_oom_pagecount(vm);
|
|
|
|
vm_map_unlock_read(&vm->vm_map);
|
2009-04-19 20:53:47 +00:00
|
|
|
vmspace_free(vm);
|
|
|
|
sx_slock(&allproc_lock);
|
2015-11-16 06:02:11 +00:00
|
|
|
|
2008-09-29 19:45:12 +00:00
|
|
|
/*
|
2015-11-16 06:02:11 +00:00
|
|
|
* If this process is bigger than the biggest one,
|
2008-09-29 19:45:12 +00:00
|
|
|
* remember it.
|
|
|
|
*/
|
|
|
|
if (size > bigsize) {
|
|
|
|
if (bigproc != NULL)
|
2015-01-24 15:33:42 +00:00
|
|
|
PRELE(bigproc);
|
2008-09-29 19:45:12 +00:00
|
|
|
bigproc = p;
|
|
|
|
bigsize = size;
|
2015-01-24 15:33:42 +00:00
|
|
|
} else {
|
|
|
|
PRELE(p);
|
|
|
|
}
|
2008-09-29 19:45:12 +00:00
|
|
|
}
|
|
|
|
sx_sunlock(&allproc_lock);
|
|
|
|
if (bigproc != NULL) {
|
Add vm.panic_on_oom sysctl, which enables those who would rather panic than
kill a process, when the system runs out of memory. Defaults to off.
Usually, this is most useful when the OOM condition is due to mismanagement
of memory, on a system where the applications in question don't respond well
to being killed.
In theory, if the system is properly managed, it shouldn't be possible to
hit this condition. If it does, the panic can be more desirable for some
users (since it can be a good means of finding the root cause) rather than
killing the largest process and continuing on its merry way.
As kib@ mentions in the differential, there is also protect(1), which uses
procctl(PROC_SPROTECT) to ensure that some processes are immune. However,
a panic approach is still useful in some environments. This is primarily
intended as a development/debugging tool.
Differential Revision: D1627
Reviewed by: kib
MFC after: 1 week
2015-01-24 17:32:45 +00:00
|
|
|
if (vm_panic_on_oom != 0)
|
|
|
|
panic("out of swap space");
|
2015-01-24 15:33:42 +00:00
|
|
|
PROC_LOCK(bigproc);
|
2008-09-29 19:45:12 +00:00
|
|
|
killproc(bigproc, "out of swap space");
|
|
|
|
sched_nice(bigproc, PRIO_MIN);
|
2015-01-24 15:33:42 +00:00
|
|
|
_PRELE(bigproc);
|
2008-09-29 19:45:12 +00:00
|
|
|
PROC_UNLOCK(bigproc);
|
1994-10-22 02:18:03 +00:00
|
|
|
}
|
1994-05-25 09:21:21 +00:00
|
|
|
}
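The rate limit at the top of vm_pageout_oom() compares tick counts with unsigned subtraction, which stays correct when the tick counter wraps around. A minimal user-space sketch of that idiom follows. The names `struct ratelim` and `ratelim_admit` are hypothetical and not part of the kernel, and the sketch leaves out the mutex and the VM_OOM_MEM_PF check (in the kernel only page-fault-initiated requests are rejected, while every admitted request updates the timestamp).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's rate-limit state. */
struct ratelim {
	int last;	/* tick value of the last admitted request */
};

/*
 * Return true if a request at tick "now" should be admitted, that is,
 * if at least hz * secs ticks have elapsed since the last admitted
 * request.  Both operands are converted to unsigned before the
 * subtraction, so the difference is well defined even after the tick
 * counter has wrapped.
 */
static bool
ratelim_admit(struct ratelim *rl, int now, int hz, int secs)
{
	assert(hz > 0 && secs >= 0);
	if ((unsigned int)now - (unsigned int)rl->last <
	    (unsigned int)(hz * secs))
		return (false);
	rl->last = now;
	return (true);
}
```

With hz = 1000 and secs = 10, a request 5000 ticks after the previous admitted one is rejected, and one 10000 ticks after it is admitted.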
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2018-08-09 18:25:49 +00:00
|
|
|
static bool
|
|
|
|
vm_pageout_lowmem(void)
|
2018-06-02 00:01:07 +00:00
|
|
|
{
|
2018-08-09 18:25:49 +00:00
|
|
|
static int lowmem_ticks = 0;
|
|
|
|
int last;
|
|
|
|
|
|
|
|
last = atomic_load_int(&lowmem_ticks);
|
|
|
|
while ((u_int)(ticks - last) / hz >= lowmem_period) {
|
|
|
|
if (atomic_fcmpset_int(&lowmem_ticks, &last, ticks) == 0)
|
|
|
|
continue;
|
2018-06-02 00:01:07 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Decrease registered cache sizes.
|
|
|
|
*/
|
|
|
|
SDT_PROBE0(vm, , , vm__lowmem_scan);
|
|
|
|
EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_PAGES);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We do this explicitly after the caches have been
|
2019-09-01 22:22:43 +00:00
|
|
|
* drained above. If we have a severe page shortage on
|
|
|
|
* our hands, completely drain all UMA zones. Otherwise,
|
|
|
|
* just prune the caches.
|
2018-06-02 00:01:07 +00:00
|
|
|
*/
|
2019-09-01 22:22:43 +00:00
|
|
|
uma_reclaim(vm_page_count_min() ? UMA_RECLAIM_DRAIN_CPU :
|
|
|
|
UMA_RECLAIM_TRIM);
|
2018-08-09 18:25:49 +00:00
|
|
|
return (true);
|
2018-06-02 00:01:07 +00:00
|
|
|
}
|
2018-08-09 18:25:49 +00:00
|
|
|
return (false);
|
2018-06-02 00:01:07 +00:00
|
|
|
}
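vm_pageout_lowmem() lets only one caller per period run the lowmem handlers by advancing a shared timestamp with a compare-and-set loop. The sketch below shows the same pattern in portable C11, assuming `atomic_compare_exchange_weak` as a stand-in for the kernel's atomic_fcmpset_int(). Both update the expected value on failure, but they are not the same primitive. `gate_last` and `period_gate` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Tick of the last successful run; hypothetical name. */
static atomic_uint gate_last;

/*
 * Return true for the single caller that succeeds in advancing
 * gate_last to "now" once at least "period" seconds (period * hz
 * ticks) have elapsed.  Callers that lose the race see the updated
 * timestamp, fail the loop condition, and return false.
 * Assumes hz > 0.
 */
static bool
period_gate(unsigned int now, unsigned int hz, unsigned int period)
{
	unsigned int last;

	last = atomic_load(&gate_last);
	while ((now - last) / hz >= period) {
		/* On failure, "last" is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&gate_last, &last, now))
			return (true);
	}
	return (false);
}
```

Re-checking the loop condition after a failed exchange is what keeps two concurrent callers from both running the expensive reclamation path in the same period.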
|
|
|
|
|
Split the pagequeues per NUMA domains, and split pagedaemon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
static void
|
|
|
|
vm_pageout_worker(void *arg)
|
|
|
|
{
|
2018-02-06 22:10:07 +00:00
|
|
|
struct vm_domain *vmd;
|
2018-08-09 18:25:49 +00:00
|
|
|
u_int ofree;
|
2018-06-02 00:01:07 +00:00
|
|
|
int addl_shortage, domain, shortage;
|
2016-10-05 16:15:26 +00:00
|
|
|
bool target_met;
|
|
|
|
|
2018-02-06 22:10:07 +00:00
|
|
|
domain = (uintptr_t)arg;
|
|
|
|
vmd = VM_DOMAIN(domain);
|
2018-02-23 22:51:51 +00:00
|
|
|
shortage = 0;
|
2016-10-05 16:15:26 +00:00
|
|
|
target_met = true;
|
|
|
|
|
|
|
|
/*
|
2013-08-17 07:10:01 +00:00
|
|
|
* XXXKIB It could be useful to bind pageout daemon threads to
|
|
|
|
* the cores belonging to the domain, from which vm_page_array
|
|
|
|
* is allocated.
|
|
|
|
*/
|
|
|
|
|
2018-02-06 22:10:07 +00:00
|
|
|
KASSERT(vmd->vmd_segs != 0, ("domain without segments"));
|
|
|
|
vmd->vmd_last_active_scan = ticks;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The pageout daemon worker is never done, so loop forever.
|
|
|
|
*/
|
|
|
|
while (TRUE) {
|
2018-03-15 19:23:07 +00:00
|
|
|
vm_domain_pageout_lock(vmd);
|
2018-06-02 00:01:07 +00:00
|
|
|
|
2018-03-15 19:23:07 +00:00
|
|
|
/*
|
|
|
|
* We need to clear wanted before we check the limits. This
|
|
|
|
* prevents races with wakers who will check wanted after they
|
|
|
|
* reach the limit.
|
|
|
|
*/
|
|
|
|
atomic_store_int(&vmd->vmd_pageout_wanted, 0);
|
2016-05-27 19:15:45 +00:00
|
|
|
|
|
|
|
/*
|
2018-02-23 22:51:51 +00:00
|
|
|
* Might the page daemon need to run again?
|
2016-05-27 19:15:45 +00:00
|
|
|
*/
|
2018-02-23 22:51:51 +00:00
|
|
|
if (vm_paging_needed(vmd, vmd->vmd_free_count)) {
|
|
|
|
/*
|
2018-06-02 00:01:07 +00:00
|
|
|
* Yes. If the scan failed to produce enough free
|
|
|
|
* pages, sleep uninterruptibly for some time in the
|
|
|
|
* hope that the laundry thread will clean some pages.
|
|
|
|
*/
|
2018-03-15 19:23:07 +00:00
|
|
|
vm_domain_pageout_unlock(vmd);
|
2018-06-02 00:01:07 +00:00
|
|
|
if (!target_met)
|
2017-12-06 18:36:54 +00:00
|
|
|
pause("pwait", hz / VM_INACT_SCAN_RATE);
|
|
|
|
} else {
|
|
|
|
/*
|
2018-02-23 22:51:51 +00:00
|
|
|
* No, sleep until the next wakeup or until pages
|
|
|
|
* need to have their reference stats updated.
|
|
|
|
*/
|
2018-02-20 10:13:13 +00:00
|
|
|
if (mtx_sleep(&vmd->vmd_pageout_wanted,
|
2018-03-15 19:23:07 +00:00
|
|
|
vm_domain_pageout_lockptr(vmd), PDROP | PVM,
|
2018-02-23 22:51:51 +00:00
|
|
|
"psleep", hz / VM_INACT_SCAN_RATE) == 0)
|
- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.
This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html
Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
|
|
|
VM_CNT_INC(v_pdwakeups);
|
|
|
|
}
|
2018-05-24 14:16:22 +00:00
|
|
|
|
2018-03-15 19:23:07 +00:00
|
|
|
/* Prevent spurious wakeups by ensuring that wanted is set. */
|
|
|
|
atomic_store_int(&vmd->vmd_pageout_wanted, 1);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Use the controller to calculate how many pages to free in
|
2018-08-09 18:25:49 +00:00
|
|
|
* this interval, and scan the inactive queue. If the lowmem
|
|
|
|
* handlers appear to have freed up some pages, subtract the
|
|
|
|
* difference from the inactive queue scan target.
|
2018-03-15 19:23:07 +00:00
|
|
|
*/
|
2018-02-23 22:51:51 +00:00
|
|
|
shortage = pidctrl_daemon(&vmd->vmd_pid, vmd->vmd_free_count);
|
2018-06-02 00:01:07 +00:00
|
|
|
if (shortage > 0) {
|
2018-08-09 18:25:49 +00:00
|
|
|
ofree = vmd->vmd_free_count;
|
|
|
|
if (vm_pageout_lowmem() && vmd->vmd_free_count > ofree)
|
|
|
|
shortage -= min(vmd->vmd_free_count - ofree,
|
|
|
|
(u_int)shortage);
|
2018-06-02 00:01:07 +00:00
|
|
|
target_met = vm_pageout_scan_inactive(vmd, shortage,
|
|
|
|
&addl_shortage);
|
|
|
|
} else
|
|
|
|
addl_shortage = 0;
|
2018-05-24 14:16:22 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Scan the active queue. A positive value for shortage
|
|
|
|
* indicates that we must aggressively deactivate pages to avoid
|
|
|
|
* a shortfall.
|
|
|
|
*/
|
2018-05-24 20:26:37 +00:00
|
|
|
shortage = vm_pageout_active_target(vmd) + addl_shortage;
|
2018-05-24 14:16:22 +00:00
|
|
|
vm_pageout_scan_active(vmd, shortage);
|
Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach a quorum before starting the OOM, since one
thread's inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages is OOM
started, by a single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
}
|
|
|
|
}
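The inactive-queue target adjustment described in the comment above can be illustrated in isolation. This is a minimal user-space sketch, not the kernel code: `sk_adjust_shortage` and its parameters are hypothetical names, and the boolean stands in for the return value of `vm_pageout_lowmem()`. It only shows how the pages freed by the low-memory handlers are subtracted from the scan target without letting the target go negative.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: reduce the inactive scan target by the number of
 * pages the low-memory handlers appear to have freed.  "ofree" is the
 * free page count sampled before the handlers ran, "nfree" the count
 * sampled afterwards.
 */
static int
sk_adjust_shortage(int shortage, unsigned ofree, unsigned nfree,
    bool lowmem_ran)
{
	if (shortage > 0 && lowmem_ran && nfree > ofree) {
		unsigned freed = nfree - ofree;
		unsigned want = (unsigned)shortage;

		/* Clamp so the target never drops below zero. */
		shortage -= (int)(freed < want ? freed : want);
	}
	return (shortage);
}
```

For example, a target of 100 pages with 30 pages freed by the handlers becomes 70, while a target of 100 with 450 pages freed becomes 0 rather than a negative number.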
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2014-08-28 19:50:08 +00:00
|
|
|
* vm_pageout_init initialises basic pageout daemon settings.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1995-08-28 09:19:25 +00:00
|
|
|
static void
|
2018-02-06 22:10:07 +00:00
|
|
|
vm_pageout_init_domain(int domain)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2018-02-06 22:10:07 +00:00
|
|
|
struct vm_domain *vmd;
|
2018-02-23 22:51:51 +00:00
|
|
|
struct sysctl_oid *oid;
|
2018-02-06 22:10:07 +00:00
|
|
|
|
|
|
|
vmd = VM_DOMAIN(domain);
|
|
|
|
vmd->vmd_interrupt_free_min = 2;
|
1995-04-09 06:03:56 +00:00
|
|
|
|
2003-09-19 05:03:45 +00:00
|
|
|
/*
|
|
|
|
* v_free_reserved needs to include enough for the largest
|
|
|
|
* swap pager structures plus enough for any pv_entry structs
|
|
|
|
* when paging.
|
|
|
|
*/
|
2018-02-06 22:10:07 +00:00
|
|
|
if (vmd->vmd_page_count > 1024)
|
|
|
|
vmd->vmd_free_min = 4 + (vmd->vmd_page_count - 1024) / 200;
|
2007-05-31 22:52:15 +00:00
|
|
|
else
|
2018-02-06 22:10:07 +00:00
|
|
|
vmd->vmd_free_min = 4;
|
2019-07-06 15:55:16 +00:00
|
|
|
vmd->vmd_pageout_free_min = 2 * MAXBSIZE / PAGE_SIZE +
|
2018-02-06 22:10:07 +00:00
|
|
|
vmd->vmd_interrupt_free_min;
|
|
|
|
vmd->vmd_free_reserved = vm_pageout_page_count +
|
|
|
|
vmd->vmd_pageout_free_min + (vmd->vmd_page_count / 768);
|
|
|
|
vmd->vmd_free_severe = vmd->vmd_free_min / 2;
|
|
|
|
vmd->vmd_free_target = 4 * vmd->vmd_free_min + vmd->vmd_free_reserved;
|
|
|
|
vmd->vmd_free_min += vmd->vmd_free_reserved;
|
|
|
|
vmd->vmd_free_severe += vmd->vmd_free_reserved;
|
|
|
|
vmd->vmd_inactive_target = (3 * vmd->vmd_free_target) / 2;
|
|
|
|
if (vmd->vmd_inactive_target > vmd->vmd_free_count / 3)
|
|
|
|
vmd->vmd_inactive_target = vmd->vmd_free_count / 3;
|
2003-09-19 05:03:45 +00:00
|
|
|
|
1994-09-12 11:31:36 +00:00
|
|
|
/*
|
2018-02-23 22:51:51 +00:00
|
|
|
* Set the default wakeup threshold to be 10% below the paging
|
|
|
|
* target. This keeps the steady state out of shortfall.
|
1994-09-12 11:31:36 +00:00
|
|
|
*/
|
2018-02-23 22:51:51 +00:00
|
|
|
vmd->vmd_pageout_wakeup_thresh = (vmd->vmd_free_target / 10) * 9;
|
2018-02-06 22:10:07 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Target amount of memory to move out of the laundry queue during a
|
|
|
|
* background laundering. This is proportional to the amount of system
|
|
|
|
* memory.
|
|
|
|
*/
|
|
|
|
vmd->vmd_background_launder_target = (vmd->vmd_free_target -
|
|
|
|
vmd->vmd_free_min) / 10;
|
2018-02-23 22:51:51 +00:00
|
|
|
|
|
|
|
/* Initialize the pageout daemon pid controller. */
|
|
|
|
pidctrl_init(&vmd->vmd_pid, hz / VM_INACT_SCAN_RATE,
|
|
|
|
vmd->vmd_free_target, PIDCTRL_BOUND,
|
|
|
|
PIDCTRL_KPD, PIDCTRL_KID, PIDCTRL_KDD);
|
|
|
|
oid = SYSCTL_ADD_NODE(NULL, SYSCTL_CHILDREN(vmd->vmd_oid), OID_AUTO,
|
|
|
|
"pidctrl", CTLFLAG_RD, NULL, "");
|
|
|
|
pidctrl_init_sysctl(&vmd->vmd_pid, SYSCTL_CHILDREN(oid));
|
2018-02-06 22:10:07 +00:00
|
|
|
}
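The threshold arithmetic in vm_pageout_init_domain() is easier to follow with concrete numbers. The sketch below repeats the same integer formulas in user space. The `sk_` names are hypothetical, and the three constants are assumptions chosen for illustration (4 KiB pages, a 64 KiB MAXBSIZE, and a pageout cluster size of 32); the kernel takes the real values from its own headers and tunables.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define	SK_PAGE_SIZE		4096u
#define	SK_MAXBSIZE		65536u
#define	SK_PAGEOUT_PAGE_COUNT	32u

struct sk_thresholds {
	uint32_t interrupt_free_min;
	uint32_t pageout_free_min;
	uint32_t free_reserved;
	uint32_t free_min;
	uint32_t free_severe;
	uint32_t free_target;
	uint32_t inactive_target;
	uint32_t wakeup_thresh;
	uint32_t background_launder_target;
};

/* Mirror the per-domain computation, in the same order as the source. */
static struct sk_thresholds
sk_compute(uint32_t page_count, uint32_t free_count)
{
	struct sk_thresholds t;

	t.interrupt_free_min = 2;
	if (page_count > 1024)
		t.free_min = 4 + (page_count - 1024) / 200;
	else
		t.free_min = 4;
	t.pageout_free_min = 2 * SK_MAXBSIZE / SK_PAGE_SIZE +
	    t.interrupt_free_min;
	t.free_reserved = SK_PAGEOUT_PAGE_COUNT + t.pageout_free_min +
	    page_count / 768;
	t.free_severe = t.free_min / 2;
	t.free_target = 4 * t.free_min + t.free_reserved;
	/* free_min and free_severe are offsets above the reserve. */
	t.free_min += t.free_reserved;
	t.free_severe += t.free_reserved;
	t.inactive_target = (3 * t.free_target) / 2;
	if (t.inactive_target > free_count / 3)
		t.inactive_target = free_count / 3;
	/* Wake the daemon at roughly 90% of the free target. */
	t.wakeup_thresh = (t.free_target / 10) * 9;
	t.background_launder_target = (t.free_target - t.free_min) / 10;
	return (t);
}
```

Under these assumptions, a domain with 1048576 pages (4 GiB) and 1000000 free pages gets free_reserved = 1431, free_min = 6672, free_severe = 4051, free_target = 22395, inactive_target = 33592, a wakeup threshold of 20151, and a background launder target of 1572 pages.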
|
|
|
|
|
|
|
|
static void
|
|
|
|
vm_pageout_init(void)
|
|
|
|
{
|
|
|
|
u_int freecount;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Initialize some paging parameters.
|
|
|
|
*/
|
|
|
|
if (vm_cnt.v_page_count < 2000)
|
|
|
|
vm_pageout_page_count = 8;
|
|
|
|
|
|
|
|
freecount = 0;
|
|
|
|
for (i = 0; i < vm_ndomains; i++) {
|
|
|
|
struct vm_domain *vmd;
|
|
|
|
|
|
|
|
vm_pageout_init_domain(i);
|
|
|
|
vmd = VM_DOMAIN(i);
|
|
|
|
vm_cnt.v_free_reserved += vmd->vmd_free_reserved;
|
|
|
|
vm_cnt.v_free_target += vmd->vmd_free_target;
|
|
|
|
vm_cnt.v_free_min += vmd->vmd_free_min;
|
|
|
|
vm_cnt.v_inactive_target += vmd->vmd_inactive_target;
|
|
|
|
vm_cnt.v_pageout_free_min += vmd->vmd_pageout_free_min;
|
|
|
|
vm_cnt.v_interrupt_free_min += vmd->vmd_interrupt_free_min;
|
|
|
|
vm_cnt.v_free_severe += vmd->vmd_free_severe;
|
|
|
|
freecount += vmd->vmd_free_count;
|
|
|
|
}
|
2013-08-13 21:56:16 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Set interval in seconds for active scan. We want to visit each
|
2013-08-19 23:54:24 +00:00
|
|
|
* page at least once every ten minutes. This is to prevent worst
|
|
|
|
* case paging behaviors with stale active LRU.
|
2013-08-13 21:56:16 +00:00
|
|
|
*/
|
|
|
|
if (vm_pageout_update_period == 0)
|
2013-08-19 23:54:24 +00:00
|
|
|
vm_pageout_update_period = 600;
|
1994-05-25 09:21:21 +00:00
|
|
|
|
Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes. User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.
The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2). Only user-wired pages are subject to the system-wide
limit, which helps provide some safety against deadlocks. In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
short of killing the wiring process. The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.
The choice to count virtual user-wired pages rather than physical
pages was done for simplicity. There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.
The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded. For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.
Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM. Users that wish to exceed the limit must tune
vm.max_user_wired.
Reviewed by: kib, ngie (mlock() test changes)
Tested by: pho (earlier version)
MFC after: 45 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19908
2019-05-13 16:38:48 +00:00
|
|
|
if (vm_page_max_user_wired == 0)
|
|
|
|
vm_page_max_user_wired = freecount / 3;
|
2014-08-28 19:50:08 +00:00
|
|
|
}
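The aggregation performed by vm_pageout_init() can be shown on its own: per-domain thresholds are summed into the global counters, and the user wiring limit defaults to one third of the free pages counted across all domains unless it was already set. The sketch below is a reduced user-space model with hypothetical `sk_` types that carry only a few of the fields; it is not the kernel's data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for struct vm_domain and vm_cnt. */
struct sk_dom {
	uint32_t free_target;
	uint32_t free_min;
	uint32_t free_count;
};

struct sk_global {
	uint32_t free_target;
	uint32_t free_min;
	uint64_t max_user_wired;	/* 0 means "not tuned" */
};

static void
sk_aggregate(const struct sk_dom *doms, size_t n, struct sk_global *g)
{
	uint32_t freecount = 0;

	for (size_t i = 0; i < n; i++) {
		g->free_target += doms[i].free_target;
		g->free_min += doms[i].free_min;
		freecount += doms[i].free_count;
	}
	/* Keep an explicitly configured limit; otherwise default to 1/3. */
	if (g->max_user_wired == 0)
		g->max_user_wired = freecount / 3;
}
```

With two domains holding 900 and 600 free pages, the default limit comes out as 500 pages; a limit that was set beforehand is left untouched.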
|
|
|
|
|
|
|
|
/*
|
|
|
|
* vm_pageout is the high level pageout daemon.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
vm_pageout(void)
|
|
|
|
{
|
2018-10-30 17:57:40 +00:00
|
|
|
struct proc *p;
|
|
|
|
struct thread *td;
|
|
|
|
int error, first, i;
|
|
|
|
|
|
|
|
p = curproc;
|
|
|
|
td = curthread;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2019-08-16 09:43:49 +00:00
|
|
|
mtx_init(&vm_oom_ratelim_mtx, "vmoomr", NULL, MTX_DEF);
|
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there was constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. Their marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and
maintainability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
1995-07-13 08:48:48 +00:00
|
|
|
swap_pager_swap_init();
|
2018-10-30 17:57:40 +00:00
|
|
|
for (first = -1, i = 0; i < vm_ndomains; i++) {
|
2018-10-01 14:14:21 +00:00
|
|
|
if (VM_DOMAIN_EMPTY(i)) {
|
|
|
|
if (bootverbose)
|
|
|
|
printf("domain %d empty; skipping pageout\n",
|
|
|
|
i);
|
|
|
|
continue;
|
|
|
|
}
|
2018-10-30 17:57:40 +00:00
|
|
|
if (first == -1)
|
|
|
|
first = i;
|
|
|
|
else {
|
|
|
|
error = kthread_add(vm_pageout_worker,
|
|
|
|
(void *)(uintptr_t)i, p, NULL, 0, 0, "dom%d", i);
|
|
|
|
if (error != 0)
|
|
|
|
panic("starting pageout for domain %d: %d\n",
|
|
|
|
i, error);
|
Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory
situations prior to now.
The new code is based on the concept that I/O must be able to function in
a low memory situation. All major modules related to I/O (except
networking) have been adjusted to allow allocation out of the system
reserve memory pool. These modules now detect a low memory situation but
rather than block, they continue to operate, then return resources
to the memory pool instead of caching them or leaving them wired.
Code has been added to stall in a low-memory situation prior to a vnode
being locked.
Thus situations where a process blocks in a low-memory condition while
holding a locked vnode have been reduced to near nothing. Not only will
I/O continue to operate, but many prior deadlock conditions simply no
longer exist.
Implement a number of VFS/BIO fixes
(found by Ian): in biodone(), bogus-page replacement code, the loop
was not properly incrementing loop variables prior to a continue
statement. We do not believe this code can be hit anyway but we
aren't taking any chances. We'll turn the whole section into a
panic (as it already is in brelse()) after the release is rolled.
In biodone(), the foff calculation was incorrectly
clamped to the iosize, causing the wrong foff to be calculated
for pages in the case of an I/O error or biodone() called without
initiating I/O. The problem always caused a panic before. Now it
doesn't. The problem is mainly an issue with NFS.
Fixed casts for ~PAGE_MASK. This code worked properly before only
because the calculations use signed arithmetic. Better to properly
extend PAGE_MASK first before inverting it for the 64 bit masking
op.
In brelse(), the bogus_page fixup code was improperly throwing
away the original contents of 'm' when it did the j-loop to
fix the bogus pages. The result was that it would potentially
invalidate parts of the *WRONG* page(!), leading to corruption.
There may still be cases where a background bitmap write is
being duplicated, causing potential corruption. We have identified
a potentially serious bug related to this but the fix is still TBD.
So instead this patch contains a KASSERT to detect the problem
and panic the machine rather than continue to corrupt the filesystem.
The problem does not occur very often; it is very hard to
reproduce, and it may or may not be the cause of the corruption
people have reported.
Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
2000-11-18 23:06:26 +00:00
|
|
|
}
|
2018-02-06 22:10:07 +00:00
|
|
|
error = kthread_add(vm_pageout_laundry_worker,
|
2018-10-30 17:57:40 +00:00
|
|
|
(void *)(uintptr_t)i, p, NULL, 0, 0, "laundry: dom%d", i);
|
2018-02-06 22:10:07 +00:00
|
|
|
if (error != 0)
|
2018-10-30 17:57:40 +00:00
|
|
|
panic("starting laundry for domain %d: %d", i, error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2018-10-30 17:57:40 +00:00
|
|
|
error = kthread_add(uma_reclaim_worker, NULL, p, NULL, 0, 0, "uma");
|
2015-05-09 20:08:36 +00:00
|
|
|
if (error != 0)
|
|
|
|
panic("starting uma_reclaim helper, error %d\n", error);
|
2018-10-30 17:57:40 +00:00
|
|
|
|
|
|
|
snprintf(td->td_name, sizeof(td->td_name), "dom%d", first);
|
|
|
|
vm_pageout_worker((void *)(uintptr_t)first);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
1994-05-25 09:21:21 +00:00
|
|
|
|
2003-02-09 20:40:36 +00:00
|
|
|
/*
|
2017-12-24 19:45:16 +00:00
|
|
|
* Perform an advisory wakeup of the page daemon.
|
2003-02-09 20:40:36 +00:00
|
|
|
*/
|
1996-11-28 23:15:07 +00:00
|
|
|
void
|
2018-02-06 22:10:07 +00:00
|
|
|
pagedaemon_wakeup(int domain)
|
1996-11-28 23:15:07 +00:00
|
|
|
{
|
2018-02-06 22:10:07 +00:00
|
|
|
struct vm_domain *vmd;
|
2003-02-02 07:16:40 +00:00
|
|
|
|
2018-02-06 22:10:07 +00:00
|
|
|
vmd = VM_DOMAIN(domain);
|
2018-03-15 19:23:07 +00:00
|
|
|
vm_domain_pageout_assert_unlocked(vmd);
|
|
|
|
if (curproc == pageproc)
|
|
|
|
return;
|
2017-12-24 19:45:16 +00:00
|
|
|
|
2018-03-15 19:23:07 +00:00
|
|
|
if (atomic_fetchadd_int(&vmd->vmd_pageout_wanted, 1) == 0) {
|
|
|
|
vm_domain_pageout_lock(vmd);
|
|
|
|
atomic_store_int(&vmd->vmd_pageout_wanted, 1);
|
2018-02-06 22:10:07 +00:00
|
|
|
wakeup(&vmd->vmd_pageout_wanted);
|
2018-03-15 19:23:07 +00:00
|
|
|
vm_domain_pageout_unlock(vmd);
|
1996-11-28 23:15:07 +00:00
|
|
|
}
|
|
|
|
}
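The wakeup coalescing in pagedaemon_wakeup() relies on the fetch-and-add returning the previous value: only the caller that moves the counter away from zero takes the lock and issues the wakeup, and later callers return without touching the lock. Below is a minimal user-space sketch of that pattern using C11 atomics. The `sk_` names are hypothetical, the lock is reduced to a comment, and `sk_daemon_ack` is an assumption about how the daemon side clears the flag, since that code is not part of this section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sk_domain {
	atomic_uint wanted;	/* nonzero while a wakeup is pending */
	unsigned wakeups;	/* number of wakeups actually issued */
};

static void
sk_init(struct sk_domain *d)
{
	atomic_init(&d->wanted, 0);
	d->wakeups = 0;
}

/* Returns true if this call issued the wakeup. */
static bool
sk_wakeup(struct sk_domain *d)
{
	if (atomic_fetch_add(&d->wanted, 1) == 0) {
		/* The kernel takes the domain pageout lock here. */
		atomic_store(&d->wanted, 1);	/* collapse the count to 1 */
		d->wakeups++;			/* stands in for wakeup() */
		/* ... and drops the lock here. */
		return (true);
	}
	return (false);
}

/* Assumed daemon side: clear the flag once the request is picked up. */
static void
sk_daemon_ack(struct sk_domain *d)
{
	atomic_store(&d->wanted, 0);
}
```

In this model, three back-to-back calls to `sk_wakeup()` issue exactly one wakeup; a further call issues another only after `sk_daemon_ack()` has reset the flag.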
|