3996 Commits

Author SHA1 Message Date
trasz
09059e926a Ifdef out the unused vm_rr_selectdomain().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-02-02 17:44:55 +00:00
markj
dfb4f0694a Avoid page lookups in the top-level object in vm_object_madvise().
We can iterate over consecutive resident pages in the top-level object
using the object's page list rather than by performing lookups in the
object radix tree. This extends one of the optimizations in r312208 to the
case where a shadow chain is present.

Suggested by:	alc
Reviewed by:	alc, kib (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D9282
2017-01-30 18:51:43 +00:00
mjg
37825433d7 hwpmc: partially depessimize munmap handling if the module is not loaded
HWPMC_HOOKS is enabled in GENERIC and triggers some work avoidable in the
common (module not loaded) case.

In particular this avoids permission checks + lock downgrade
singlethreaded and in cases were an executable mapping is found the pmc
sx lock is no longer bounced.

Note this is a band aid.

MFC after:	1 week
2017-01-24 22:00:16 +00:00
markj
6543297f0e Avoid unnecessary page lookups in vm_object_madvise().
vm_object_madvise() is frequently used to apply advice to a contiguous
set of pages in an object with no backing object. Optimize this case by
skipping non-resident subranges in constant time, and by iterating over
resident pages using the object memq, thus avoiding radix tree lookups on
each page index in the specified range.

While here, move MADV_WILLNEED handling to vm_page_advise(), and rename the
"advise" parameter to vm_object_madvise() to "advice."

Reviewed by:	alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D9098
2017-01-15 03:50:08 +00:00
glebius
232949ffad Fix the contiguity once more. 2017-01-12 20:26:02 +00:00
markj
a43ae72c82 Remove a redundant use of min().
Reported by:	rpokala
X-MFC With:	r311346
2017-01-05 03:13:45 +00:00
markj
d0b3c7cb46 Add a small allocator for exec_map entries.
Upon each execve, we allocate a KVA range for use in copying data to the
new image. Pages must be faulted into the range, and when the range is
freed, the backing pages are freed and their mappings are destroyed. This
is a lot of needless overhead, and the exec_map management becomes a
bottleneck when many CPUs are executing execve concurrently. Moreover, the
number of available ranges is fixed at 16, which is insufficient on large
systems and potentially excessive on 32-bit systems.

The new allocator reduces overhead by making exec_map allocations
persistent. When a range is freed, pages backing the range are marked clean
and made easy to reclaim. With this change, the exec_map is sized based on
the number of CPUs.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8921
2017-01-05 01:44:12 +00:00
glebius
53ba35ff94 Fix assertion that checks that pages are consecutive to properly
handle bogus_page insertion(s).
2017-01-04 22:31:09 +00:00
glebius
01e1e94c27 Move bogus_page declaration to vm_page.h and initialization to vm_page.c.
Reviewed by:	kib
2017-01-04 22:27:19 +00:00
markj
bb34e01cc2 Add a page queue for holding dirty anonymous unswappable pages.
On systems without a configured swap device, an attempt to launder pages
from a swap object will always fail and result in the page being
reactivated. This means that the page daemon will continuously scan pages
that can never be evicted. With this change, anonymous pages are instead
moved to PQ_UNSWAPPABLE after a failed laundering attempt when no swap
devices are configured. PQ_UNSWAPPABLE is not scanned unless a swap device
is configured, so unreferenced unswappable pages are excluded from the page
daemon's workload.

Reviewed by:	alc
2017-01-03 00:05:44 +00:00
jhibbits
d63657bd9d Print flags in hex instead of decimal.
Hex is easier to grok for flags, and consistent with other prints.
2017-01-02 16:50:52 +00:00
kib
e345e3955b Style fixes for vm_map_insert().
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-01 18:49:46 +00:00
kib
d67c4de4d0 Ansify vm/vm_pager.c. Style.
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-31 19:30:22 +00:00
mjg
2f0eb1a5cb Use vrefact in vnode_pager_alloc. 2016-12-31 10:37:56 +00:00
kib
80cfc0c41c Fix two similar bugs in the populate vm_fault() code.
If pager' populate method succeeded, but other thread raced with us
and modified vm_map, we must unbusy all pages busied by the pager,
before we retry the whole fault handling.  If pager instantiated more
pages than fit into the current map entry, we must unbusy the pages
which are clipped.

Also do some refactoring, clarify comments and use more clear local
variable names.

Reported and tested by:	kargl, subbsd@gmail.com (previous version)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-12-30 18:55:33 +00:00
kib
8916f2e3c9 Assert that the pages found on the object queue by vm_page_next() and
vm_page_prev() have correct ownership.

In collaboration with:	alc
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2016-12-30 17:37:06 +00:00
kib
9763564a8d Style.
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-30 13:04:43 +00:00
mjg
f4dcd1882e Remove cpu_spinwait after seq_consistent.
It does not add any benefit as the read routine will do it as necessary.
2016-12-30 06:26:17 +00:00
alc
d619a90d54 Relax the object type restrictions on vm_page_alloc_contig(). Specifically,
add support for object types that were previously prohibited because they
could contain PG_CACHED pages.

Roughly halve the number of radix trie operations performed by
vm_page_alloc_contig() using the same approach that is employed by
vm_page_alloc().  Also, eliminate the radix trie lookup performed with the
free page queues lock held.

Tidy up the handling of radix trie insert failures in vm_page_alloc() and
vm_page_alloc_contig().

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8878
2016-12-28 18:32:13 +00:00
kib
02ea7af4a5 Remove redundancy in vmtotal().
There are two instances of inlined unlocks + continue in vmtotal()
switch statements, which are ordinary expressed with break from the
switch case and code after the switch.  Also, the combination of
continue and break statement is redundand.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-26 19:29:04 +00:00
kib
da005e5578 Fix argument type and microoptimize swp_pager_meta_free().
The count argument natural type if vm_pindex_t, but due to the loop
organization, it has to be signed type to detect the termination
condition.  Replace this logic by using distinguished counter for the
processed pages, and terminate loop when the counter exceeds the
argument.

Completely process one swblock for all relevant indexes instead of
doing relookup in hash when incrementing page index on the loop step.

Do not drop hash mutex around iterations.

Noted and reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-12-24 09:57:31 +00:00
kib
e2725e89e1 Improve vm_object_scan_all_shadowed() to also check swap backing objects.
As noted in the removed comment, it is possible and not prohibitively
costly to look up the swap blocks for the given page index.  Implement
a swap_pager_find_least() function to do that, and use it to iterate
simultaneously over both backing object page queue and swap
allocations when looking for shadowed pages.

Testing shows that number of new succesful scans, enabled by this
addition, is small but non-zero.  When worked out, the change both
further reduces the depth of the shadow object chain, and frees unused
but allocated swap and memory.

Suggested and reviewed by:	alc
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-12-18 20:56:14 +00:00
kib
a3cbc7800b In swp_pager_meta_free_all(), fix type of the index variable. Style.
Noted and reviewed by:	alc (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-16 23:33:37 +00:00
kib
565e916002 Provide introductory description of the default pager.
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-14 23:36:32 +00:00
kib
3bda92f604 Remove locking around accounting initialization of the default object.
The object is not yet fully constructed and must not be available to
other threads.  This makes default_pager_alloc() almost identical to
swap_pager_alloc_init().

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-14 23:34:25 +00:00
alc
d4c03e3af2 Tidy up. Mostly, remove or replace stale comments. Most of the comments
in this file actually described the operation of the swap pager, not the
default pager.  Given that this is the wrong place to discuss the
implementation of the swap pager, it shouldn't come as a surprise that as
the swap pager evolved these comments became increasingly stale.  In
addition, apply some style fixes, like modernizing a few remaining old-
style function definitions.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D8781
2016-12-14 17:28:55 +00:00
jhb
6b0130d5b3 Use db_lookup_proc() in the DDB 'show procvm' command.
This allows processes to be identified by PID as well as a pointer address.

MFC after:	2 weeks
Sponsored by:	DARPA / AFRL
2016-12-13 19:22:43 +00:00
alc
023ed8da8d Eliminate every mention of PG_CACHED pages from the comments in the machine-
independent layer of the virtual memory system.  Update some of the nearby
comments to eliminate redundancy and improve clarity.

In vm/vm_reserv.c, do not use hyphens after adverbs ending in -ly per
The Chicago Manual of Style.

Update the comment in vm/vm_page.h defining the four types of page queues to
reflect the elimination of PG_CACHED pages and the introduction of the
laundry queue.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8752
2016-12-12 17:47:09 +00:00
glebius
6042079e9d Allow bogus_page to be passed to pager(s). 2016-12-09 21:21:24 +00:00
markj
b0a8bb2436 Conditionalize PG_CACHE sysctls on COMPAT_FREEBSD11.
Reviewed by:	glebius, imp, jhb
Differential Revision:	https://reviews.freebsd.org/D8736
2016-12-09 18:55:27 +00:00
kib
fd9c8a16f1 Implement the populate() pager method for phys pager.
It allows to provide configurable agressive prefaulting and useful
hints to page daemon about memory allocations, on faults for pages
managed by phys pager.  In fact, this implementation is superior to
the MAP_SHARED_PHYS hack from my Postgresql paper, while giving
similar benefits of reducing the page faults numbers on SysV shared
memory mappings.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2016-12-08 11:35:53 +00:00
kib
2819db13ca Add a new populate() pager method and extend device pager ops vector
with cdev_pg_populate() to provide device drivers access to it.  It
gives drivers fine control of the pages ownership and allows drivers
to implement arbitrary prefault policies.

The populate method is called on a page fault and is supposed to
populate the vm object with the page at the fault location and some
amount of pages around it, at pager's discretion.  VM provides the
pager with the hints about current range of the object mapping, to
avoid instantiation of immediately unused pages, if pager decides so.
Also, VM passes the fault type and map entry protection to the pager,
allowing it to force the optimal required ownership of the mapped
pages.

Installed pages must contiguously fill the returned region, be fully
valid and exclusively busied.  Of course, the pages must be compatible
with the object' type.

After populate() successfully returned, VM fault handler installs as
many instantiated pages into the process page tables as it sees
reasonable, while still obeying the correct semantic for COW and vm
map locking.

The method is opt-in, pager sets OBJ_POPULATE flag to indicate that
the method can be called.  If pager' vm objects can be shadowed, pager
must implement the traditional getpages() method in addition to the
populate().  Populate() might fall back to the getpages() on per-call
basis as well, by returning VM_PAGER_BAD error code.

For now for device pagers, the populate() method is only allowed to be
used by the managed device pagers, but the limitation is only made
because there is no unmanaged fault handlers which could use it right
now.

KPI designed together with, and reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2016-12-08 11:26:11 +00:00
kib
a497cae7b6 Move map_generation snapshot value into struct faultstate.
Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-08 10:29:41 +00:00
kib
5671ac5680 Style.
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-08 10:28:51 +00:00
alc
7571ef95c1 Previously, vm_radix_remove() would panic if the radix trie didn't
contain a vm_page_t at the specified index.  However, with this
change, vm_radix_remove() no longer panics.  Instead, it returns NULL
if there is no vm_page_t at the specified index.  Otherwise, it
returns the vm_page_t.  The motivation for this change is that it
simplifies the use of radix tries in the amd64, arm64, and i386 pmap
implementations.  Instead of performing a lookup before every remove,
the pmap can simply perform the remove.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D8708
2016-12-08 04:29:29 +00:00
markj
8fa0e5ab2d Use the official spelling for NULL arguments to typed sysctl handlers.
Reported by:	bde
2016-12-07 01:15:10 +00:00
markj
376b886316 Provide dummy sysctls for v_cache_count and v_tcached.
Some utilities (notably top(1)) exit if any of their input sysctls don't
exist, and the removal of the above-mentioned PG_CACHE-related sysctls
makes it difficult to run such utilities on different versions of the
kernel without recompiling.

Requested by:	bde
2016-12-06 22:52:45 +00:00
alc
1584fca18f Eliminate a stale comment; vm_radix_prealloc() was replaced in r254141.
MFC after:	3 days
2016-12-02 16:29:30 +00:00
alc
93bd5b27d2 During vm_page_cache()'s call to vm_radix_insert(), if vm_page_alloc() was
called to allocate a new page of radix trie nodes, there could be a call to
vm_radix_remove() on the same trie (of PG_CACHED pages) as the in-progress
vm_radix_insert().  With the removal of PG_CACHED pages, we can simplify
vm_radix_insert() and vm_radix_remove() by removing the flags on the root of
the trie that were used to detect this case and the code for restarting
vm_radix_insert() when it happened.

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8664
2016-12-01 17:26:37 +00:00
alc
d95f7c7503 Recursion on the free page queue mutex occurred when UMA needed to allocate
a new page of radix trie nodes to complete a vm_radix_insert() operation
that was requested by vm_page_cache().  Specifically, vm_page_cache()
already held the free page queue lock when UMA tried to acquire it through
a call to vm_page_alloc().  This code path no longer exists, so there is no
longer any reason to allow recursion on the free page queue mutex.

Improve nearby comments.

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8628
2016-11-27 01:42:53 +00:00
markj
4159d33f6b Release laundered vnode pages to the head of the inactive queue.
The swap pager enqueues laundered pages near the head of the inactive queue
to avoid another trip through LRU before reclamation. This change adds
support for this behaviour to the vnode pager and makes use of it in UFS and
ext2fs. Some ioflag handling is consolidated into a common subroutine so
that this support can be easily extended to other filesystems which make use
of the buffer cache. No changes are needed for ZFS since its putpages
routine always undirties the pages before returning, and the laundry
thread requeues the pages appropriately in this case.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D8589
2016-11-23 17:53:07 +00:00
alc
4be9876033 Remove PG_CACHED-related fields from struct vmmeter, because they are no
longer used.  More precisely, they are always zero because the code that
decremented and incremented them no longer exists.

Bump __FreeBSD_version to mark this change.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8583
2016-11-22 18:13:46 +00:00
glebius
1cea9b7cc2 - If caller specifies readbehind and readahead that together with count
doesn't fit into a buf, then trim readbehind and readahead evenly.  If
  rbehind was limited by the previous BMAP, then roundup its trim to
  block size.
- Add KASSERT to check that b_blkno has proper offset from original
  blkno returned by BMAP. [1]
- Add KASSERT to check that pages in buf are consecutive.

Reviewed by:	kib
Submitted by:	kib [1]
2016-11-17 20:32:32 +00:00
kib
47319ab8b8 Move the fast fault path into the separate function.
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-16 16:34:17 +00:00
alc
2fa3607305 Remove most of the code for implementing PG_CACHED pages. (This change does
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments.  Those changes will come later.)

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8497
2016-11-15 18:22:50 +00:00
alc
5fd3767e0e Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specificially, dirty pages that have passed once through the inactive
queue.  A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them.  The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time.  In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low.  Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system.  Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages.  This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHE pages.

The new laundry thread sleeps while waiting for a request from the page
daemon thread(s).  A request is raised by setting the variable
vm_laundry_request and waking the laundry thread.  We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering").  When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering.  If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s.  Otherwise, the laundry thread goes back
to sleep without doing any work.  When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.

In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target.  In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.

A laundry request can be latched while another is currently being
serviced.  In particular, a shortfall request will immediately preempt a
background laundering.

This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache().  The new meaning
of vm_cnt.v_reactivated now better reflects its name.  It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.

In collaboration with:	markj
Reviewed by:	kib
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8302
2016-11-09 18:48:37 +00:00
bdrewery
30f99dbeef Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
kib
1c7bf4098f Do not sleep in vm_wait() if pagedaemon did not yet started. Panic instead.
Requests which cannot be satisfied by allocators at boot time often
have unrealizable parameters.  Waiting for the pagedaemon' start would
hang the boot if done in the thread0 context and just never succeed if
executed from another thread.  In fact, for very early stages, sleep
attempt panics with obscure diagnostic about the scheduler state, and
explicit panic in vm_wait() makes the investigation much shorter by
cut off the examination of the thread and scheduler.

Theoretically, some subsystem might grab a resource to exhaustion, and
free it later in the boot process.  If this unlikely scenario does
appear for real, the way to diagnose the trouble can be revisited.

Reported by:	emaste
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D8421
2016-11-04 12:58:50 +00:00
alc
2e4bd96be8 In vm_fault()'s loop over the shadow chain, move a comment describing our
invariants to a better place.  Also, add two comments concerning the
relationship between the map and vnode locks.

Reviewed by:	kib
MFC after:	3 days
2016-11-03 16:44:55 +00:00
alc
00978d6f5e Move and revise a comment about the relation between the object's paging-
in-progress count and the vnode.  Prior to r188331, we always acquired
the vnode lock before incrementing the object's paging-in-progress count.
Now, we increment it before attempting to acquire the vnode lock with
LK_NOWAIT, but we never sleep acquiring the vnode lock while we have the
count incremented.

Reviewed by:	kib
MFC after:	3 days
2016-11-01 17:11:10 +00:00