3958 Commits

Author SHA1 Message Date
jeff
c17fd15c00 Fix arc after r326347 broke various memory limit queries. Use UMA features
rather than kmem arena size to determine available memory.

Initialize the UMA limit to LONG_MAX to avoid spurious wakeups on boot before
the real limit is set.

PR:		224330 (partial), 224080
Reviewed by:	markj, avg
Sponsored by:	Netflix / Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13494
2018-01-02 04:35:56 +00:00
kib
0312e7bc4d Do not let vm_daemon run unbounded.
On a load where single anonymous object consumes almost all memory on
the large system, swapout code executes the iteration over the
corresponding object page queue for long time, owning the map and
object locks.  This blocks pagedaemon which tries to lock the object,
and blocks other threads in the process in vm_fault() waiting for the
map lock.

Handle the issue by terminating the deactivation loop if we executed
too long and by yielding at the top level in vm_daemon.

Reported by:	peterj, pho
Reviewed by:	alc
Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13671
2018-01-01 19:27:33 +00:00
alc
c4f7f60d06 The variable "minslptime" is pointless and always has been, ever since its
introduction in r83366.  (At that time, this code appeared in vm/vm_glue.c,
because vm/vm_swapout.c did not exist.)  When the FOREACH_THREAD loop
completes, we know that the sleep time for every thread is above whichever
threshold is being applied.

Reviewed by:	kib
X-MFC with:	r327354
2017-12-31 21:36:42 +00:00
alc
a29455420d Previously, swap_pager_copy() freed swap blocks one at at time, via
swp_pager_meta_ctl(), with no opportunity to recognize freeing of
consecutive blocks and free fewer block ranges.  To open that opportunity,
this change removes the SWM_FREE option from swp_pager_meta_ctl(), and
compels the caller to do the freeing when a valid block address is returned.
In swap_pager_copy(), these frees are aggregated, so that a sequence of them
can be done at one time.

The only other caller to swp_pager_meta_ctl() that passed SWM_FREE,
swp_pager_unswapped(), is also modified to handle its single free
explicitly.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	kib (an earlier version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13290
2017-12-31 04:01:47 +00:00
kib
4c4b40eeeb Do not lock vm map in swapout_procs().
Neither swapout_procs() nor swapout() access the map.  Since the
process' vmspace is referenced only to obtain the pointer to the
vm_map, the reference is not needed as well.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13681
2017-12-29 20:33:56 +00:00
kib
a1b116566d Style.
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13678
2017-12-29 19:05:07 +00:00
alc
51e141cad5 After r327168, the variable "vm_pageout_wanted" can be static.
MFC after:	2 weeks
2017-12-29 17:02:22 +00:00
kib
c8236b191f Clean up the comment.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13671
2017-12-28 23:50:21 +00:00
kib
d50d0b2935 In vm_swapout_map_deactivate_pages(), it is enough to lock the map for read.
Reviewed by:	alc, markj (as part of the larger patch)
Tested by:	pho (again, as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13671
2017-12-28 22:56:30 +00:00
alc
e5d6749a39 Refactor vm_map_find(), creating a separate function, vm_map_alignspace(),
for finding aligned free space in the given map.  With this change, we
always return KERN_NO_SPACE when we fail to find free space.  Whereas,
previously, we might return KERN_INVALID_ADDRESS.  Also, with this change,
we explicitly check for address wrap, rather than relying upon the map's
min and max addresses to establish sentinel-like regions.

This refactoring was inspired by the problem that we addressed in r326098.

Reviewed by:	kib
Tested by:	pho
Discussed with:	markj
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D13346
2017-12-26 17:59:37 +00:00
markj
391d859963 Ensure that pass > 0 when starting a scan with vm_pages_needed == 1.
Otherwise the page daemon will not reclaim pages and thus will not
wake threads sleeping in VM_WAIT.

Reported and tested by:	pho
Reviewed by:	alc, kib
X-MFC with:	r327168
Differential Revision:	https://reviews.freebsd.org/D13640
2017-12-26 16:29:39 +00:00
alc
5037be343c Make the vm object bypass and collapse counters per CPU.
Requested by:	mjg
Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13611
2017-12-25 19:36:04 +00:00
markj
1bd48ad884 Fix two problems with the page daemon control loop.
Both issues caused the page daemon to erroneously go to sleep when
applications are consuming free pages at a high rate, leaving the
application threads blocked in VM_WAIT.

1) After completing an inactive queue scan, concurrent allocations may
   have prevented the page daemon from meeting the v_free_min threshold.
   In this case, the page daemon was going to sleep even when the
   inactive queue contained plenty of clean pages.
2) pagedaemon_wakeup() may be called without the free queues lock held.
   This can lead to a lost wakeup if a call occurs after the page daemon
   clears vm_pageout_wanted but before going to sleep.

Fix 1) by ensuring that we start a new inactive queue scan immediately
if v_free_count < v_free_min after a prior scan.

Fix 2) by adding a new subroutine, pagedaemon_wait(), called from
vm_wait() and vm_waitpfault(). It wakes up the page daemon if either
vm_pages_needed or vm_pageout_wanted is false, and atomically sleeps
on v_free_count.

Reported by:	jeff
Reviewed by:	alc
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13424
2017-12-24 19:45:16 +00:00
kib
d84b266e99 Perform all accesses to uma_reclaim_needed using atomic(9) KPI.
Reviewed by:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13534
2017-12-19 10:06:55 +00:00
markj
35e385af34 Use a dedicated counter for inactive queue scans.
The laundry thread keeps track of the number of inactive queue scans
performed by the page daemon, and was previously using the v_pdwakeups
counter to count them. However, in some cases the inactive queue may
be scanned multiple times after a single wakeup, so it's more accurate
to use a dedicated counter.

Reviewed by:	alc, kib (previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13422
2017-12-11 15:33:24 +00:00
markj
3479530814 Fix the act_scan_laundry_weight mechanism.
r292392 modified the active queue scan to weigh clean pages differently
from dirty pages when attempting to meet the inactive queue target. When
r306706 was merged into the PQ_LAUNDRY branch, this mechanism was
broken. Fix it by scalaing the correct page shortage variable.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13423
2017-12-09 15:47:26 +00:00
markj
90ce7fadcd Fix the UMA reclaim worker after r326347.
atomic_set_*() sets a bit in the target memory location, so
atomic_set_int(&uma_reclaim_needed, 0) does not do what it looks like
it does.

PR:		224080
Reviewed by:	jeff, kib
Differential Revision:	https://reviews.freebsd.org/D13412
2017-12-07 19:38:09 +00:00
markj
01b67530d5 Use unique wait messages in the page daemon control loop.
Discussed with:	alc
MFC after:	1 week
2017-12-06 18:36:54 +00:00
andrew
c4fd89c6ba Print the correct value when freelist is out of range.
Security:	:
Sponsored by:	DARPA, AFRL
2017-12-04 11:16:51 +00:00
mizhka
20c3a101da [mips] [vm] restore translation of freelist to flind for page allocation
Commit r326346 moved domain iterators from physical layer to vm_page one,
but it also removed translation of freelist to flind for
vm_page_alloc_freelist() call. Before it expects VM_FREELIST_ parameter,
but after it expect freelist index.

On small WiFi boxes with few megabytes of RAM, there is only one freelist
VM_FREELIST_LOWMEM (1) and there is no VM_FREELIST_DEFAULT(0) (see file
sys/mips/include/vmparam.h). It results in freelist 1 with flind 0.

At first, this commit renames flind to freelist in vm_page_alloc_freelist
to avoid misunderstanding about input parameters. Then on physical layer it
restores translation for correct handling of freelist parameter.

Reported by:	landonf
Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D13351
2017-12-04 08:08:55 +00:00
kib
2f5704ab57 Add comment for vm_map_find_min().
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
X-Differential revision:	https://reviews.freebsd.org/D13155
2017-12-01 10:53:08 +00:00
pfg
155122ce53 SPDX: Consider code from Carnegie-Mellon University.
Interesting cases, most likely from CMU Mach sources.
2017-11-30 15:48:35 +00:00
pfg
8aa269d57f SPDX: wrong license. 2017-11-30 15:45:42 +00:00
markj
40d9c0da1e Verify the object/vnode association after vget() in vm_pageout_clean().
It's theoretically possible for the vnode and object to be disassociated
while locks are dropped around the vget() call, in which case we
shouldn't proceed with laundering.

Noted and reviewed by:	kib
MFC after:	1 week
2017-11-29 19:47:09 +00:00
markj
1643870966 Remove some comments that became incorrect with r325530. 2017-11-29 14:34:05 +00:00
jeff
990ca74cdc Eliminate kmem_arena and kmem_object in preparation for further NUMA commits.
The arena argument to kmem_*() is now only used in an assert.  A follow-up
commit will remove the argument altogether before we freeze the API for the
next release.

This replaces the hard limit on kmem size with a soft limit imposed by UMA.  When
the soft limit is exceeded we periodically wakeup the UMA reclaim thread to
attempt to shrink KVA.  On 32bit architectures this should behave much more
gracefully as we exhaust KVA.  On 64bit the limits are likely never hit.

Reviewed by:	markj, kib (some objections)
Discussed with:	alc
Tested by:	pho
Sponsored by:	Netflix / Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13187
2017-11-28 23:40:54 +00:00
jeff
f93de233c6 Move domain iterators into the page layer where domain selection should take
place.  This makes the majority of the phys layer explicitly domain specific.

Reviewed by:	markj, kib (some objections)
Discussed with:	alc
Tested by:	pho
Sponsored by:	Netflix & Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13014
2017-11-28 23:18:35 +00:00
alc
c123c7433b When the swap pager allocates space on disk, it requests contiguous
blocks in a single call to blist_alloc().  However, when it frees
that space, it previously called blist_free() on each block, one at a
time.  With this change, the swap pager identifies ranges of
contiguous blocks to be freed, and calls blist_free() once per
range.  In one extreme case, that is described in the review, the time
to perform an munmap(2) was reduced by 55%.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12397
2017-11-28 17:46:03 +00:00
markj
8d4225b0be Avoid unnecessary lookups when initializing the vm_page array.
This gives a marginal improvement in the vm_page_array initialization
time. Also garbage-collect the now-unused vm_phys_paddr_to_segind().

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13270
2017-11-27 17:46:38 +00:00
pfg
78a6b08618 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
markj
9bf667025b Move vm_phys_init_page() to vm_page.c.
Suggested by:	kib
Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13250
2017-11-26 19:17:55 +00:00
markj
963bf8102f Remove unneeded initializations from vm_phys_init_page().
The page allocator always initializes the aflags and oflags fields.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13242
2017-11-26 19:16:45 +00:00
kib
1c96478ead Return different error code for the guard page layout violation.
On KERN_NO_SPACE error, as it is returned now, vm_map_find() continues
the loop searching for the suitable range for the requested mapping
with specific alignment.  Since the vm_map_findspace() succesfully
finds the same place, the loop never ends.

The errors returned from vm_map_stack() completely repeat the behavior
of vm_map_insert() now, as suggested by Alan.

Reported by:	Arto Pekkanen <aksyom@gmail.com>
PR:	223732
Reviewed by:	alc, markj
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D13186
2017-11-22 16:45:27 +00:00
alc
d21a0d7e26 When vm_map_find(find_space = VMFS_OPTIMAL_SPACE) fails to find space, a
second scan of the address space with find_space = VMFS_ANY_SPACE is
performed.  Previously, vm_map_find() released and reacquired the map lock
between the first and second scans.  However, there is no compelling
reason to do so.  This revision modifies vm_map_find() to retain the map
lock.

Reviewed by:	jhb, kib, markj
MFC after:	1 week
X-Differential Revision:	https://reviews.freebsd.org/D13155
2017-11-22 16:39:24 +00:00
markj
f761f23093 Allow for fictitious physical pages in vm_page_scan_contig().
Some drm2 drivers will set PG_FICTITIOUS in physical pages in order to
satisfy the OBJT_MGTDEVICE object interface, so a scan may encounter
fictitous pages. For now, allow for this possibility; such pages will be
skipped later in the scan since they are wired.

Reported by:	avg
Reviewed by:	kib
MFC after:	1 week
2017-11-21 13:17:40 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
kib
ea07ce4ed0 vmtotal: extend memory counters to accomodate for current and future
hardware sizes.

32bit counters already overflow on approachable virtual memory page
counts, and soon would overflow on the physical pages counts as well.
Bump sizes to 64bit types.  Bump __FreeBSD_version.

It is impossible to provide perfect backward ABI compat for this
change.  If a program requests an old structure, it can be detected by
size.  But if it queries the size first by passing NULL old req
pointer, there is almost nothing we can do to detect the desired ABI.
As a partial solution, check p_osrel of the quering process when
selecting the size to report.

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Differential revision:	https://reviews.freebsd.org/D13018
2017-11-15 13:41:03 +00:00
kib
c05cd25802 Fix operator priority.
Sponsored by:	The FreeBSD Foundation
2017-11-08 23:25:05 +00:00
markj
32541ba3eb Allow various page daemon parameters to be set from loader.conf.
MFC after:	1 week
2017-11-08 19:55:17 +00:00
jeff
3c355d849c Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
markj
59bc046828 Correct the type of foff.
No functional change intended.

Github PR:	124
Submitted by:	Wuyang Chung <wuyang.m.chung@outlook.com>
MFC after:	1 week
2017-11-08 01:53:03 +00:00
alc
1728bf4480 Micro-optimize the handling of fictitious pages in vm_page_free_prep().
A fictitious page is always wired, so there is no point in trying to
remove one from the page queues.

Completely remove one inaccurate comment from vm_page_free_prep() and
correct another.

Reviewed by:	kib, markj
MFC after:	1 week
2017-10-24 17:14:53 +00:00
trasz
b1c2085f71 Add OID for the vm.overcommit sysctl. This makes it possible to remove
one call to sysctl(2) from jemalloc startup code. (That also requires
changes to jemalloc, but I plan to push those to upstream first.)

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12745
2017-10-22 10:35:29 +00:00
kib
09c8e869a8 Check that the page which is freed as zeroed, indeed has all-zero content.
This catches some rare mysterious failures at the source.  The check
is only performed on architectures which implement direct map, and
only enabled with option DIAGNOSTIC, similar to other costly
consistency checks.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-10-21 17:28:12 +00:00
markj
042c6a8f6c Free the right address range if kmem_back() fails in memguard_alloc().
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-20 21:13:19 +00:00
kib
3b0b7e2626 Take the vm object lock in read mode in vnode_generic_putpages().
Only upgrade it to write mode if we need to clear dirty bits of the
partially valid page after EOF.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2017-10-20 18:40:29 +00:00
kib
20d037559d Move swapout code into vm/vm_swapout.c.
There is no NO_SWAPPING #ifdef left in the code.

Requested by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12663
2017-10-20 09:10:49 +00:00
kib
73e703b999 Do not overwrite clean blocks on pageout.
If filesystem block size is less than the page size, it is possible
that the page-out run contains partially clean pages.  E.g., the chunk
of the page might be bdwrite()-ed, or some thread performed bwrite()
on a buffer which references a chunk of the paged out page.  As
result, the assertion added in r319975, which checked that all pages
in the run are dirty, does not hold on such filesystems.

One solution is to remove the assert, but it is undesirable, because
we do overwrite the valid on-disk content. I cannot provide a scenario
where such write would corrupt the file data, but I do not like it on
principle.  Another, in my opinion proper, solution is to only write
parts of the pages still marked dirty.  The patch implements this, it
skips clean blocks and only writes the dirty block runs.

Note that due to clustering, write one page might clean other pages in
the run, so the next write range must be calculated only after the
current range is written out.

More, due to a possible invalidation, and the fact that the object
lock is dropped and reacquired before the checks, it is possible that
the whole page-out pages run appears to consist of only clean pages.
For this reason, it is impossible to assert that there is some work
for the pageout method to do (i.e. assert that there is at least one
dirty page in the run).  But such clearing can only occur due to
invalidation, and not due to a parallel write, because we own the
vnode lock exclusive.

Reported by:	fsu
In collaboration with:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12668
2017-10-20 08:32:37 +00:00
kib
9a4db25ceb In vm_page_free_phys_pglist(), do not take vm_page_queue_free_mtx if
there is nothing to do.

Suggested by:	mjg
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-20 08:25:49 +00:00