4393 Commits

Author SHA1 Message Date
andrew
1855df7194 As with r354905 use uint16_t to store aflags on the stack and as function
arguments as the aflags size in vm_page_t has increased.

Sponsored by:	DARPA, AFRL
2019-11-20 18:00:43 +00:00
andrew
872335752b Use atomic_load_16 to load aflags as it's a uint16_t after r354820.
Sponsored by:	DARPA, AFRL
2019-11-20 17:49:58 +00:00
dougm
66596a3c10 Instead of looking up a predecessor or successor to the current map
entry, when that entry has been seen already, keep the
already-looked-up value in a variable and use that instead of looking
it up again.

Approved by: alc, markj (earlier version), kib (earlier version)
Differential Revision: https://reviews.freebsd.org/D22348
2019-11-20 16:06:48 +00:00
jeff
65c224f0ae When we set OFFPAGE to limit fragmentation we should also set VTOSLAB
so that we avoid the hashtables.  The hashtable is now only required if
a zone is created with OFFPAGE specified initially, not internally.  This
flag signals to UMA that it can't touch the allocated memory and so
can't store a slab pointer in the containing page.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22453
2019-11-20 01:57:33 +00:00
jeff
24f24616c1 Only keep anonymous objects on shadow lists. This eliminates locking of
globally visible objects when they are part of a backing chain.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22423
2019-11-20 00:31:14 +00:00
jeff
83848ec34e Remove unnecessary object locking from the vnode pager. Recent changes to
busy/valid/dirty locking make these acquires redundant.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22186
2019-11-19 23:30:09 +00:00
jeff
be1b482c07 Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates
reudundant complicated checks and additional locking required only for
anonymous memory.  Introduce vm_object_allocate_anon() to create these
objects.  DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.

Reviewed by:	alc, dougm, kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22119
2019-11-19 23:19:43 +00:00
dougm
d9935577c8 Drop the extra argument from swp_pager_meta_ctl and have it do lookup
only.  Rename it swp_pager_meta_lookup.  Stop checking for obj->type
== swap there and assert it instead.  Make the caller responsible for
the obj->type check.

Move the meta_ctl 'pop' functionality to swap_pager_unswapped, the
only place that uses it, and assume obj->type == swap there too.

Assisted by: ota_j.email.ne.jp
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22437
2019-11-19 08:06:31 +00:00
markj
e882edff4b Group per-domain reservation data in the same structure.
We currently have the per-domain partially populated reservation queues
and the per-domain queue locks.  Define a new per-domain padded
structure to contain both of them.  This puts the queue fields and lock
in the same cache line and avoids the false sharing within the old queue
array.

Also fix field packing in the reservation structure.  In many places we
assume that a domain index fits in 8 bits, so we can do the same there
as well.  This reduces the size of the structure by 8 bytes.

Update some comments while here.  No functional change intended.

Reviewed by:	dougm, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22391
2019-11-18 18:25:51 +00:00
markj
be16e10a6f Widen the vm_page aflags field to 16 bits.
We are now out of aflags bits, whereas the "flags" field only makes use
of five of its sixteen bits, so narrow "flags" to eight bits.  I have no
intention of adding a new aflag in the near future, but would like to
combine the aflags, queue and act_count fields into a single atomically
updated word.  This will allow vm_page_pqstate_cmpset() to become much
simpler and is a step towards eliminating the use of the page lock array
in updating per-page queue state.

The change modifies the layout of struct vm_page, so bump
__FreeBSD_version.

Reviewed by:	alc, dougm, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22397
2019-11-18 18:22:41 +00:00
dougm
c7bfb0c48b Add a helper function for testing a swap block and freeing it if empty.
Submitted by: ota_j.email.ne.jp
Approved by: alc, kib, dougm
Differential Revision: https://reviews.freebsd.org/D22402
2019-11-17 18:38:37 +00:00
kib
7e07cf21fd Add elf image flag to disable stack gap.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22379
2019-11-17 14:54:07 +00:00
dougm
e23fe5a951 The loop in vm_map_protect that verifies that all transition map
entries are stabilized, repeatedly verifies the same entry. Check each
entry in turn.

Reviewed by: kib (code only), alc
Tested by: pho
MFC after: 7 days
Differential Revision: https://reviews.freebsd.org/D22405
2019-11-17 06:50:36 +00:00
dougm
7d1d987a6f Define wrapper functions vm_map_entry_{succ,pred} to act as wrappers
around entry->{next,prev} when those are used for ordered list
traversal, and use those wrapper functions everywhere. Where the next
field is used for maintaining a stack of deferred operations, #define
defer_next to make that different usage clearer, and then use the
'right' pointer instead of 'next' for that purpose.

Approved by: markj
Tested by: pho (as part of a larger patch)
Differential Revision: https://reviews.freebsd.org/D22347
2019-11-13 15:56:07 +00:00
dougm
2db1a9391d swap_pager_meta_free() frees allocated blocks in a way that
exploits the sparsity of allocated blocks in a range, without
issuing an "are you there?" query for every block in the range.
swap_pager_copy() is not so smart.  Modify the implementation
of swap_pager_meta_free() slightly so that swap_pager_copy()
can use that smarter implementation too.

Based on an observation of: Yoshihiro Ota (ota_j.email.ne.jp)
Reviewed by: kib,alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22280
2019-11-11 16:59:49 +00:00
kib
31e78c0a28 Include cache zones into zone_foreach() where appropriate.
The r354367 is reverted since it is subsumed by this, more complete, approach.

Suggested by:	markj
Reviewed by:	alc. glebius, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22242
2019-11-10 09:25:19 +00:00
dougm
63c1432c42 For vm_map, #defining DIAGNOSTIC to turn on full assertion-based
consistency checking slows performance dramatically. This change
reduces the number of assertions checked by completely walking the
vm_map tree only when the write-lock is released, and only then if the
number of modifications to the tree since the last walk exceeds the
number of tree nodes.

Reviewed by: alc, kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22163
2019-11-09 17:08:27 +00:00
markj
0151d2df1f Drop Giant before sleeping on a busy page.
Before the page busy code was converted to make direct use of
sleepqueues, this was handled by _sleep().

Reported by:	glebius
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
2019-11-07 18:26:29 +00:00
markj
b75fed1661 Fix a race in release_page().
Since r354156 we may call release_page() without the page's object lock
held, specifically following the page copy during a CoW fault.
release_page() must therefore unbusy the page only after scheduling the
requeue, to avoid racing with a free of the page.  Previously, the
object lock prevented this race from occurring.

Add some assertions that were helpful in tracking this down.

Reported by:	pho, syzkaller
Tested by:	pho
Reviewed by:	alc, jeff, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22234
2019-11-06 16:59:16 +00:00
kib
e4c3b9879f Switch cache zones from early counters to real implementation.
Early counter mock can be only used on BSP for amd64, when APs try to
update it that causes random memory corruption.

N.B.  This is a temporary patch to plug the corruption for now, while
a proper solution for handling cache zones in zone_foreach() is being
developed.

In collaboration with:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation, Mellanox Technologies
2019-11-05 21:38:48 +00:00
kib
f07bc81865 vm_page_wire_mapped: explain why failure does not affect correctness.
Reviewed by:	markj (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D22196
2019-10-30 17:33:17 +00:00
jeff
bff69757f0 Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22116
2019-10-29 21:06:34 +00:00
jeff
747d74bf1b Use atomics and a shared object lock to protect the object reference count.
Certain consumers still need to guarantee a stable reference so we can not
switch entirely to atomics yet.  Exclusive lock holders can still modify
and examine the refcount without using the ref api.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21598
2019-10-29 20:58:46 +00:00
jeff
e367407259 Drop the object lock earlier in fault and don't relock it after pmap_enter().
Recent changes in object and page locking have enabled more lock pushdown.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22036
2019-10-29 20:46:25 +00:00
glebius
fea9cffaa1 Add couple more assertions to vm_pager_assert_in(). The bogus page is
not allowed at ends of the request, and all non-bogus pages must be
consecutive.

Reviewed by:	kib
2019-10-25 16:59:54 +00:00
gallatin
fb7539ef5c Add a tunable to set the pgcache zone's maxcache
When it is set to 0 (the default), a heavy Netflix-style web workload
suffers from heavy lock contention on the vm page free queue called from
vm_page_zone_{import,release}() as the buckets are frequently drained.
When setting the maxcache, this contention goes away.

We should eventually try to autotune this, as well as make this
zone eligable for uma_reclaim().

Reviewed by:	alc, markj
Not Objected to by: jeff
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22112
2019-10-24 18:39:05 +00:00
markj
7670099e0b Modify release_page() to handle a missing fault page.
r353890 introduced a case where we may call release_page() with
fs.m == NULL, since the fault handler may now lock the vnode prior
to allocating a page for a page-in.

Reported by:	jhb
Reviewed by:	kib
MFC with:	r353890
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22120
2019-10-23 20:39:21 +00:00
markj
3a3f11c990 Check for bogus_page in vnode_pager_generic_getpages_done().
We now assert that a page is busy when updating its validity-tracking
state, but bogus_page is not busied during a getpages operation.

Reported by:	syzkaller
Reviewed by:	alc, kib
Discussed with:	jeff
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22124
2019-10-23 18:00:22 +00:00
markj
a789d36dbe Verify identity after checking for WAITFAIL in vm_page_busy_acquire().
A caller that does not guarantee that a page's identity won't change
while sleeping for a busy lock must specify either NOWAIT or WAITFAIL.

Reported by:	syzkaller
Reviewed by:	alc, kib
Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22124
2019-10-23 17:58:19 +00:00
kib
b382fb1fa5 Assert that vm_fault_lock_vnode() returns locked saved vnode.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D22113
2019-10-23 07:36:26 +00:00
kib
6d8044f142 Assert that vnode_pager_setsize() is called with the vnode exclusively locked
except for filesystems that set the MNTK_VMSETSIZE_BUG,  Set the flag for ZFS.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:21:24 +00:00
kib
e970ed6a1e Add VV_VMSIZEVNLOCK flag.
The flag specifies that vm_fault() handler should check the vnode'
vm_object size under the vnode lock.  It is converted into the object'
OBJ_SIZEVNLOCK flag in vnode_pager_alloc().

Tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:09:25 +00:00
kib
5c066fbc3a vm_fault(): extract code to lock the vnode into a helper vn_fault_lock_vnode().
Tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 15:59:16 +00:00
markj
93c4e46434 Avoid reloading bucket pointers in uma_vm_zone_stats().
The correctness of per-CPU cache accounting in that function is
dependent on reading per-CPU pointers exactly once.  Ensure that
the compiler does not emit multiple loads of those pointers.

Reported and tested by:	pho
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22081
2019-10-22 14:20:06 +00:00
markj
70e6052cef Further constrain the use of per-CPU caches for free pages.
In low memory conditions a significant number of pages may end up stuck
in the caches, and currently these caches cannot be reaped, leading to
spurious memory allocation failures and OOM kills.  So:

- Take into account the fact that we may cache up to two full buckets
  of pages per CPU, not just one.
- Increase the amount of RAM required per CPU to enable the caches.

This is a temporary measure until the page cache management policy is
improved.

PR:		241048
Reported and tested by:	Kevin Oberman <rkoberman@gmail.com>
Reviewed by:	alc, kib
Discussed with:	jeff
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22040
2019-10-18 17:36:42 +00:00
markj
360bcad613 Apply mapping protections to preloaded kernel modules on amd64.
With an upcoming change the amd64 kernel will map preloaded files RW
instead of RWX, so the kernel linker must adjust protections
appropriately using pmap_change_prot().

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21860
2019-10-18 13:56:45 +00:00
kib
4cf89d6e98 swapon_check_swzone(): use already calculated static variables.
Submitted by:	ota@j.email.ne.jp
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22065
2019-10-17 13:49:47 +00:00
markj
84cd531f96 Remove page locking from pmap_mincore().
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore().  Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.

The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21823
2019-10-16 22:03:27 +00:00
markj
f4dbe34c21 Correct the range boundaries used by kern_mincore().
Reported by:	alc
Sponsored by:	Netflix
2019-10-16 21:47:58 +00:00
jeff
786dad5c20 (5/6) Move the VPO_NOSYNC to PGA_NOSYNC to eliminate the dependency on the
object lock in vm_page_set_validclean().

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21595
2019-10-15 03:48:22 +00:00
jeff
e249e932a5 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
jeff
0a6e7a4266 (3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held.  This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by:    kib
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21592
2019-10-15 03:41:36 +00:00
jeff
209fb8d357 (2/6) Don't release xbusy in vm_page_remove(), defer to vm_page_free_prep().
This persists busy state across operations like rename and replace.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:  https://reviews.freebsd.org/D21549
2019-10-15 03:38:02 +00:00
jeff
51ed6c3ace (1/6) Replace busy checks with acquires where it is trival to do so.
This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21548
2019-10-15 03:35:11 +00:00
dougm
01d8b40816 Correct a transcription error that broke GENERIC introduced in r353496. 2019-10-14 17:51:57 +00:00
dougm
6db7785e8f Move the definition of _vm_map_assert_consistent so that it can use
vm_map_free_{left,right} rather than re-implementing them.  Use the
VM_MAP_FOREACH macro where applicable.  Fix some indentation.

Suggested by: kib (in a comment on D21964)
Tested by: pho (as part of D21964)
Differential Revision: https://reviews.freebsd.org/D22011
2019-10-14 17:15:42 +00:00
luporl
57d28447c8 [PPC64] Initial kernel minidump implementation
Based on POWER9BSD implementation, with all POWER9 specific code removed and
addition of new methods in PPC64 MMU interface, to isolate platform specific
code. Currently, the new methods are implemented on pseries and PowerNV
(D21643).

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D21551
2019-10-14 13:04:04 +00:00
kib
a3fd50a480 Restore nofaulting operations after r352807
The TDP_NOFAULTING flag should be checked in vm_fault(), not in
vm_fault_trap().  Otherwise kernel accesses to userspace, like
vn_io_fault(), enter vm locking when it should not.

Reported and tested by:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D21992
2019-10-13 06:56:45 +00:00
cem
917384c7c5 Fix braino in r353429
cy@ points out that I got parameter order backwards between definition and
invocation of the helper function.  He is totally correct.  The earlier
version of this patch predated the XFree column so this is one I introduced,
rather than the original author.

Submitted by:	cy
Reported by:	cy
X-MFC-With:	r353429
2019-10-11 06:02:03 +00:00
cem
43181b339c ddb: Add CSV option, sorting to 'show (malloc|uma)'
Add /i option for machine-parseable CSV output.  This allows ready copy/
pasting into more sophisticated tooling outside of DDB.

Add total zone size ("Memory Use") as a new column for UMA.

For both, sort the displayed list on size (print the largest zones/types
first).  This is handy for quickly diagnosing "where has my memory gone?" at
a high level.

Submitted by:	Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version)
Sponsored by:	Dell EMC Isilon
2019-10-11 01:31:31 +00:00