Commit Graph

4242 Commits

Author SHA1 Message Date
Jeff Roberson
d966c7615f Slightly optimize locking in vm_map_copy_swap_entry(). Anonymous objects
require the object lock to synchronize collapse.  Other swap objects such
as tmpfs do not.

Reported by:	mjg
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22747
2019-12-15 02:02:27 +00:00
Jeff Roberson
af00971419 Handle pagein clustering in vm_page_grab_valid() so that it can be used by
exec_map_first_page().  This will also enable pagein clustering for other
interested consumers (tmpfs, md, etc).

Discussed with:	alc
Approved by:	kib
Differential Revision:	https://reviews.freebsd.org/D22731
2019-12-15 02:00:32 +00:00
Ryan Libby
815db20425 uma dbg: flexible size for slab debug bitset too
Recently (r355315) the size of the struct uma_slab bitset field us_free
became dynamic instead of conservative.  Now, make the debug bitset
size dynamic too.  The debug bitset is INVARIANTS-only, so in fact we
don't care too much about the space savings that results from this, but
enabling minimally-sized slabs on INVARIANTS builds is still important
in order to be able to test new slab layouts effectively.

Reviewed by:	jeff (previous version), markj (previous version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22759
2019-12-14 05:21:56 +00:00
Mark Johnston
325c4ced0d Restore the reservation of boot pages for bucket zones after r355707.
uma_startup2() sets booted = BOOT_BUCKETS after calling bucket_init(),
but before that assignment, startup_alloc() will use pages from the
reserved pool, so the bucket zones themselves are still allocated using
startup pages.

Reviewed by:	rlibby
Reported by:	Jenkins via lwhsu
Differential Revision:	https://reviews.freebsd.org/D22797
2019-12-13 18:28:01 +00:00
Ryan Libby
d82c8ffb16 Revert r355706 & r355710
The quick fix didn't work.  I'll sort it out tomorrow.

Revert r355710: "libmemstat: unbreak build"
Revert r355706: "uma dbg: flexible size for slab debug bitset too"
2019-12-13 11:21:28 +00:00
Ryan Libby
f7af501519 uma: report slab efficiency
Reviewed by:	jeff
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22766
2019-12-13 09:32:09 +00:00
Ryan Libby
3182660a85 uma: delay bucket_init() until we might actually enable buckets
This helps with a bootstrapping problem in upcoming work.

We don't first enable buckets until uma_startup2(), so we can delay
bucket creation until then.  The other two paths to bucket_enable() are
both later, one in the pageout daemon (SI_SUB_KTHREAD_PAGE vs SI_SUB_VM)
and one in uma_timeout() (first activated in uma_startup3()).  Note that
although some bucket functions are accessible before uma_startup2()
(e.g. bucket_select() in zone_ctor()), none of them inspect ubz_zone.

Discussed with:	jeff
Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22765
2019-12-13 09:32:03 +00:00
Ryan Libby
7508f15ff1 uma dbg: flexible size for slab debug bitset too
Recently (r355315) the size of the struct uma_slab bitset field us_free
became dynamic instead of conservative.  Now, make the debug bitset
size dynamic too.  The debug bitset is INVARIANTS-only, so in fact we
don't care too much about the space savings that results from this, but
enabling minimally-sized slabs on INVARIANTS builds is still important
in order to be able to test new slab layouts effectively.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22759
2019-12-13 09:31:59 +00:00
Mark Johnston
cbc080b4c4 Avoid relying on silent type casting in the native atomic_load_32.
Reported by:	np
2019-12-12 23:55:34 +00:00
Mark Johnston
6fbaf6859c Implement atomic state updates using the new vm_page_astate_t structure.
Introduce primitives vm_page_astate_load() and vm_page_astate_fcmpset()
to operate on the 32-bit per-page atomic state.  Modify
vm_page_pqstate_fcmpset() to use them.  No functional change intended.

Introduce PGA_QUEUE_OP_MASK, a subset of PGA_QUEUE_STATE_MASK that only
includes queue operation flags.  This will be used in subsequent
patches.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22753
2019-12-12 21:13:20 +00:00
Doug Moore
037c0994bf Extract code common to _vm_map_clip_start and _vm_map_clip_end into a
function, vm_map_entry_clone, that can be invoked by each.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22760
2019-12-11 16:09:57 +00:00
Ryan Libby
6d204a6a0e uma: pretty print zone flags sysctl
Requested by:	jeff
Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22748
2019-12-11 06:50:55 +00:00
Mark Johnston
9be9ea420e Add a helper function to the swapout daemon's deactivation code.
vm_swapout_object_deactivate_pages() is renamed to
vm_swapout_object_deactivate(), and the loop body is moved into the new
vm_swapout_object_deactivate_page().  This makes the code a bit easier
to follow and is in preparation for some functional changes.

Reviewed by:	jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22651
2019-12-10 18:15:20 +00:00
Mark Johnston
5cff1f4dc3 Introduce vm_page_astate.
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state.  The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.

This change merely adds the structure and updates references to atomic
state fields.  No functional change intended.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22650
2019-12-10 18:14:50 +00:00
Doug Moore
7887f00d2b Revert r355505. The code that it allowed to compile has been removed. 2019-12-09 05:09:46 +00:00
Doug Moore
8b75b1ad0d Define a vm_map method for user-space for advancing from a map entry
to its successor in cases where examining a map entry requires a
helper like kvm_read_all.  Use that method, with kvm_read_all, to fix
procstat_getfiles_kvm, which tries to find the successor now without
using such a helper.  This addresses a problem introduced by r355491.

Reviewed by: markj (previous version)
Discussed with: kib
Differential Revision: https://reviews.freebsd.org/D22728
2019-12-08 22:33:51 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Jeff Roberson
3b490537f4 Fix two problems with r355149. The sysctl name collision code assumed that
zones would never be freed.  In the case of tmpfs this was not true.  While
here test for the right bit to disable the keg related sysctls for zones
that don't have kegs.

Reported by:	pho
Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D22655
2019-12-08 01:55:23 +00:00
Jeff Roberson
cff8481de4 It is safe to wire a page while the object is busy.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22636
2019-12-08 01:49:53 +00:00
Jeff Roberson
2306558c54 It is now safe to rename a page that is still on a queue. Allowing this
is necessary for a forthcoming patch.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22636
2019-12-08 01:49:03 +00:00
Jeff Roberson
d8ad7b7d6a Do not assert that the object lock is held in vm_object_set_writeable_dirty.
A valid reference is all that is required.  If we race with a deallocation
we will harmlessly misidentify the type of an already dead object.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22636
2019-12-08 01:47:29 +00:00
Jeff Roberson
fb1d575ceb Reduce duplication in grab functions by providing allocflags based inlines.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22635
2019-12-08 01:16:22 +00:00
Jeff Roberson
1e0701e1e5 Use a variant slab structure for offpage zones. This saves space in
embedded slabs but also is an opportunity to tidy up code and add
accessor inlines.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D22609
2019-12-08 01:15:06 +00:00
Mark Johnston
c0829bb1d6 Add casts required by the 32-bit build after r355491. 2019-12-08 00:02:36 +00:00
Mark Johnston
3098cd73a3 Provide vm_map_entry traversal routines to userspace.
This is required for now to allow libprocstat to compile.

Discussed with:	dougm
2019-12-07 19:36:40 +00:00
Mateusz Guzik
91caa9b8c1 vm: fix sysctl vm.kstack_cache_size change report
Cache gets resized correctly, but sysctl reports the wrong number:
# sysctl vm.kstack_cache_size=512
vm.kstack_cache_size: 128 -> 128

patched:
vm.kstack_cache_size: 128 -> 512

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22717
Fixes:	r355002 "Revise the page cache size policy."
2019-12-07 17:28:41 +00:00
Doug Moore
c1ad5342a6 Remove the next and prev fields from vm_map_entry, to save a bit of
space.  Where the vm_map tree now has null pointers, store pointers to
next and previous entries in right and left fields, making the binary
tree threaded.  Have the predecessor and successor functions compute
what the prev and next fields previously stored.

Reviewed by: markj, kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D21964
2019-12-07 17:14:33 +00:00
Justin Hibbits
caef3e1280 powerpc/pmap: NUMA-ize vm_page_array on powerpc
Summary:
This matches r351198 from amd64.  This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave similar to
existing.  Otherwise it will allocate 16MB huge pages to hold the page
array, across all NUMA domains.  On the first domain it will shift the
page array base up, to "upper-align" the page array in that domain, so
as to reduce the number of pages from the next domain appearing in this
domain.  After the first domain, subsequent domains will be allocated in
full 16MB pages, until the final domain, which can be short.  This means
some inner domains may have pages accounted in earlier domains.

On Book-E the page array is setup at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit.  This reduces the TLB0
overhead for touching the vm_page_array, which reduces up to one TLB
miss per array access.

Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.

Reviewed by:	luporl
Differential Revision:	https://reviews.freebsd.org/D21449
2019-12-07 03:34:03 +00:00
Mark Johnston
a6f21d15dd Fix fault_type handling in vm_map_lookup().
Suppose that the map entry is wired, so that we later assign
fault_type = entry->protection.  Suppose further that we jump back to
RetryLookup.  Then fault_type will no longer contain the original
fault protection mask, but instead that of the wired entry.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by:	kib
MFC after:	3 days
Github PR:	https://github.com/freebsd/freebsd/pull/419
Differential Revision:	https://reviews.freebsd.org/D22683
2019-12-06 23:39:08 +00:00
Mark Johnston
ed2f945a39 Fix an off-by-one error in vm_map_pmap_enter().
If the starting pindex is equal to object->size, there is nothing to do.
This was harmless since the rest of vm_map_pmap_enter() has no effect
when psize == 0.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by:	alc, dougm, kib
MFC after:	1 week
Github PR:	https://github.com/freebsd/freebsd/pull/417
Differential Revision:	https://reviews.freebsd.org/D22678
2019-12-04 19:46:48 +00:00
Andrew Turner
b75c4efcd2 Fix the signature for zone_import and zone_release
These are cast to uma_import and uma_release functions. Use the signature
for these in the zone functions.

This was found with an experimental Kernel CFI. It will complain if the
signature is different than what a function pointer expects. The
simplest way to fix these is to correct the signature.

Reviewed by:	rlibby
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22671
2019-12-04 18:40:05 +00:00
Jeff Roberson
9b78b1f433 Use a precise bit count for the slab free items in UMA. This significantly
shrinks embedded slab structures.

Reviewed by:	markj, rlibby (prior version)
Differential Revision:	https://reviews.freebsd.org/D22584
2019-12-02 22:44:34 +00:00
Jeff Roberson
0f9e06e18b Fix a few places that free a page from an object without busy held. This is
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22611
2019-12-02 22:42:05 +00:00
Konstantin Belousov
67388836f3 Store the bottom of the shadow chain in OBJ_ANON object->handle member.
The handle value is stable for all shadow objects in the inheritance
chain.  This allows to avoid descending the shadow chain to get to the
bottom of it in vm_map_entry_set_vnode_text(), and eliminate
corresponding object relocking which appeared to be contending.

Change vm_object_allocate_anon() and vm_object_shadow() to handle more
of the cred/charge initialization for the new shadow object, in
addition to set up the handle.

Reported by:	jeff
Reviewed by:	alc (previous version), jeff (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differrential revision:	https://reviews.freebsd.org/D22541
2019-12-01 20:43:04 +00:00
Jeff Roberson
886b90219a Restore swap space accounting for non-anonymous swap objects. This was
broken in r355082.  Reduce some locking in nearby related object type
checks.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22565
2019-11-29 19:57:49 +00:00
Jeff Roberson
f2410510db Avoid acquiring the object lock if color is already set. It can not be
unset until the object is recycled so this check is stable.  Now that we
can acquire the ref without a lock it is not necessary to group these
operations and we can avoid it entirely in many cases.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22565
2019-11-29 19:49:20 +00:00
Jeff Roberson
26c4e9831b Fix a perf regression from r355122. We can use a shared lock to drop the
last ref on vnodes.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22565
2019-11-29 19:47:40 +00:00
Jeff Roberson
6d6a03d7a8 Handle large mallocs by going directly to kmem. Taking a detour through
UMA does not provide any additional value.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22563
2019-11-29 03:14:10 +00:00
Doug Moore
85b7bedb15 Functions that call vm_map_splay_merge sometimes set data fields
(e.g. root->left = NULL) to affect the behavior of that function. This
change stops that data manipulation, and instead calls a pair of
functions, one for the left direction and the other for the right,
with the function called depending whether or not we currently null
the root child in that direction to control the behavior of
vm_map_splay_merge.

Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22589
2019-11-29 02:06:45 +00:00
Jeff Roberson
584061b480 Garbage collect the mostly unused us_keg field. Use appropriately named
union members in vm_page.h to store the zone and slab.  Remove some nearby
dead code.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22564
2019-11-28 07:49:25 +00:00
Ryan Libby
35ec24f362 uma: move sysctl vm.uma defn out from under INVARIANTS
Fix non-INVARIANTS builds after r355149.

Reported by:	Michael Butler <imb@protected-networks.net>
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22588
2019-11-28 04:15:16 +00:00
Jeff Roberson
20a4e15451 Implement a sysctl tree for uma zones to assist in debugging and provide
more statistcs than are exported via the ABI stable vmstat interface.
Rename uz_count to uz_bucket_size because even I was confused by the
name after returning to the source years later.

Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D22554
2019-11-28 00:19:09 +00:00
Jeff Roberson
0a81b4395e Refactor uma_zfree_arg into several functions to make control flow more
clear and icache usage cleaner.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22491
2019-11-27 23:19:06 +00:00
Doug Moore
1867d2f2e9 Inline some splay helper functions to improve performance on a
micro-benchmark.

Reviewed by: markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22544
2019-11-27 21:00:44 +00:00
Ryan Libby
ca293436d1 uma: trash memory when ctor/dtor supplied too
On INVARIANTS kernels, UMA has a use-after-free detection mechanism.
This mechanism previously required that all of the ctor/dtor/uminit/fini
arguments to uma_zcreate() be NULL in order to function.  Now, it only
requires that uminit and fini be NULL; now, the trash ctor and dtor will
be called in addition to any supplied ctor or dtor.

Also do a little refactoring for readability of the resulting logic.

This enables use-after-free detection for more zones, and will allow for
simplification of some callers that worked around the previous
restriction (see kern_mbuf.c).

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20722
2019-11-27 19:49:55 +00:00
Jeff Roberson
a67d540832 Use atomics in more cases for object references. We now can completely
omit the object lock if we are above a certain threshold.  Hold only a
single vnode reference when the vnode object has any ref > 0.  This
allows us to only lock the object and vnode on 0-1 and 1-0 transitions.

Differential Revision:	https://reviews.freebsd.org/D22452
2019-11-27 00:39:23 +00:00
Jeff Roberson
beb8beef81 Refactor uma_zalloc_arg(). It is a mess of gotos and code which doesn't
make sense after many partial refactors.  Attempt to make a smaller cache
footprint for the fast path.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D22470
2019-11-26 22:17:02 +00:00
Ryan Libby
6a14746c01 vm_object_collapse_scan_wait: drop locks before reacquiring
Regression from r352174.  In the vm_page_rename() failure case we forgot
to unlock the vm object locks before sleeping and reacquiring them.

Reviewed by:	jeff
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22542
2019-11-25 07:38:31 +00:00
Jeff Roberson
4d987866e6 Move anonymous object copying for fork into its own routine and so that we
can avoid locking non-anonymous objects.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22472
2019-11-25 07:13:05 +00:00
Doug Moore
2767c9f36a Where 'current' is used to index over vm_map entries, use
'entry'. Where 'entry' is used to identify the starting point for
iteration, use 'first_entry'. These are the naming conventions used in
most of the vm_map.c code.  Where VM_MAP_ENTRY_FOREACH can be used, do
so. Squeeze a few lines to fit in 80 columns.  Where lines are being
modified for these reasons, look to remove style(9) violations.

Reviewed by: alc, markj
Differential Revision: https://reviews.freebsd.org/D22458
2019-11-25 02:19:47 +00:00