Commit Graph

4590 Commits

Author SHA1 Message Date
markj
afa3646dbf Re-check for wirings after busying the page in vm_page_release_locked().
A concurrent unlocked lookup can wire the page after
vm_page_release_locked() releases the last wiring, in which case
vm_page_release_locked() must not free the page.  Once the xbusy lock is
acquired, that, the object lock and the fact that the page is unmapped
ensure that the wire count cannot increase, so re-check for new wirings
after the page is xbusied.

Update the comment above vm_page_wired() to reflect the new
synchronization rules.

Reported by:	glebius
Reviewed by:	alc, jeff, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24592
2020-04-28 13:51:41 +00:00
markj
4b7a0ef908 Use a single VM object for kernel stacks.
Previously we allocated a separate VM object for each kernel stack.
However, fully constructed kernel stacks are cached by UMA, so there is
no harm in using a single global object for all stacks.  This reduces
memory consumption and makes it easier to define a memory allocation
policy for kernel stack pages, with the aim of reducing physical memory
fragmentation.

Add a global kstack_object, and use the stack KVA address to index into
the object like we do with kernel_object.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24473
2020-04-26 20:08:57 +00:00
markj
fc30a4ce8c Factor out the kmem contig page alloc and reclamation code.
kmem_alloc_attr_domain() and kmem_alloc_contig_domain() duplicated each
other's page allocation and reclamation logic.  Place it in a single
function to make it easier to add additional consumers.  No functional
change intended.

Reviewed by:	jeff, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24475
2020-04-21 16:01:44 +00:00
markj
3cc5d844d4 Minimize conditional compilation for handling of M_EXEC.
This simplifies some planned changes.  No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24474
2020-04-21 15:55:28 +00:00
markj
e367fe2912 Handle trashed queue pointers in vm_page_acquire_unlocked().
vm_page_acquire_unlocked() relies on type-stability of vm_page
structures and assumes that the listq linkage pointers always point to a
vm_page or are NULL.  QUEUE_MACRO_DEBUG_TRASH breaks that assumption, so
add an explicit check for a trashed queue pointer before dereferencing.

Reported and tested by:	pho
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24472
2020-04-20 14:45:17 +00:00
bdrewery
139e0ae1a1 Remove dead code leftover from r331018.
Sponsored by:	Dell EMC
2020-03-31 01:12:53 +00:00
kib
1bd0720f0c VOP_GETPAGES_ASYNC(): consistently call iodone() callback in case of error.
Reviewed by:	glebius, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24038
2020-03-30 21:44:30 +00:00
kib
b4a174101e ddb show pginfo: print pages reference value in hex.
It is more useful this way after the VPRC_ flags were introduced.

Sponsored by:	The FreeBSD Foundation
2020-03-28 12:21:52 +00:00
jeff
8e62dcd978 Check for busy or wired in vm_page_relookup(). Some callers will only keep
a page wired and expect it to still be present.

Reported by:	delphij@FreeBSD.org
Reviewed by:	kib
2020-03-11 22:25:45 +00:00
markj
bdd145bf24 Clean up uma_int.h a bit.
This makes it easier to write libkvm programs that access UMA data
structures.

- Remove a couple of unused slab functions and make others local to
  uma_core.c.  Similarly move SLAB_BITSETS, which affects the layout of
  slab structures, to uma_core.c.
- Stop defining the slab structures under _KERNEL.  There's no real
  reason they can't be visible to userspace like the rest of UMA's
  structures are.
- Group KEG_ASSERT_COLD with other keg macros.
- Convert an assertion about MAXMEMDOM to use _Static_assert.

No functional change intended.

Discussed with:	jeff
Reviewed by:	rlibby
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23980
2020-03-07 15:37:23 +00:00
markj
4c39dca275 Move SMR pointer type definition and access macros to smr_types.h.
The intent is to provide a header that can be included by other headers
without introducing too much pollution.  smr.h depends on various
headers and will likely grow over time, but is less likely to be
required by system headers.

Rename SMR_TYPE_DECLARE() to SMR_POINTER():
- One might use SMR to protect more than just pointers; it
  could be used for resizeable arrays, for example, so TYPE seems too
  generic.
- It is useful to be able to define anonymous SMR-protected pointer
  types and the _DECLARE suffix makes that look wrong.

Reviewed by:	jeff, mjg, rlibby
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23988
2020-03-07 00:55:46 +00:00
brooks
137e6ff286 Remove an apparently incorrect assertion.
Without this change mips64 fails to boot.

Discussed with:	markj
Sponsored by:	DARPA
2020-03-06 23:31:09 +00:00
markj
6ee119267c Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things.
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23978
2020-03-06 19:10:00 +00:00
brooks
8119656735 Introduce kern_mmap_req().
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to use a designated
initializer or compount literal.  We take advantage of the mandatory
zeroing of fields not listed in the initializer.

Remove kern_mmap_fpcheck() and use kern_mmap_req().

The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations.  In CheriBSD we have already added two more arguments.

Reviewed by:	kib
Discussed with:	kevans
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D23164
2020-03-04 21:27:12 +00:00
markj
a4f55681b0 Avoid doubly wiring a newly allocated page in vm_page_grab_valid().
This fixes a regression from r358363.

Reported by:	manu, jbeich
Tested by:	jbeich
2020-03-01 22:09:11 +00:00
mjg
e2980ce790 vm: add debug to uma_zone_set_smr
Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D23902
2020-03-01 21:49:16 +00:00
jeff
36c40e1e93 Provide a lock free alternative to resolve bogus pages. This is not likely
to be much of a perf win, just a nice code simplification.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D23866
2020-02-28 21:42:48 +00:00
jeff
f4e3c5e950 Convert a few triviail consumers to the new unlocked grab API.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23847
2020-02-28 20:34:30 +00:00
jeff
d81e88be5a Support the NOCREAT flag for grab_valid_unlocked.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23865
2020-02-28 20:32:35 +00:00
jeff
45d716be60 Simplify vref() code in object_reference. The local temporary is no longer
necessary.  Fix formatting errors.

Reported by:	mjg
Discussed with:	kib
2020-02-28 20:30:53 +00:00
markj
f1a13462c4 Add a blocking counter KPI.
refcount(9) was recently extended to support waiting on a refcount to
drop to zero, as this was needed for a lockless VM object
paging-in-progress counter.  However, this adds overhead to all uses of
refcount(9) and doesn't really match traditional refcounting semantics:
once a counter has dropped to zero, the protected object may be freed at
any point and it is not safe to dereference the counter.

This change removes that extension and instead adds a new set of KPIs,
blockcount_*, for use by VM object PIP and busy.

Reviewed by:	jeff, kib, mjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23723
2020-02-28 16:05:18 +00:00
jeff
df024e3857 A pair of performance improvements.
Swap buckets on free as well as alloc so that alloc is always the most
cache-hot data.

When selecting a zone domain for the round-robin bucket cache use the
local domain unless there is a severe imbalance.  This does not affinitize
memory, only locks and queues.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D23824
2020-02-27 08:23:10 +00:00
jeff
2ead2384b2 Add unlocked grab* function variants that use lockless radix code to
lookup pages.  These variants will fall back to their locked counterparts
if the page is not present.

Discussed with:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23449
2020-02-27 02:37:27 +00:00
emaste
b216635983 Return ENOTSUP for mmap/mprotect if prot not subset of prot_max
From POSIX,

[ENOTSUP]
    The implementation does not support the combination of accesses
    requested in the prot argument.

This fits the case that prot contains permissions which are not a subset
of prot_max.

Reviewed by:	brooks, cem
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23843
2020-02-26 20:03:43 +00:00
kaktus
ad355b0a9d Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
dougm
9ec8434fc0 The last argument to swp_pager_getswapspace is always 1. Remove that argument.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23810
2020-02-24 04:01:09 +00:00
markj
5656c45944 Allow swap_pager_putpages() to allocate one block at a time.
The minimum allocation size of 4 blocks is an old policy that came with
the "new" swap pager in r42957.  Since then the blist allocator has
gotten better at reducing fragmentation; for example, with r349777 it
can return a range that spans multiple leaves.  When swap space is close
to being exhaused, the minimum of 4 blocks most likely exacerbates
memory pressure, so reduce it to 1.

Reported by:	alc
Tested by:	pho
Reviewed by:	alc, dougm, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23763
2020-02-23 17:59:51 +00:00
rlibby
1a3bab6bfa sys/vm: quiet -Wwrite-strings
Discussed with:	kib
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23796
2020-02-23 03:32:04 +00:00
markj
397e3ecf9f Constify uma_zcache_create() and uma_zsecond_create()'s "name" argument.
It is already internally handled as a pointer to a const string, in
particular by uma_zcreate().

Fix indentation while here.

MFC after:	1 week
2020-02-22 17:44:28 +00:00
kevans
086ad59e2a vm_radix: prefer __builtin_unreachable() to an unreachable panic()
This provides the needed hint to GCC and offers an annotation for readers to
observe that it's in-fact impossible to hit this point. We'll get hit with a
a -Wswitch error if the enum applicable to the switch above were to get
expanded without the new value(s) being handled.
2020-02-22 16:20:04 +00:00
jeff
51fe124dfe Add an atomic-free tick moderated lazy update variant of SMR.
This enables very cheap read sections with free-to-use latencies and memory
overhead similar to epoch.  On a recent AMD platform a read section cost
1ns vs 5ns for the default SMR.  On Xeon the numbers should be more like 1
ns vs 11.  The memory consumption should be proportional to the product
of the free rate and 2*1/hz while normal SMR consumption is proportional
to the product of free rate and maximum read section time.

While here refactor the code to make future additions more
straightforward.

Name the overall technique Global Unbound Sequences (GUS) and adjust some
comments accordingly.  This helps distinguish discussions of the general
technique (SMR) vs this specific implementation (GUS).

Discussed with:	rlibby, markj
2020-02-22 03:44:10 +00:00
imp
b143dc96e5 Don't convert all lower-layer errors to EIO.
Don't convert all lower layer errors to EIO. Instead, pass the actual error up
the stack. This will allow the upper layers that look for ENXIO to react
properly to that signal from the lower layers and, for UFS, unmount the
filesystem.

Reviewed by: kib@
Differential Revision:  https://reviews.freebsd.org/D23755
2020-02-20 01:33:01 +00:00
imp
006804e44f Don't spam the console with an additional, and useless, error message.
There's no need to spam the console with this error message. If there's an I/O
error, the disk/cam driver will report it at the lower levels. If that's an
actual problem, the upper layers will report that.

Reviewed by: kib@
Differential Revision:  https://reviews.freebsd.org/D23756
2020-02-20 00:34:46 +00:00
jeff
2b062892b7 Silence a gcc warning about no return from a function that handles every
possible enum in a switch statement.  I verified that this emits nothing
as expected on clang.  radix relies on constant propagation to eliminate
any branching from these access routines.

Reported by:	lwhsu/tinderbox
2020-02-19 22:34:22 +00:00
jeff
22d3316afc Use SMR to provide a safe unlocked lookup for vm_radix.
The tree is kept correct for readers with store barriers and careful
ordering.  The existing object lock serializes writers.  Consumers
will be introduced in later commits.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D23446
2020-02-19 19:58:31 +00:00
jeff
72deafb875 Use per-domain locks for the bucket cache.
This gives much better concurrency when there are a large number of
cores per-domain and multiple domains.  Avoid taking the lock entirely
if it will not be productive.  ROUNDROBIN domains will have mixed
memory in each domain and will load balance to all domains.

While here refactor the zone/domain separation and bucket limits to
simplify callers.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23673
2020-02-19 18:48:46 +00:00
jeff
a30a6a746c Don't release xbusy on kmem pages. After lockless page lookup we will not
be able to guarantee that they can be racquired without blocking.

Reviewed by:	kib
Discussed with:	markj
Differential Revision:	https://reviews.freebsd.org/D23506
2020-02-19 09:10:11 +00:00
jeff
2c8ef8f230 Eliminate some unnecessary uses of UMA_ZONE_VM. Only zones involved in
virtual address or physical page allocation need to be marked with this
flag.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23712
2020-02-19 08:17:27 +00:00
markj
0a959e0bae Remove swblk_t.
It was used only to store the bounds of each swap device.  However,
since swblk_t is a signed 32-bit int and daddr_t is a signed 64-bit
int, swp_pager_isondev() may return an invalid result if swap devices
are repeatedly added and removed and sw_end for a device ends up
becoming a negative number.

Note that the removed comment about maximum swap size still applies.

Reviewed by:	jeff, kib
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23666
2020-02-17 15:11:07 +00:00
markj
2f0124a78f Fix a swap block allocation race.
putpages' allocation of swap blocks is done under the global sw_dev
lock.  Previously it would drop that lock before inserting the allocated
blocks into the object's trie, creating a window in which swap blocks
are allocated but are not visible to swapoff.  This can cause
swp_pager_strategy() to fail and panic the system.

Fix the problem bluntly, by allocating swap blocks under the object
lock.

Reviewed by:	jeff, kib
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23665
2020-02-17 15:10:41 +00:00
markj
b1d2597003 Fix object locking races in swapoff(2).
swap_pager_swapoff_object()'s goal is to allocate pages for all valid
swap blocks belonging to the object, for which there is no resident
page.  If the page corresponding to a block is already resident and
valid, the block can simply be discarded.

The existing implementation tries to minimize the number of I/Os used.
For each cluster of swap blocks, it finds maximal runs of valid swap
blocks not resident in memory, and valid resident pages.  During this
processing, the object lock may be dropped in several places: when
calling getpages, or when blocking on a busy page in
vm_page_grab_pages().  While the lock is dropped, another thread may
free swap blocks, causing getpages to page in stale data.

Fix the problem following a suggestion from Jeff: use getpages'
readahead capability to perform clustering rather than doing it
ourselves.  The simplies the code a bit without reintroducing the old
behaviour of performing one I/O per page.

Reviewed by:	jeff
Reported by:	dhw, gallatin
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23664
2020-02-17 15:09:40 +00:00
jeff
12206c6f43 Add a simple accessor that returns the bytes of memory consumed by a zone. 2020-02-17 01:59:55 +00:00
jeff
02694321f8 Refactor _vm_page_busy_sleep to reduce the delta between the various
sleep routines and introduce a variant that supports lockless sleep.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23612
2020-02-17 01:08:00 +00:00
jeff
75a1861990 UMA has become more particular about zone types. Use the right allocator
calls in uma_zwait().
2020-02-17 01:06:18 +00:00
jeff
1575a6c230 Slightly restructure uma_zalloc* to generate better code from clang and
reduce duplication among zalloc functions.

Reviewed by:	markj
Discussed with:	mjg
Differential Revision:	https://reviews.freebsd.org/D23672
2020-02-16 01:07:19 +00:00
mjg
e6a4b66e3b vm: use new capsicum helpers 2020-02-15 01:29:07 +00:00
mjg
87ec18049e vm: remove no longer needed atomic_load_ptr casts 2020-02-14 23:16:29 +00:00
markj
b9f89d3339 Fix handling of WAITFAIL in vm_page_grab() and vm_page_grab_pages().
After sleeping through a memory shortage, we must return NULL rather
than retry.

Discussed with:	jeff
Reported by:	pho
Sponsored by:	The FreeBSD Foundation
2020-02-13 23:18:35 +00:00
markj
65c30662b6 Update the zone-global count of cached items in bucket_cache_reclaim().
This was missed in r351673.  The count is used to enfore cache limits,
which are rarely used.

Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
2020-02-13 23:15:21 +00:00
jeff
06807c5be4 Fix a case where ub_seq would fail to be set if the cross bucket was
flushed due to memory pressure.

Reviewed by:	markj
Differential Revision:	http://reviews.freebsd.org/D23614
2020-02-13 20:58:51 +00:00