Commit Graph

419 Commits

Author SHA1 Message Date
Ed Maste
575a4437a9 uma: fix KTR message after r366840
Reported by:	bz
Sponsored by:	The FreeBSD Foundation
2020-10-19 18:54:44 +00:00
Mark Johnston
f09cbea31a uma: Respect uk_reserve in keg_drain()
When a reserve of free items is configured for a zone, the reserve must
not be reclaimed under memory pressure.  Modify keg_drain() to simply
respect the reserved pool.

While here remove an always-false uk_freef == NULL check (kegs that
shouldn't be drained should set _NOFREE instead), and make sure that the
keg_drain() KTR statement does not reference an uninitialized variable.

Reviewed by:	alc, rlibby
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26772
2020-10-19 16:57:40 +00:00
Mark Johnston
1b2dcc8c54 uma: Avoid depleting keg reserves when filling a bucket
zone_import() fetches a free or partially free slab from the keg and
then uses its items to populate an array, typically filling a bucket.
If a single allocation causes the keg to drop below its minimum reserve,
the inner loop ends.  However, if the bucket is still not full and
M_USE_RESERVE is specified, the outer loop will continue to fetch items
from the keg.

If M_USE_RESERVE is specified and the number of free items is below the
reserved limit, we should return only a single item.  Otherwise, if the
bucket size is larger than the reserve, all of the reserved items may
end up in a single per-CPU bucket, invisible to other CPUs.

Reviewed by:	rlibby
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26771
2020-10-19 16:55:03 +00:00
Konstantin Belousov
6f3b523c9a Avoid dump_avail[] redefinition.
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h.  This fixes default gcc build for mips.

Reviewed by:	alc, scottph
Tested by:	kevans (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26741
2020-10-14 22:51:40 +00:00
Mark Johnston
06d8bdcbf7 uma: Use the bucket cache for cross-domain allocations
uma_zalloc_domain() allocates from the requested domain instead of
following a first-touch policy (the default for most zones).  Currently
it is only used by malloc_domainset(), and consumers free returned items
with free(9) since r363834.

Previously uma_zalloc_domain() worked by always going to the keg for an
item.  As a result, the use of UMA zone caches was unbalanced: we free
items to the caches, but always allocate from the keg, skipping the
caches.

Make some effort to allocate from the UMA caches when performing a
cross-domain allocation.  This avoids blowing up the caches when
something is performing many transient allocations with
malloc_domainset().

Reported and tested by:	dhw, glebius
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26427
2020-10-02 19:04:29 +00:00
Mark Johnston
5afdf5c1ca uma: Use LIFO for non-SMR bucket caches
When SMR was introduced, zone_put_bucket() was changed to always place
full buckets at the end of the queue.  However, it is generally
preferable to use recently used buckets since their items are more
likely to be resident in cache.  So, for buckets that have no constraint
on item reuse, use a last-in-first-out ordering as we did before.

Reviewed by:	rlibby
Tested by:	dhw, glebius
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26426
2020-10-02 19:04:09 +00:00
Mark Johnston
952c8964ba uma: Remove newlines from panic messages
Sponsored by:	The FreeBSD Foundation
2020-10-02 19:03:42 +00:00
Konstantin Belousov
89d2fb14d5 Add interruptible variant of vm_wait(9), vm_wait_intr(9).
Also add msleep flags argument to vm_wait_doms(9).

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-08 23:28:09 +00:00
Mateusz Guzik
c3aa3bf97c vm: clean up empty lines in .c and .h files 2020-09-01 21:20:45 +00:00
Andrew Gallatin
791dda877f uma: record allocation failures due to zone limits
The zone limit mechanism was recently reworked, and
allocation failures due to limits being exceeded
were inadvertently no longer being recorded. This
would lead to, for example, mbuf allocation failures
not being indicated in netstat -m or vmstat -z

Reviewed by:	markj
Sponsored by:	Netflix
2020-08-21 18:31:57 +00:00
Mark Johnston
b21b022a81 Revert r364310.
Some of the resulting fallout in CAM does not appear straightforward to
fix, so simply revert the commit for now in the absence of a better
solution.

Discussed with:	mjg
Reported by:	dhw
2020-08-18 14:09:49 +00:00
Gleb Smirnoff
1921bb7b68 With INVARIANTS panic immediately if M_WAITOK is requested in a
non-sleepable context.  Previously only _sleep() would panic.
This will catch misuse of M_WAITOK at development stage rather
than at stress load stage.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D26027
2020-08-17 15:37:08 +00:00
Mark Johnston
af32cefd7c Check the UMA zone's full bucket cache before short-circuiting an alloc.
The global "bucketdisable" flag indicates that we are in a low memory
situation and should avoid allocating buckets.  However, in the
allocation path we were checking it before the full bucket cache and
bailing even if the cache is non-empty.  Defer the check so that we have
a shot at allocating from the cache.

This came up because M_NOWAIT allocations from the buf trie node zone
must always succeed.  In one scenario, all of the preallocated trie
nodes were in the bucket list, and a new slab allocation could not
succeed due to a memory shortage.  The short-circuiting caused an
allocation failure which triggered a panic.

Reported by:	pho
Reviewed by:	cem
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25980
2020-08-10 20:34:45 +00:00
Mark Johnston
96ad26eefb Remove free_domain() and uma_zfree_domain().
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches.  They no longer serve any
purpose since UMA now provides their functionality by default.  Remove
them to simplyify the kernel memory allocator interfaces a bit.

Reviewed by:	cem, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25937
2020-08-04 13:58:36 +00:00
Mark Johnston
8c277118d8 Fix UMA's first-touch policy on systems with empty domains.
Suppose a thread is running on a CPU in a NUMA domain with no physical
RAM.  When an item is freed to a first-touch zone, it ends up in the
cross-domain bucket.  When the bucket is full, it gets placed in another
domain's bucket queue.  However, when allocating an item, UMA will
always go to the keg upon a per-CPU cache miss because the empty
domain's bucket queue will always be empty.  This means that a non-empty
domain's bucket queues can grow very rapidly on such systems.  For
example, it can easily cause mbuf allocation failures when the zone
limit is reached.

Change cache_alloc() to follow a round-robin policy when running on an
empty domain.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25355
2020-06-28 21:35:04 +00:00
Jeff Roberson
c8b0a88b8d Clarify some language. Favor primary where both master and primary were
used in conjunction with secondary.
2020-06-20 20:21:04 +00:00
Mateusz Guzik
1c58c09f5a uma: hide item_domain under ifdef NUMA
Fixes build warnings on mips.
2020-05-29 08:30:35 +00:00
Mark Johnston
81302f1d77 Fix boot on systems where NUMA domain 0 is unpopulated.
- Add vm_phys_early_add_seg(), complementing vm_phys_early_alloc(), to
  ensure that segments registered during hammer_time() are placed in the
  right domain.  Otherwise, since the SRAT is not parsed at that point,
  we just add them to domain 0, which may be incorrect and results in a
  domain with only several MB worth of memory.
- Fix uma_startup1() to try allocating memory for zones from any domain.
  If domain 0 is unpopulated, the allocation will simply fail, resulting
  in a page fault slightly later during boot.
- Change _vm_phys_domain() to return -1 for addresses not covered by the
  affinity table, and change vm_phys_early_alloc() to handle wildcard
  domains.  This is necessary on amd64, where the page array is dense
  and pmap_page_array_startup() may allocate page table pages for
  non-existent page frames.

Reported and tested by:	Rafael Kitover <rkitover@gmail.com>
Reviewed by:	cem (earlier version), kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25001
2020-05-28 19:41:00 +00:00
Mark Johnston
dc2b320563 Allocate UMA per-CPU counters earlier.
Otherwise anything counted before SI_SUB_VM_CONF is discarded.  However,
it is useful to be able to see stats from allocations done early during
boot.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24756
2020-05-14 16:06:54 +00:00
Mark Johnston
54007ce8ae Clean up uma_int.h a bit.
This makes it easier to write libkvm programs that access UMA data
structures.

- Remove a couple of unused slab functions and make others local to
  uma_core.c.  Similarly move SLAB_BITSETS, which affects the layout of
  slab structures, to uma_core.c.
- Stop defining the slab structures under _KERNEL.  There's no real
  reason they can't be visible to userspace like the rest of UMA's
  structures are.
- Group KEG_ASSERT_COLD with other keg macros.
- Convert an assertion about MAXMEMDOM to use _Static_assert.

No functional change intended.

Discussed with:	jeff
Reviewed by:	rlibby
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23980
2020-03-07 15:37:23 +00:00
Brooks Davis
3823a5990a Remove an apparently incorrect assertion.
Without this change mips64 fails to boot.

Discussed with:	markj
Sponsored by:	DARPA
2020-03-06 23:31:09 +00:00
Mateusz Guzik
7f746c9fcc vm: add debug to uma_zone_set_smr
Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D23902
2020-03-01 21:49:16 +00:00
Jeff Roberson
fe835cbf5f A pair of performance improvements.
Swap buckets on free as well as alloc so that alloc is always the most
cache-hot data.

When selecting a zone domain for the round-robin bucket cache use the
local domain unless there is a severe imbalance.  This does not affinitize
memory, only locks and queues.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D23824
2020-02-27 08:23:10 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Ryan Libby
eaa17d4291 sys/vm: quiet -Wwrite-strings
Discussed with:	kib
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23796
2020-02-23 03:32:04 +00:00
Mark Johnston
0464f16e91 Constify uma_zcache_create() and uma_zsecond_create()'s "name" argument.
It is already internally handled as a pointer to a const string, in
particular by uma_zcreate().

Fix indentation while here.

MFC after:	1 week
2020-02-22 17:44:28 +00:00
Jeff Roberson
226dd6db47 Add an atomic-free tick moderated lazy update variant of SMR.
This enables very cheap read sections with free-to-use latencies and memory
overhead similar to epoch.  On a recent AMD platform a read section cost
1ns vs 5ns for the default SMR.  On Xeon the numbers should be more like 1
ns vs 11.  The memory consumption should be proportional to the product
of the free rate and 2*1/hz while normal SMR consumption is proportional
to the product of free rate and maximum read section time.

While here refactor the code to make future additions more
straightforward.

Name the overall technique Global Unbound Sequences (GUS) and adjust some
comments accordingly.  This helps distinguish discussions of the general
technique (SMR) vs this specific implementation (GUS).

Discussed with:	rlibby, markj
2020-02-22 03:44:10 +00:00
Jeff Roberson
c6fd3e23f7 Use per-domain locks for the bucket cache.
This gives much better concurrency when there are a large number of
cores per-domain and multiple domains.  Avoid taking the lock entirely
if it will not be productive.  ROUNDROBIN domains will have mixed
memory in each domain and will load balance to all domains.

While here refactor the zone/domain separation and bucket limits to
simplify callers.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23673
2020-02-19 18:48:46 +00:00
Jeff Roberson
ed581bf68f Add a simple accessor that returns the bytes of memory consumed by a zone. 2020-02-17 01:59:55 +00:00
Jeff Roberson
70260874ac UMA has become more particular about zone types. Use the right allocator
calls in uma_zwait().
2020-02-17 01:06:18 +00:00
Jeff Roberson
6d88d784f8 Slightly restructure uma_zalloc* to generate better code from clang and
reduce duplication among zalloc functions.

Reviewed by:	markj
Discussed with:	mjg
Differential Revision:	https://reviews.freebsd.org/D23672
2020-02-16 01:07:19 +00:00
Mark Johnston
cefc92e1a2 Update the zone-global count of cached items in bucket_cache_reclaim().
This was missed in r351673.  The count is used to enfore cache limits,
which are rarely used.

Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
2020-02-13 23:15:21 +00:00
Jeff Roberson
543117bed8 Fix a case where ub_seq would fail to be set if the cross bucket was
flushed due to memory pressure.

Reviewed by:	markj
Differential Revision:	http://reviews.freebsd.org/D23614
2020-02-13 20:58:51 +00:00
Mateusz Guzik
3acb6572fc Store offset into zpcpu allocations in the per-cpu area.
This shorten zpcpu_get and allows more optimizations.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23570
2020-02-12 11:11:22 +00:00
Mark Johnston
4ab3aee8fb Reduce lock hold time in keg_drain().
Maintain a count of free slabs in the per-domain keg structure and use
that to clear the free slab list in constant time for most cases.  This
helps minimize lock contention induced by reclamation, in preparation
for proactive trimming of excesses of free memory.

Reviewed by:	jeff, rlibby
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23532
2020-02-11 20:06:33 +00:00
Ryan Libby
bae55c4aec uma: remove UMA_ZFLAG_CACHEONLY flag
UMA_ZFLAG_CACHEONLY was essentially the same thing as UMA_ZONE_VM, but
with a more confusing name.  Remove the flag, make UMA_ZONE_VM an
inherit flag, and replace all references.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23516
2020-02-06 08:32:25 +00:00
Ryan Libby
33e5a1ea3b uma: multipage chicken switch
Add a switch to allow disabling multipage slabs, in order to facilitate
measuring memory usage and performance effects.  The tunable
vm.debug.uma_multipage_slabs defaults to 1 and can be set to 0 to
disable.  The name may change soon.

Reviewed by:	markj (previous version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23487
2020-02-04 22:40:45 +00:00
Ryan Libby
27ca37acb7 uma: grow slabs to enforce minimum memory efficiency
Memory efficiency can be poor with awkward item sizes (e.g. 1/2 or 1
page size + epsilon).  In order to achieve a minimum memory efficiency,
select a slab size with a potentially larger number of pages if it
yields a lower portion of waste.

This may mean using page_alloc instead of uma_small_alloc, which could
be more costly.

Discussed with:	jeff, mckusick
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23239
2020-02-04 22:40:34 +00:00
Ryan Libby
ec0d828071 uma: add UMA_ZONE_CONTIG, and a default contig_alloc
For now, copy the mbuf allocator.

Reviewed by:	jeff, markj (previous version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23237
2020-02-04 22:40:11 +00:00
Ryan Libby
5ba16cf3d7 uma: pcpu_page_free needs to startup_free pages from startup_alloc
After r357392, it is apparent that we do have some early-boot PCPU
zones.  Make it so we can safely free pages from them if they are
actually used during early boot.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23496
2020-02-04 22:39:58 +00:00
Jeff Roberson
e84130a0c0 Use literal bucket sizes for smaller buckets rather than the rounding
system.  Small bucket sizes already pack well even if they are an odd
number of words.  This prevents any potential new instances of the
problem fixed in r357463 as well as making the system easier to
understand.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23494
2020-02-04 20:28:06 +00:00
Jeff Roberson
dc3915c8c6 Use STAILQ instead of TAILQ for bucket lists. We only need FIFO behavior
and this is more space efficient.

Stop queueing recently used buckets to the head of the list.  If the bucket
goes to a different processor the cache coherency will be more expensive.
We already try to encourage cache-hot behavior in the per-cpu layer.

Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D23493
2020-02-04 02:41:24 +00:00
Mark Johnston
36cb95c736 Disable the smallest UMA bucket size on 32-bit platforms.
With r357314, sizeof(struct uma_bucket) grew to 16 bytes on 32-bit
platforms, so BUCKET_SIZE(4) is 0.  This resulted in the creation of a
bucket zone for buckets with zero capacity.  A more general fix is
planned, but for now this bandaid allows 32-bit platforms to boot again.

PR:		243837
Discussed with:	jeff
Reported by:	pho, Jenkins via lwhsu
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-02-03 19:29:02 +00:00
Jeff Roberson
f96d4157a7 Fix a bug in r356776 where the page allocator was not properly restored to
the percpu page allocator after it had been temporarily overridden by
startup_alloc.

Reported by:	pho, bdragon
2020-02-01 23:46:30 +00:00
Jeff Roberson
9e47b34110 Fix LINT build with MEMGUARD. 2020-01-31 02:03:22 +00:00
Jeff Roberson
d4665eaa66 Implement a safe memory reclamation feature that is tightly coupled with UMA.
This is in the same family of algorithms as Epoch/QSBR/RCU/PARSEC but is
a unique algorithm.  This has 3x the performance of epoch in a write heavy
workload with less than half of the read side cost.  The memory overhead
is significantly lessened by limiting the free-to-use latency.  A synthetic
test uses 1/20th of the memory vs Epoch.  There is significant further
discussion in the comments and code review.

This code should be considered experimental.  I will write a man page after
it has settled.  After further validation the VM will begin using this
feature to permit lockless page lookups.

Both markj and cperciva tested on arm64 at large core counts to verify
fences on weaker ordering architectures.  I will commit a stress testing
tool in a follow-up.

Reviewed by:	mmacy, markj, rlibby, hselasky
Discussed with:	sbahara
Differential Revision:	https://reviews.freebsd.org/D22586
2020-01-31 00:49:51 +00:00
Ryan Libby
8d1c459ae5 uma: fix zone domain overlaying pcpu cache with disabled cpus
UMA zone structures have two arrays at the end which are sized according
to the machine: an array of CPU count length, and an array of NUMA
domain count length.  The CPU counting was wrong in the case where some
CPUs are disabled (when mp_ncpus != mp_maxid + 1), and this caused the
second array to be overlaid with the first.

Reported by:	olivier
Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23318
2020-01-23 04:56:38 +00:00
Ryan Libby
7e2406774e uma: report leaks more accurately
Previously UMA had some false negatives in the leak report at keg
destruction time, where it only reported leaks if there were free items
in the slab layer (rather than allocated items), which notably would not
be true for single-item slabs (large items).  Now, report a leak if
there are any allocated pages, and calculate and report the number of
allocated items rather than free items.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23275
2020-01-23 04:56:34 +00:00
Jeff Roberson
530cc6a25d Some architectures with DMAP still consume boot kva. Simplify the test for
claiming kva in uma_startup2() to handle this.

Reported by:	bdragon
2020-01-23 03:37:35 +00:00
Andrew Gallatin
2052680238 pcpu_page_alloc: guard against empty NUMA domains
Some systems, such as higher end Threadripper, may have
NUMA domains with no physical memory, Don't allocate
from these domains.

This fixes a "panic: vm_wait in early boot" on my 2990WX desktop

Reviewed by:	jeff
Sponsored by:	Netflix
2020-01-18 18:25:37 +00:00