133 Commits

Author SHA1 Message Date
Alexander Motin
2760658b21 Improve UMA cache reclamation.
When estimating working set size, measure only allocation batches, not free
batches.  Allocation and free patterns can be very different.  For example,
on a vm_lowmem event ZFS can free a few gigabytes of memory to UMA in one
call, but that does not mean it will request the same amount back just as
fast; in fact it won't.

Update working set size on every reclamation call, shrinking caches faster
under pressure.  The lack of this caused repeated vm_lowmem events to squeeze
more and more memory out of real consumers only to leave it stuck in UMA
caches.  I saw ZFS cut its ARC size in half before the previous algorithm,
after a periodic WSS update, decided to reclaim UMA caches.

Introduce voluntary reclamation of UMA caches not used for a long time.  For
each zdom, track a long-term minimum cache size watermark, freeing some
unused items every UMA_TIMEOUT after the first 15 minutes without cache
misses.  Freed memory can be put to better use by other consumers.  For
example, ZFS won't grow its ARC unless it sees free memory, since it does not
know that the cached memory is not really used.  And even if the memory is
not really needed, periodic freeing during inactivity periods should reduce
its fragmentation.

Reviewed by:	markj, jeff (previous version)
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D29790
2021-05-02 19:45:23 -04:00
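The idea in 2760658b21 of measuring only allocation batches can be illustrated with a small model. This is a simplified sketch, not the code from the commit: the struct, its field names, and the helper functions are all hypothetical, and the real accounting is per zone domain and handles concurrency and decay that are omitted here.

```c
#include <assert.h>

/* Hypothetical, simplified accounting for one cache; not the UMA structure. */
struct cache_stats {
	long nitems;	/* items currently cached */
	long imax;	/* high watermark that opened the current batch */
	long bimin;	/* lowest size reached during the current batch */
	long wss;	/* largest completed allocation batch seen so far */
};

void
cache_init(struct cache_stats *cs, long nitems)
{
	cs->nitems = cs->imax = cs->bimin = nitems;
	cs->wss = 0;
}

/* Allocation: the cache shrinks, so the current batch may deepen. */
void
cache_alloc(struct cache_stats *cs, long n)
{
	assert(n >= 0 && n <= cs->nitems);
	cs->nitems -= n;
	if (cs->nitems < cs->bimin)
		cs->bimin = cs->nitems;
}

/* Current estimate: deepest allocation batch, including the open one. */
long
cache_wss(const struct cache_stats *cs)
{
	long batch = cs->imax - cs->bimin;

	return (batch > cs->wss ? batch : cs->wss);
}

/* Free: the size of the free itself never feeds the estimate. */
void
cache_free(struct cache_stats *cs, long n)
{
	assert(n >= 0);
	cs->nitems += n;
	if (cs->nitems > cs->imax) {
		/* New high watermark: close the batch and start another. */
		cs->wss = cache_wss(cs);
		cs->imax = cs->bimin = cs->nitems;
	}
}
```

In this model, a cache that starts with 100 items, serves 30 allocations and then receives a single free of 1000 items still reports an estimate of 30, which mirrors the ZFS example in the commit message.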
Mark Johnston
aabe13f145 uma: Introduce per-domain reclamation functions
Make it possible to reclaim items from a specific NUMA domain.

- Add uma_zone_reclaim_domain() and uma_reclaim_domain().
- Permit parallel reclamations.  Use a counter instead of a flag to
  synchronize with zone_dtor().
- Use the zone lock to protect cache_shrink() now that parallel reclaims
  can happen.
- Add a sysctl that can be used to trigger reclamation from a specific
  domain.

Currently the new KPIs are unused, so there should be no functional
change.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29685
2021-04-14 13:03:34 -04:00
Mark Johnston
d2f1c44bc9 uma: Remove the MINBUCKET flag from the flag name list
This should have been done in r368399 / commit
f8b6c51538.

Reported by:	rlibby
Sponsored by:	The FreeBSD Foundation
2020-12-27 17:01:33 -05:00
Mateusz Guzik
c3aa3bf97c vm: clean up empty lines in .c and .h files
2020-09-01 21:20:45 +00:00
Eric van Gyzen
a2e194654f memstat_kvm_uma: fix reading of uma_zone_domain structures
Coverity flagged the scaling by sizeof(uzd).  That is the type
of the pointer, so the scaling was already done by pointer arithmetic.
However, this was also passing a stack frame pointer to kvm_read,
so it was doubly wrong.

Move ZDOM_GET into the !_KERNEL section and use it in libmemstat.

Reported by:	Coverity
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26213
2020-08-28 19:50:40 +00:00
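The double-scaling mistake described in a2e194654f is easy to reproduce in isolation. The sketch below uses a hypothetical 8-byte record type and hypothetical accessor names; it does not reproduce the libmemstat code or ZDOM_GET.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 8-byte record standing in for a per-domain structure. */
struct rec {
	unsigned char bytes[8];
};
_Static_assert(sizeof(struct rec) == 8, "sketch assumes an 8-byte record");

/* Buggy: base + n already advances n elements, so this scales twice. */
struct rec *
rec_get_buggy(struct rec *base, size_t i)
{
	return (base + i * sizeof(*base));
}

/* Correct: let pointer arithmetic do the scaling. */
struct rec *
rec_get(struct rec *base, size_t i)
{
	return (base + i);
}
```

With a 16-element array, asking the buggy accessor for element 1 yields the address of element 8, while the correct accessor yields element 1.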
Jeff Roberson
c8b0a88b8d Clarify some language. Favor primary where both master and primary were
used in conjunction with secondary.
2020-06-20 20:21:04 +00:00
Mark Johnston
54007ce8ae Clean up uma_int.h a bit.
This makes it easier to write libkvm programs that access UMA data
structures.

- Remove a couple of unused slab functions and make others local to
  uma_core.c.  Similarly move SLAB_BITSETS, which affects the layout of
  slab structures, to uma_core.c.
- Stop defining the slab structures under _KERNEL.  There's no real
  reason they can't be visible to userspace like the rest of UMA's
  structures are.
- Group KEG_ASSERT_COLD with other keg macros.
- Convert an assertion about MAXMEMDOM to use _Static_assert.

No functional change intended.

Discussed with:	jeff
Reviewed by:	rlibby
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23980
2020-03-07 15:37:23 +00:00
Jeff Roberson
c6fd3e23f7 Use per-domain locks for the bucket cache.
This gives much better concurrency when there are a large number of
cores per-domain and multiple domains.  Avoid taking the lock entirely
if it will not be productive.  ROUNDROBIN domains will have mixed
memory in each domain and will load balance to all domains.

While here refactor the zone/domain separation and bucket limits to
simplify callers.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23673
2020-02-19 18:48:46 +00:00
Mark Johnston
4ab3aee8fb Reduce lock hold time in keg_drain().
Maintain a count of free slabs in the per-domain keg structure and use
that to clear the free slab list in constant time for most cases.  This
helps minimize lock contention induced by reclamation, in preparation
for proactive trimming of excesses of free memory.

Reviewed by:	jeff, rlibby
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23532
2020-02-11 20:06:33 +00:00
Ryan Libby
bae55c4aec uma: remove UMA_ZFLAG_CACHEONLY flag
UMA_ZFLAG_CACHEONLY was essentially the same thing as UMA_ZONE_VM, but
with a more confusing name.  Remove the flag, make UMA_ZONE_VM an
inherit flag, and replace all references.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23516
2020-02-06 08:32:25 +00:00
Ryan Libby
ec0d828071 uma: add UMA_ZONE_CONTIG, and a default contig_alloc
For now, copy the mbuf allocator.

Reviewed by:	jeff, markj (previous version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23237
2020-02-04 22:40:11 +00:00
Jeff Roberson
dc3915c8c6 Use STAILQ instead of TAILQ for bucket lists. We only need FIFO behavior
and this is more space efficient.

Stop queueing recently used buckets to the head of the list.  If the bucket
goes to a different processor the cache coherency will be more expensive.
We already try to encourage cache-hot behavior in the per-cpu layer.

Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D23493
2020-02-04 02:41:24 +00:00
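The space argument in dc3915c8c6 follows from the linkage each queue type embeds in its elements: a TAILQ entry carries two pointers, a STAILQ entry one. A minimal sketch using the sys/queue.h macros, with a hypothetical bucket type and hypothetical wrapper functions:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical bucket type linked on a singly-linked tail queue. */
struct bucket {
	STAILQ_ENTRY(bucket) link;
	int id;
};
STAILQ_HEAD(bucket_fifo, bucket);

/* Wrappers used only to compare the per-element linkage cost. */
struct stailq_link {
	STAILQ_ENTRY(bucket) link;
};
struct tailq_link {
	TAILQ_ENTRY(bucket) link;
};

void
fifo_init(struct bucket_fifo *q)
{
	STAILQ_INIT(q);
}

/* Always enqueue at the tail; nothing is requeued at the head. */
void
fifo_push(struct bucket_fifo *q, struct bucket *b)
{
	assert(b != NULL);
	STAILQ_INSERT_TAIL(q, b, link);
}

/* Dequeue the oldest bucket, or return NULL if the queue is empty. */
struct bucket *
fifo_pop(struct bucket_fifo *q)
{
	struct bucket *b = STAILQ_FIRST(q);

	if (b != NULL)
		STAILQ_REMOVE_HEAD(q, link);
	return (b);
}
```

STAILQ cannot remove an arbitrary element in constant time, which is acceptable when only FIFO behavior is needed, as the commit message states.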
Jeff Roberson
d4665eaa66 Implement a safe memory reclamation feature that is tightly coupled with UMA.
This is in the same family of algorithms as Epoch/QSBR/RCU/PARSEC but is
a unique algorithm.  This has 3x the performance of epoch in a write heavy
workload with less than half of the read side cost.  The memory overhead
is significantly lessened by limiting the free-to-use latency.  A synthetic
test uses 1/20th of the memory vs Epoch.  There is significant further
discussion in the comments and code review.

This code should be considered experimental.  I will write a man page after
it has settled.  After further validation the VM will begin using this
feature to permit lockless page lookups.

Both markj and cperciva tested on arm64 at large core counts to verify
fences on weaker ordering architectures.  I will commit a stress testing
tool in a follow-up.

Reviewed by:	mmacy, markj, rlibby, hselasky
Discussed with:	sbahara
Differential Revision:	https://reviews.freebsd.org/D22586
2020-01-31 00:49:51 +00:00
Ryan Libby
9b8db4d0a0 uma: split slabzone into two sizes
By allowing more items per slab, we can improve memory efficiency for
small allocs.  If we were just to increase the bitmap size of the
slabzone, we would then waste slabzone memory.  So, split slabzone into
two zones, one especially for 8-byte allocs (512 per slab).  The
practical effect should be reduced memory usage for counter(9).

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23149
2020-01-14 02:14:15 +00:00
Ryan Libby
4a8b575c6b uma: unify layout paths and improve efficiency
Unify the keg layout selection paths (keg_small_init, keg_large_init,
keg_cachespread_init), and slightly improve memory efficiency by:
 - using the padding of the final item to store the slab header,
 - not going OFFPAGE if we have a choice unless it improves efficiency.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23048
2020-01-09 02:03:17 +00:00
Ryan Libby
54c5ae804f uma: reorganize flags
 - Garbage collect UMA_ZONE_PAGEABLE & UMA_ZONE_STATIC.
 - Move flag VTOSLAB from public to private.
 - Introduce public NOTPAGE flag and make HASH private.
 - Introduce public NOTOUCH flag and make OFFPAGE private.
 - Update man page.

The net effect of this should be to make the contract with clients more
clear.  Clients should choose constraints, UMA will figure out how to
implement them.  This also breaks the confusing double meaning of
OFFPAGE.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23016
2020-01-09 02:03:03 +00:00
Jeff Roberson
79c9f9429a Fix uma boot pages calculations on NUMA machines that also don't have
UMA_MD_SMALL_ALLOC.  This is unusual but not impossible.  Fix the alignment
of zones while here.  This was already correct because uz_cpu strongly
aligned the zone structure but the specified alignment did not match
reality and involved redundant defines.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D23046
2020-01-06 02:51:19 +00:00
Jeff Roberson
31c251a046 Fix an assertion introduced in r356348. On architectures without
UMA_MD_SMALL_ALLOC vmem has a more complicated startup sequence that
violated the new assert.  Resolve this by rewriting the COLD asserts to
look at the per-cpu allocation counts for evidence of api activity.

Discussed with:	rlibby
Reviewed by:	markj
Reported by:	lwhsu
2020-01-04 19:29:25 +00:00
Jeff Roberson
dfe13344f5 UMA NUMA flag day. UMA_ZONE_NUMA was a source of confusion. Make the names
more consistent with other NUMA features as UMA_ZONE_FIRSTTOUCH and
UMA_ZONE_ROUNDROBIN.  The system will now select a default depending
on kernel configuration.  API users need only specify one if they want to
override the default.

Remove the UMA_XDOMAIN and UMA_FIRSTTOUCH kernel options and key only off
of NUMA.  XDOMAIN is now fast enough in all cases to enable whenever NUMA
is.

Reviewed by:	markj
Discussed with:	rlibby
Differential Revision:	https://reviews.freebsd.org/D22831
2020-01-04 18:48:13 +00:00
Jeff Roberson
91d947bfbe Sort cross-domain frees into per-domain buckets before inserting these
onto their respective bucket lists.  This is a several orders of magnitude
improvement in contention on the keg lock under heavy free traffic while
requiring only an additional bucket per-domain worth of memory.

Discussed with:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D22830
2020-01-04 07:56:28 +00:00
Jeff Roberson
8b987a7769 Use per-domain keg locks. This provides both a lock and separate space
accounting for each NUMA domain.  Independent keg domain locks are important
with cross-domain frees.  Hashed zones are non-numa and use a single keg
lock to protect the hash table.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D22829
2020-01-04 03:30:08 +00:00
Jeff Roberson
727c691857 Use a separate lock for the zone and keg. This provides concurrency
between populating buckets from the slab layer and fetching full buckets
from the zone layer.  Eliminate some nonsense locking patterns where
we lock to fetch a single variable.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22828
2020-01-04 03:15:34 +00:00
Jeff Roberson
4bd61e19a2 Use atomics for the zone limit and sleeper count. This relies on the
sleepq to serialize sleepers.  This patch retains the existing sleep/wakeup
paradigm to limit 'thundering herd' wakeups.  It resolves a missing wakeup
in one case but otherwise should be bug for bug compatible.  In particular,
there are still various races surrounding adjusting the limit via sysctl
that are now documented.

Discussed with:	markj
Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D22827
2020-01-04 03:04:46 +00:00
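As a rough illustration of enforcing an item limit with atomics rather than a lock (4bd61e19a2), the sketch below reserves items with a compare-and-swap loop using C11 atomics. The struct and function names are hypothetical, and the sleeper accounting and sleepq interaction described in the commit are deliberately left out; a failed reservation is where a caller would sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical limit state for one zone. */
struct zone_limit {
	atomic_ulong items;		/* items currently accounted */
	unsigned long max_items;	/* configured limit */
};

void
zone_limit_init(struct zone_limit *z, unsigned long max_items)
{
	atomic_init(&z->items, 0);
	z->max_items = max_items;
}

/* Try to account n more items; fail without changes if over the limit. */
bool
zone_reserve(struct zone_limit *z, unsigned long n)
{
	unsigned long old = atomic_load(&z->items);

	do {
		if (old + n > z->max_items)
			return (false);	/* a real caller would sleep here */
	} while (!atomic_compare_exchange_weak(&z->items, &old, old + n));
	return (true);
}

/* Return n previously reserved items. */
void
zone_release(struct zone_limit *z, unsigned long n)
{
	unsigned long prev = atomic_fetch_sub(&z->items, n);

	assert(prev >= n);
}
```

On failure the compare-and-swap reloads the current count into `old`, so the limit check is repeated against fresh state on every retry.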
Jeff Roberson
cc7ce83ae0 Further reduce the cacheline footprint of fast allocations by duplicating
the zone size and flags fields in the per-cpu caches.  This allows fast
allocations to proceed only touching the single per-cpu cacheline and
simplifies the common case when no ctor/dtor is specified.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D22826
2019-12-25 20:57:24 +00:00
Jeff Roberson
376b1ba394 Optimize fast path allocations by storing bucket headers in the per-cpu
cache area.  This allows us to check on bucket space for all per-cpu
buckets with a single cacheline access and fewer branches.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D22825
2019-12-25 20:50:53 +00:00
Ryan Libby
815db20425 uma dbg: flexible size for slab debug bitset too
Recently (r355315) the size of the struct uma_slab bitset field us_free
became dynamic instead of conservative.  Now, make the debug bitset
size dynamic too.  The debug bitset is INVARIANTS-only, so in fact we
don't care too much about the space savings that result from this, but
enabling minimally-sized slabs on INVARIANTS builds is still important
in order to be able to test new slab layouts effectively.

Reviewed by:	jeff (previous version), markj (previous version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22759
2019-12-14 05:21:56 +00:00
Ryan Libby
d82c8ffb16 Revert r355706 & r355710
The quick fix didn't work.  I'll sort it out tomorrow.

Revert r355710: "libmemstat: unbreak build"
Revert r355706: "uma dbg: flexible size for slab debug bitset too"
2019-12-13 11:21:28 +00:00
Ryan Libby
7508f15ff1 uma dbg: flexible size for slab debug bitset too
Recently (r355315) the size of the struct uma_slab bitset field us_free
became dynamic instead of conservative.  Now, make the debug bitset
size dynamic too.  The debug bitset is INVARIANTS-only, so in fact we
don't care too much about the space savings that result from this, but
enabling minimally-sized slabs on INVARIANTS builds is still important
in order to be able to test new slab layouts effectively.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22759
2019-12-13 09:31:59 +00:00
Ryan Libby
6d204a6a0e uma: pretty print zone flags sysctl
Requested by:	jeff
Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22748
2019-12-11 06:50:55 +00:00
Jeff Roberson
1e0701e1e5 Use a variant slab structure for offpage zones. This saves space in
embedded slabs but also is an opportunity to tidy up code and add
accessor inlines.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D22609
2019-12-08 01:15:06 +00:00
Jeff Roberson
9b78b1f433 Use a precise bit count for the slab free items in UMA. This significantly
shrinks embedded slab structures.

Reviewed by:	markj, rlibby (prior version)
Differential Revision:	https://reviews.freebsd.org/D22584
2019-12-02 22:44:34 +00:00
Jeff Roberson
6d6a03d7a8 Handle large mallocs by going directly to kmem. Taking a detour through
UMA does not provide any additional value.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22563
2019-11-29 03:14:10 +00:00
Jeff Roberson
584061b480 Garbage collect the mostly unused us_keg field. Use appropriately named
union members in vm_page.h to store the zone and slab.  Remove some nearby
dead code.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22564
2019-11-28 07:49:25 +00:00
Jeff Roberson
20a4e15451 Implement a sysctl tree for uma zones to assist in debugging and provide
more statistics than are exported via the ABI stable vmstat interface.
Rename uz_count to uz_bucket_size because even I was confused by the
name after returning to the source years later.

Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D22554
2019-11-28 00:19:09 +00:00
Ryan Libby
ca293436d1 uma: trash memory when ctor/dtor supplied too
On INVARIANTS kernels, UMA has a use-after-free detection mechanism.
This mechanism previously required that all of the ctor/dtor/uminit/fini
arguments to uma_zcreate() be NULL in order to function.  Now, it only
requires that uminit and fini be NULL; now, the trash ctor and dtor will
be called in addition to any supplied ctor or dtor.

Also do a little refactoring for readability of the resulting logic.

This enables use-after-free detection for more zones, and will allow for
simplification of some callers that worked around the previous
restriction (see kern_mbuf.c).

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20722
2019-11-27 19:49:55 +00:00
Mark Johnston
08cfa56ea3 Extend uma_reclaim() to permit different reclamation targets.
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure.  This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.

With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets.  Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure.  In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim().  When trimming, only
items in excess of the estimate are reclaimed.  Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.

Now, when under memory pressure, the page daemon will trim zones
rather than draining them.  As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.

Reviewed by:	jeff
Tested by:	pho (an earlier version)
MFC after:	2 months
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16667
2019-09-01 22:22:43 +00:00
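The difference between trimming and draining described in 08cfa56ea3 comes down to how many cached items are eligible for reclamation. A minimal sketch, with hypothetical enum and function names and with the per-CPU drain case omitted:

```c
#include <assert.h>

/* Hypothetical request names for this sketch. */
enum reclaim_req {
	REQ_TRIM,	/* reclaim only the excess over the estimate */
	REQ_DRAIN	/* reclaim every cached item */
};

/*
 * Number of cached items to reclaim, given the current cache size and
 * the working set size estimate.
 */
long
reclaim_count(long nitems, long wss, enum reclaim_req req)
{
	assert(nitems >= 0 && wss >= 0);
	if (req == REQ_DRAIN)
		return (nitems);
	return (nitems > wss ? nitems - wss : 0);
}
```

A heavily used zone whose cache is no larger than its estimate loses nothing on a trim, which is why trimming avoids the burst of cache misses that a full drain causes.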
Jeff Roberson
c168508655 Add two new kernel options to control memory locality on NUMA hardware.
 - UMA_XDOMAIN enables an additional per-cpu bucket for freed memory that
   was freed on a different domain from where it was allocated.  This is
   only used for UMA_ZONE_NUMA (first-touch) zones.
 - UMA_FIRSTTOUCH sets the default UMA policy to be first-touch for all
   zones.  This tries to maintain locality for kernel memory.

Reviewed by:	gallatin, alc, kib
Tested by:	pho, gallatin
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20929
2019-08-06 21:50:34 +00:00
Pedro F. Giffuni
6929b7d1ab UMA: unsign some variables related to allocation in hash_alloc().
As a followup to r343673, unsign some variables related to allocation
since the hashsize cannot be negative. This gives a bit more space to
handle bigger allocations and avoid some implicit casting.

While here also unsign uh_hashmask, it makes little sense to keep that
signed.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19148
2019-02-12 04:33:05 +00:00
Gleb Smirnoff
ad66f95865 Now that there is only one way to allocate a slab, remove uz_slab method.
Discussed with:	jeff
2019-02-07 03:55:05 +00:00
Gleb Smirnoff
b68d692a3d Whitespace.
2019-01-16 04:02:08 +00:00
Gleb Smirnoff
396694153f Fix compilation failures on different arches that have vm_machdep.c not
aware of counter_u64_t by including counter.h into uma_int.h. I'm not
happy about this inclusion, but it fixes compilation ASAP.
2019-01-15 19:33:47 +00:00
Gleb Smirnoff
2efcc8cbca Make uz_allocs, uz_frees and uz_fails counter(9). This removes some
atomic updates and reduces amount of data protected by zone lock.

During startup point these fields to EARLY_COUNTER. After startup
allocate them for all early zones.

Tested by:	pho
2019-01-15 18:24:34 +00:00
Gleb Smirnoff
bb15d1c778 o Move zone limit from keg level up to zone level. This means that now
  two zones sharing a keg may have different limits. Now this is going
  to work:

  zone = uma_zcreate();
  uma_zone_set_max(zone, limit);
  zone2 = uma_zsecond_create(zone);
  uma_zone_set_max(zone2, limit2);

  Kegs no longer have uk_maxpages field, but zones have uz_items. When
  set, it may be rounded up to minimum possible CPU bucket cache size.
  For small limits bucket cache can also be reconfigured to be smaller.
  Counter uz_items is updated whenever items transition from keg to a
  bucket cache or directly to a consumer. If zone has uz_maxitems set and
  it is reached, then we are going to sleep.

o Since new limits don't play well with multi-keg zones, remove them. The
  idea of multi-keg zones was introduced exactly 10 years ago, and has
  never had a practical use. In discussion with Jeff we came to a wild
  agreement that if we ever want to reintroduce the idea of a smart allocator
  that would be able to choose between two (or more) totally different
  backing stores, that choice should be made one level higher than UMA,
  e.g. in malloc(9) or in mget(), or whatever and choice should be controlled
  by the caller.

o Sleeping code is improved to account for the number of sleepers and wake
  them one by one, to avoid the thundering herd problem.

o Flag UMA_ZONE_NOBUCKETCACHE removed, instead uma_zone_set_maxcache()
  KPI added. Having no bucket cache basically means setting maxcache to 0.

o Now with many fields added and many removed (no multi-keg zones!) make
  sure that struct uma_zone is perfectly aligned.

Reviewed by:	markj, jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D17773
2019-01-15 00:02:06 +00:00
Gleb Smirnoff
3d5e3df73f For non-offpage zones the slab is placed at the end of the page. Keg's uk_pgoff
is calculated to guarantee that struct uma_slab is placed at pointer size
alignment. Calculation of real struct uma_slab size is done in keg_ctor()
and yet again in keg_large_init(), to check if we need an extra page. This
calculation can actually be performed at compile time.

- Add SIZEOF_UMA_SLAB macro to calculate size of struct uma_slab placed at
  an end of a page with alignment requirement.
- Use SIZEOF_UMA_SLAB in keg_ctor() and in keg_large_init(). This is not
  a functional change.
- Use SIZEOF_UMA_SLAB in UMA_SLAB_SPACE definition and in keg_small_init().
  This is a potential bugfix, but in reality I don't think there are any
  systems affected, since compiler aligns struct uma_slab anyway.
2018-11-28 19:17:27 +00:00
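The compile-time calculation mentioned in 3d5e3df73f can be sketched as a macro that rounds a header size up to pointer alignment. The header struct, the macro names, and the helper function below are hypothetical stand-ins, not the definitions from uma_int.h, and the sketch assumes the pointer size is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slab header; the real struct uma_slab differs. */
struct slab_hdr {
	void *first;
	unsigned char freecount;
	unsigned char flags;
	unsigned char bits[3];
};

/* Round the header size up to pointer alignment at compile time. */
#define	ALIGN_PTR_MASK	(sizeof(void *) - 1)
#define	SIZEOF_SLAB_HDR	\
	((sizeof(struct slab_hdr) + ALIGN_PTR_MASK) & ~ALIGN_PTR_MASK)

_Static_assert(SIZEOF_SLAB_HDR % sizeof(void *) == 0,
    "header size must be pointer aligned");
_Static_assert(SIZEOF_SLAB_HDR >= sizeof(struct slab_hdr),
    "rounding must not shrink the header");

/* Offset of a header placed at the very end of a page. */
size_t
slab_pgoff(size_t page_size)
{
	assert(page_size >= SIZEOF_SLAB_HDR);
	return (page_size - SIZEOF_SLAB_HDR);
}
```

Because the macro is an integer constant expression, it can be checked with _Static_assert and used in other compile-time definitions, with no recalculation at run time.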
Mark Johnston
0f9b7bf37a Add accounting to per-domain UMA full bucket caches.
In particular, track the current size of the cache and maintain an
estimate of its working set size.  This will be used to decide how
much to shrink various caches when the kernel attempts to reclaim
pages.  As a secondary effect, it makes statistics aggregation (done
by, e.g., vmstat -z) cheaper since sysctl_vm_zone_stats() no longer
needs to iterate over lists of cached buckets.

Discussed with:	alc, glebius, jeff
Tested by:	pho (previous version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D16666
2018-11-13 19:44:40 +00:00
Mark Johnston
7571e24901 Add an #include required after r339686.
X-MFC with:	r339686
Sponsored by:	The FreeBSD Foundation
2018-10-24 16:49:16 +00:00
Mark Johnston
194a979ee9 Use a vm_domainset iterator in keg_fetch_slab().
Previously, it used a hand-rolled round-robin iterator.  This meant that
the minskip logic in r338507 didn't apply to UMA allocations, and also
meant that we would call vm_wait() for individual domains rather than
permitting an allocation from any domain with sufficient free pages.

Discussed with:	jeff
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17420
2018-10-24 16:41:47 +00:00
Gleb Smirnoff
306abf0f35 Either "free" or "allocated" is misleading here, since an item
in a bucket is free from perspective of UMA consumer, and it is
allocated from perspective of keg.

Discussed with:	markj
Approved by:	re (kib)
2018-08-24 18:47:50 +00:00
Gleb Smirnoff
a307fb5b0c Fix comment. The actual meaning of ub_cnt is the opposite.
2018-08-23 23:24:28 +00:00
Jeff Roberson
63b5557b2f Sort uma_zone fields according to 64 byte cache line with adjacent line
prefetch on 64bit architectures.  Prior to this, two lines were needed
for the fast path and each line may fetch an unused adjacent neighbor.
 - Move fields used by the fast path into a single line.
 - Move constants into the adjacent line which is mostly used for
   the spare bucket alloc 'medium path'.
 - Unpad the mtx which is only used by the fast path and place it in
   a line with rarely used data.  This aligns the cachelines better and
   eliminates 128 bytes of wasted space.

This gives a 45% improvement on a will-it-scale test on a 24 core machine.

Reviewed by:	mmacy
2018-06-23 08:10:09 +00:00
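Grouping fields by cache line, as 63b5557b2f does for struct uma_zone, can be expressed and checked at compile time. The sketch below assumes a 64-byte line and uses an invented struct with invented field names; it shows only the layout technique, not the actual uma_zone field order.

```c
#include <assert.h>
#include <stddef.h>

#define	CACHE_LINE	64	/* assumed cache line size */

/* Hypothetical zone layout: hot fields first, constants on the next line. */
struct zone_sketch {
	/* Line 0: fields touched on every fast-path allocation. */
	_Alignas(CACHE_LINE) void *alloc_bucket;
	void *free_bucket;
	unsigned long allocs;
	unsigned int size;
	unsigned int flags;

	/* Line 1: constants read mostly on slower paths. */
	_Alignas(CACHE_LINE) const char *name;
	unsigned int bucket_size;
};

/* Fail the build if a later edit pushes a field onto the wrong line. */
_Static_assert(offsetof(struct zone_sketch, flags) < CACHE_LINE,
    "fast-path fields must share the first line");
_Static_assert(offsetof(struct zone_sketch, name) == CACHE_LINE,
    "constants must start the second line");
_Static_assert(sizeof(struct zone_sketch) % CACHE_LINE == 0,
    "struct must occupy whole cache lines");
```

The static assertions turn the intended layout into a build-time check, so the grouping does not silently drift as fields are added.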