Commit Graph

308 Commits

Author SHA1 Message Date
glebius
60d5d98bc3 Make uz_allocs, uz_frees and uz_fails counter(9). This removes some
atomic updates and reduces amount of data protected by zone lock.

During startup point these fields to EARLY_COUNTER. After startup
allocate them for all early zones.

Tested by:	pho
2019-01-15 18:24:34 +00:00
glebius
dd582940fe Fix compilation on 32-bit. 2019-01-15 03:43:46 +00:00
glebius
f1a8621cf2 o Move zone limit from keg level up to zone level. This means that now
two zones sharing a keg may have different limits. Now this is going
  to work:

  zone = uma_zcreate();
  uma_zone_set_max(zone, limit);
  zone2 = uma_zsecond_create(zone);
  uma_zone_set_max(zone2, limit2);

  Kegs no longer have uk_maxpages field, but zones have uz_items. When
  set, it may be rounded up to minimum possible CPU bucket cache size.
  For small limits bucket cache can also be reconfigured to be smaller.
  Counter uz_items is updated whenever items transition from keg to a
  bucket cache or directly to a consumer. If zone has uz_maxitems set and
  it is reached, then we are going to sleep.

o Since new limits don't play well with multi-keg zones, remove them. The
  idea of multi-keg zones was introduced exactly 10 years ago, and never
  have had a practical usage. In discussion with Jeff we came to a wild
  agreement that if we ever want to reintroduce the idea of a smart allocator
  that would be able to choose between two (or more) totally different
  backing stores, that choice should be made one level higher than UMA,
  e.g. in malloc(9) or in mget(), or whatever and choice should be controlled
  by the caller.

o Sleeping code is improved to account number of sleepers and wake them one
  by one, to avoid thundering herd problem.

o Flag UMA_ZONE_NOBUCKETCACHE removed, instead uma_zone_set_maxcache()
  KPI added. Having no bucket cache basically means setting maxcache to 0.

o Now with many fields added and many removed (no multi-keg zones!) make
  sure that struct uma_zone is perfectly aligned.

Reviewed by:	markj, jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D17773
2019-01-15 00:02:06 +00:00
glebius
7b43f3229b Fix yet another edge case in uma_startup_count(). If zone size fits into
several pages, but leaves no space for struct uma_slab at the end we
miscalculate number of pages by one. Totally mimic keg_large_init() math
here to cover that problem.

Reported by:	gallatin
2018-11-28 19:54:02 +00:00
glebius
82a2de386c For not offpage zones the slab is placed at the end of page. Keg's uk_pgoff
is calculated to guarantee that struct uma_slab is placed at pointer size
alignment. Calculation of real struct uma_slab size is done in keg_ctor()
and yet again in keg_large_init(), to check if we need an extra page. This
calculation can actually be performed at compile time.

- Add SIZEOF_UMA_SLAB macro to calculate size of struct uma_slab placed at
  an end of a page with alignment requirement.
- Use SIZEOF_UMA_SLAB in keg_ctor() and in keg_large_init(). This is a not
  a functional change.
- Use SIZEOF_UMA_SLAB in UMA_SLAB_SPACE definition and in keg_small_init().
  This is a potential bugfix, but in reality I don't think there are any
  systems affected, since compiler aligns struct uma_slab anyway.
2018-11-28 19:17:27 +00:00
markj
e8333c0398 Add accounting to per-domain UMA full bucket caches.
In particular, track the current size of the cache and maintain an
estimate of its working set size.  This will be used to decide how
much to shrink various caches when the kernel attempts to reclaim
pages.  As a secondary effect, it makes statistics aggregation (done
by, e.g., vmstat -z) cheaper since sysctl_vm_zone_stats() no longer
needs to iterate over lists of cached buckets.

Discussed with:	alc, glebius, jeff
Tested by:	pho (previous version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D16666
2018-11-13 19:44:40 +00:00
markj
f931b753dd Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain.  The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted.  Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with:	gallatin, jeff
Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17418
2018-10-30 18:26:34 +00:00
markj
4f4e1af172 Fix some problems that manifest when NUMA domain 0 is empty.
- In uma_prealloc(), we need to check for an empty domain before the
  first allocation attempt, not after.  Fix this by switching
  uma_prealloc() to use a vm_domainset iterator, which addresses the
  secondary issue of using a signed domain identifier in round-robin
  iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
  excluding empty domains; otherwise we may frequently specify an empty
  domain when calling in to the page allocator, wasting CPU time.
  Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
  total page counts for now: some vm_phys segments are created before
  the SRAT is parsed and are thus always identified as being in domain 0
  even when they are not.  Then, when bootstrap pages are freed, they
  are added to a domain that we had previously thought was empty.  Until
  this is corrected, we simply exclude them from the per-domain page
  count.

Reported and tested by:	Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by:	gallatin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17704
2018-10-30 17:57:40 +00:00
markj
ac2a239a60 Use a vm_domainset iterator in keg_fetch_slab().
Previously, it used a hand-rolled round-robin iterator.  This meant that
the minskip logic in r338507 didn't apply to UMA allocations, and also
meant that we would call vm_wait() for individual domains rather than
permitting an allocation from any domain with sufficient free pages.

Discussed with:	jeff
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17420
2018-10-24 16:41:47 +00:00
glebius
e30533f381 If we lost race or were migrated during bucket allocation for the per-CPU
cache, then we put new bucket on generic bucket cache. However, code didn't
honor UMA_ZONE_NOBUCKETCACHE flag, so potentially we could start a cache
on a zone that clearly forbids that. Fix this.

Reviewed by:	markj
2018-10-22 15:48:07 +00:00
gallatin
770bffbee6 Allow empty NUMA memory domains to support Threadripper2
The AMD Threadripper 2990WX is basically a slightly crippled Epyc.
Rather than having 4 memory controllers, one per NUMA domain, it has
only 2  memory controllers enabled. This means that only 2 of the
4 NUMA domains can be populated with physical memory, and the
others are empty.

Add support to FreeBSD for empty NUMA domains by:

- creating empty memory domains when parsing the SRAT table,
    rather than failing to parse the table
- not running the pageout deamon threads in empty domains
- adding defensive code to UMA to avoid allocating from empty domains
- adding defensive code to cpuset to avoid binding to an empty domain
    Thanks to Jeff for suggesting this strategy.

Reviewed by:	alc, markj
Approved by:	re (gjb@)
Differential Revision:	https://reviews.freebsd.org/D1683
2018-10-01 14:14:21 +00:00
markj
351e1e4099 Only update the domain cursor once in keg_fetch_slab().
We drop the keg lock when we go to actually allocate the slab, allowing
other threads to advance the cursor.  This can cause us to exit the
round-robin loop before having attempted allocations from all domains,
resulting in a hang during a subsequent blocking allocation attempt from
a depleted domain.

Reported and tested by:	Jan Bramkamp <crest@bultmann.eu>
Reviewed by:	alc, cem
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17209
2018-09-18 17:51:45 +00:00
markm
d8723e8b03 Remove the Yarrow PRNG algorithm option in accordance with due notice
given in random(4).

This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.

Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.

PR:		230870
Reviewed by:	cem
Approved by:	so(delphij,gtetlow)
Approved by:	re(marius)
Differential Revision:	https://reviews.freebsd.org/D16898
2018-08-26 12:51:46 +00:00
alc
3799d78beb Eliminate the arena parameter to kmem_free(). Implicitly this corrects an
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().

Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions.  Use this flag to determine which
arena the kmem virtual addresses are returned to.

Eliminate UMA_SLAB_KRWX.  The introduction of VPO_KMEM_EXEC makes it
redundant.

Update the nearby comment for UMA_SLAB_KERNEL.

Reviewed by:	kib, markj
Discussed with:	jeff
Approved by:	re (marius)
Differential Revision:	https://reviews.freebsd.org/D16845
2018-08-25 19:38:08 +00:00
alc
4ce21fcbea Eliminate kmem_malloc()'s unused arena parameter. (The arena parameter
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)

Reviewed by:	kib, markj
Discussed with:	jeff, re@
Differential Revision:	https://reviews.freebsd.org/D16825
2018-08-21 16:43:46 +00:00
alc
dce7d8625a Eliminate the arena parameter to kmem_malloc_domain(). It is redundant.
The domain and flags parameters suffice.  In fact, the related functions
kmem_alloc_{attr,contig}_domain() don't have an arena parameter.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D16713
2018-08-18 18:33:50 +00:00
mjg
20f4798188 uma: whack main zone counter update in the slow path, freeing side
See r333052.
2018-07-12 22:35:52 +00:00
markj
b368231e1d Fix pre-SI_SUB_CPU initialization of per-CPU counters.
r336020 introduced pcpu_page_alloc(), replacing page_alloc() as the
backend allocator for PCPU UMA zones.  Unlike page_alloc(), it does
not honour malloc(9) flags such as M_ZERO or M_NODUMP, so fix that.

r336020 also changed counter(9) to initialize each counter using a
CPU_FOREACH() loop instead of an SMP rendezvous.  Before SI_SUB_CPU,
smp_rendezvous() will only execute the callback on the current CPU
(i.e., CPU 0), so only one counter gets zeroed.  The rest are zeroed
by virtue of the fact that UMA gratuitously zeroes slabs when importing
them into a zone.

Prior to SI_SUB_CPU, all_cpus is clear, so with r336020 we weren't
zeroing vm_cnt counters during boot: the CPU_FOREACH() loop had no
effect, and pcpu_page_alloc() didn't honour M_ZERO.  Fix this by
iterating over the full range of CPU IDs when zeroing counters,
ignoring whether the corresponding bits in all_cpus are set.

Reported and tested by:	pho (previous version)
Reviewed by:		kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D16190
2018-07-10 00:18:12 +00:00
sbruno
93165a4a59 Wrap the declaration and assignment of "stripe" with #ifdef NUMA declarations
as not all targets are NUMA aware.

Found with gcc.

Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16113
2018-07-07 13:37:44 +00:00
mmacy
ff20311f27 Back pcpu zone with domain correct pages
- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
  (defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)

- Allocate page from the correct domain for a given cpu.

- Don't initialize pc_domain to non-zero value if NUMA is not defined
  There are some misconceptions surrounding this field. It is the
  _VM_ NUMA domain and should only ever correspond to valid domain
  values as understood by the VM.

The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.

Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision:  https://reviews.freebsd.org/D15933
2018-07-06 02:06:03 +00:00
ian
10180229a6 Eliminate a spurious panic on non-SMP systems (occurred on shutdown/reboot). 2018-06-22 20:22:26 +00:00
br
47876edb35 Fix uma_zalloc_pcpu_arg() operation in case of !SMP build.
Reviewed by:	mjg
Sponsored by:	DARPA, AFRL
2018-06-21 11:43:54 +00:00
jtl
8222f5cb7c Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable.  There are a few exceptions.  For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9).  The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory.  This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory.  This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified.  After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices.  They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR:		228927
Reviewed by:	alc, kib, markj, jhb (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
mjg
844f744aa6 uma: implement provisional api for per-cpu zones
Per-cpu zone allocations are very rarely done compared to regular zones.
The intent is to avoid pessimizing the latter case with per-cpu specific
code.

In particular contrary to the claim in r334824, M_ZERO is sometimes being
used for such zones. But the zeroing method is completely different and
braching on it in the fast path for regular zones is a waste of time.
2018-06-08 21:40:03 +00:00
mjg
08fabf55c9 uma: fix up r334824
Turns out there is code which ends up passing M_ZERO to counters.
Since counters zero unconditionally on their own, just ignore drop the
flag in that place.
2018-06-08 05:40:36 +00:00
mjg
81c7eb074b uma: remove M_ZERO support for pcpu zones
Nothing in the tree uses it and pcpu zones have a fundamentally different use
case than the regular zones - they are not supposed to be allocated and freed
all the time.

This reduces pollution in the allocation fast path.
2018-06-08 03:16:16 +00:00
glebius
1b1a1a56a0 UMA memory debugging enabled with INVARIANTS consists of two things:
trashing freed memory and checking that allocated memory is properly
trashed, and also of keeping a bitset of freed items. Trashing/checking
creates a lot of CPU cache poisoning, while keeping debugging bitsets
consistent creates a lot of contention on UMA zone lock(s). The performance
difference between INVARIANTS kernel and normal one is mostly attributed
to UMA debugging, rather than to all KASSERT checks in the kernel.

Add loader tunable vm.debug.divisor that allows either to turn off UMA
debugging completely, or turn it on only for a fraction of allocations,
while still running all KASSERTs in kernel. That allows to run INVARIANTS
kernels in production environments without reducing load by orders of
magnitude, but still doing useful extra checks.

Default value is 1, meaning debug every allocation. Value of 0 would
disable UMA debugging completely. Values above 1 enable debugging only
for every N-th item. It isn't possible to strictly follow the number,
but still amount of debugging is reduced roughly by (N-1)/N percent.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D15199
2018-06-08 00:15:08 +00:00
mjg
2ae63d9ae0 uma: whack main zone counter update in the slow path
Cached counters are typically zero at this point so it performs
avoidable atomics. Everything reading them also reads the cached
ones, thus there is really no point.

Reviewed by:		jeff
2018-04-27 05:37:35 +00:00
markj
4f052bd0fe Add a UMA zone flag to disable the use of buckets.
This allows the creation of zones which don't do any caching in front of
the keg. If the zone is a cache zone, this means that UMA will not
attempt any memory allocations when allocating an item from the backend.
This is intended for use after a panic by netdump, but likely has other
applications.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15184
2018-04-24 20:05:45 +00:00
glebius
a76113efee Use UMA_SLAB_SPACE macro. No functional change here. 2018-04-02 05:15:25 +00:00
glebius
f1818db2fc In uma_startup_count() handle special case when zone will fit into
single slab, but with alignment adjustment it won't. Again, when
there is only one item in a slab alignment can be ignored. See
previous revision of this file for more info.

PR:		227116
2018-04-02 05:14:31 +00:00
glebius
b8ac450ecc Handle a special case when a slab can fit only one allocation,
and zone has a large alignment. With alignment taken into
account uk_rsize will be greater than space in a slab. However,
since we have only one item per slab, it is always naturally
aligned.

Code that will panic before this change with 4k page:

	z = uma_zcreate("test", 3984, NULL, NULL, NULL, NULL, 31, 0);
	uma_zalloc(z, M_WAITOK);

A practical scenario to hit the panic is a machine with 56 CPUs
and 2 NUMA domains, which yields in zone size of 3984.

PR:		227116
MFC after:	2 weeks
2018-04-02 05:11:59 +00:00
jeff
c9b029efcd Add the flag ZONE_NOBUCKETCACHE. This flag instructions UMA not to keep
a cache of fully populated buckets.  This will be used in a follow-on
commit.

The flag idea was originally from markj.

Reviewed by:	markj, kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2018-04-01 04:47:05 +00:00
kib
a5910f9167 For vm_zone_stats() sysctl handler, do not drain sbuf calling
copyout(9) while owning zone lock.

Despite old value sysctl buffer is wired, spurious faults might still
occur.

Note that we still own the uma_rwlock there, but this lock does not
participate in sensitive lock orders.

Reported and tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-03-24 13:48:53 +00:00
glebius
2c573b6df7 Fix boot_pages exhaustion on machines with many domains and cores, where
size of UMA zone allocation is greater than page size. In this case zone
of zones can not use UMA_MD_SMALL_ALLOC, and we  need to postpone switch
off of this zone from startup_alloc() until full launch of VM.

o Always supply number of VM zones to uma_startup_count(). On machines
  with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over
  a page. In the latter case account VM zones for number of allocations
  from the zone of zones.
o Rewrite startup_alloc() so that it will immediately switch off from
  itself any zone that is already capable of running real alloc.
  In worst case scenario we may leak a single page here. See comment
  in uma_startup_count().
o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some
  extra SYSINITs, e.g. vm_page_init() may sneak in before.
o While here, remove uma_boot_pages_mtx. With recent changes to boot
  pages calculation, we are guaranteed to use all of the boot_pages
  in the early single threaded stage.

Reported & tested by:	mav
2018-02-09 04:45:39 +00:00
glebius
ed8d237f4f Fix three miscalculations in amount of boot pages:
o Most of startup zones have struct uma_slab embedded into the slab,
  so provide macro UMA_SLAB_SPACE and use it instead of UMA_SLAB_SIZE,
  when calculating how many pages would certain kind of allocations
  require. Some zones are offpage, so we might have a positive inaccuracy.
o The keg for the zone of zones is allocated "dynamically", so we
  need +1 when calculating amount of pages for kegs. [1]
o The zones of zones and zones of kegs have arbitrary alignment of 32,
  and this also needs to be accounted for. [2]

While here, spread more comments and improve diagnostic messages.

Reported by:	pho [1], jtl [2]
2018-02-07 18:32:51 +00:00
glebius
8f926da325 Use correct arithmetic to calculate how many pages we need for kegs
and hashes.  There is no functional change with current sizes.
2018-02-06 22:13:40 +00:00
jeff
e67ec0d694 Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
glebius
7fbf23740f Improve DIAGNOSTIC printf. Report using a boot page every time regardless
of booted status.
2018-02-06 22:08:43 +00:00
glebius
baecd1a4b3 Followup on r302393 by cperciva, improving calculation of boot pages required
for UMA startup.

o Introduce another stage of UMA startup, which is entered after
  vm_page_startup() finishes. After this stage we don't yet enable buckets,
  but we can ask VM for pages. Rename stages to meaningful names while here.
  New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS,
  BOOT_RUNNING.
  Enabling page alloc earlier allows us to dramatically reduce number of
  boot pages required. What is more important number of zones becomes
  consistent across different machines, as no MD allocations are done before
  the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use
  startup_alloc(), however that may change, so vm_page_startup() provides
  its need for early zones as argument.
o Introduce uma_startup_count() function, to avoid code duplication. The
  functions calculates sizes of zones zone and kegs zone, and calculates how
  many pages UMA will need to bootstrap.
  It counts not only of zone structures, but also of kegs, slabs and hashes.
o Hide uma_startup_foo() declarations from public file.
o Provide several DIAGNOSTIC printfs on boot_pages usage.
o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of
  mp_ncpus. Use resulting number not only in the size argument to zone_ctor()
  but also as args.size.

Reviewed by:		imp, gallatin (earlier version)
Differential Revision:	https://reviews.freebsd.org/D14054
2018-02-06 04:16:00 +00:00
jeff
cc3d6a3370 Move VM_NUMA_ALLOC and DEVICE_NUMA under the single global config option NUMA.
Sponsored by:	Netflix, Dell/EMC Isilon
Discussed with:	jhb
2018-01-14 03:36:03 +00:00
jeff
f375b4dd66 Implement NUMA support in uma(9) and malloc(9). Allocations from specific
domains can be done by the _domain() API variants.  UMA also supports a
first-touch policy via the NUMA zone flag.

The slab layer is now segregated by VM domains and is precise.  It handles
iteration for round-robin directly.  The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed.  Well
behaved clients can achieve perfect locality with no performance penalty.

The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.

Reviewed by:	Attilio (a slightly older version)
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2018-01-12 23:25:05 +00:00
jeff
c17fd15c00 Fix arc after r326347 broke various memory limit queries. Use UMA features
rather than kmem arena size to determine available memory.

Initialize the UMA limit to LONG_MAX to avoid spurious wakeups on boot before
the real limit is set.

PR:		224330 (partial), 224080
Reviewed by:	markj, avg
Sponsored by:	Netflix / Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13494
2018-01-02 04:35:56 +00:00
kib
d84b266e99 Perform all accesses to uma_reclaim_needed using atomic(9) KPI.
Reviewed by:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13534
2017-12-19 10:06:55 +00:00
markj
90ce7fadcd Fix the UMA reclaim worker after r326347.
atomic_set_*() sets a bit in the target memory location, so
atomic_set_int(&uma_reclaim_needed, 0) does not do what it looks like
it does.

PR:		224080
Reviewed by:	jeff, kib
Differential Revision:	https://reviews.freebsd.org/D13412
2017-12-07 19:38:09 +00:00
jeff
990ca74cdc Eliminate kmem_arena and kmem_object in preparation for further NUMA commits.
The arena argument to kmem_*() is now only used in an assert.  A follow-up
commit will remove the argument altogether before we freeze the API for the
next release.

This replaces the hard limit on kmem size with a soft limit imposed by UMA.  When
the soft limit is exceeded we periodically wakeup the UMA reclaim thread to
attempt to shrink KVA.  On 32bit architectures this should behave much more
gracefully as we exhaust KVA.  On 64bit the limits are likely never hit.

Reviewed by:	markj, kib (some objections)
Discussed with:	alc
Tested by:	pho
Sponsored by:	Netflix / Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13187
2017-11-28 23:40:54 +00:00
pfg
78a6b08618 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
kib
c05cd25802 Fix operator priority.
Sponsored by:	The FreeBSD Foundation
2017-11-08 23:25:05 +00:00
jeff
3c355d849c Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
markj
60fea4eba0 Fix a logic error in the item size calculation for internal UMA zones.
Kegs for internal zones always keep the slab header in the slab itself.
Therefore, when determining the allocation size, we need to take the
slab header size into account.

Reported and tested by:	ae, rakuco
Reviewed by:	avg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D12342
2017-09-13 15:44:54 +00:00