Commit Graph

273 Commits

Author SHA1 Message Date
Konstantin Belousov
1ac7c34486 malloc_aligned: roundup allocation size up to next power of two
to make it use the right aligned zone.

Reported by:	melifaro
Reviewed by:	alc, markj (previous version)
Discussed with:	jrtc27
Tested by:	pho (previous version)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28219
2021-01-21 23:34:10 +02:00
Konstantin Belousov
0781c79d48 Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE.
UMA page_alloc() does not take an alignment, so UMA can only handle
alignment less then page size.

Noted by:	alc
Reviewed by:	alc, markj (previous version)
Discussed with:	jrtc27
Tested by:	pho (previous version)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28219
2021-01-21 23:34:10 +02:00
Konstantin Belousov
3b15beb30b Implement malloc_domainset_aligned(9).
Change the power-of-two malloc zones to require alignment equal to the
size [*].  Current uma allocator already provides such alignment, so in
fact this change does not change anything except providing future-proof
setup.

Suggested by:	markj [*]
Reviewed by:	andrew, jah, markj
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28147
2021-01-17 19:29:05 +02:00
Mateusz Guzik
89deca0a33 malloc: make malloc_large closer to standalone
This moves entire large alloc handling out of all consumers, apart from
deciding to go there.

This is a step towards creating a fast path.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27198
2020-11-16 17:56:58 +00:00
Mateusz Guzik
9b9bb9ffa5 malloc: retire MALLOC_PROFILE
The global array has prohibitive performance impact on multicore systems.

The same data (and more) can be obtained with dtrace.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27199
2020-11-13 19:22:53 +00:00
Mateusz Guzik
9aa6d792b5 malloc: retire malloc_last_fail
The routine does not serve any practical purpose.

Memory can be allocated in many other ways and most consumers pass the
M_WAITOK flag, making malloc not fail in the first place.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27143
2020-11-12 20:22:58 +00:00
Mateusz Guzik
f0c90a0931 malloc: provide 384 byte zone
Total page count after buildworld on ZFS for 384 (if present) and 512 zones:
before: 29713
after: 25946

per-zone page use:
vm.uma.malloc_384.keg.domain.1.pages: 11621
vm.uma.malloc_384.keg.domain.0.pages: 11597
vm.uma.malloc_512.keg.domain.1.pages: 1280
vm.uma.malloc_512.keg.domain.0.pages: 1448

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27145
2020-11-09 22:59:41 +00:00
Mateusz Guzik
8e6526e966 malloc: retire mt_stats_zone in favor of pcpu_zone_64
Reviewed by:	markj, imp
Differential Revision:	https://reviews.freebsd.org/D27142
2020-11-09 22:58:29 +00:00
Mateusz Guzik
e25d8b67c3 malloc: tweak the version check in r367432 to include type name
While here fix a whitespace problem.
2020-11-07 01:32:16 +00:00
Mateusz Guzik
bdcc222644 malloc: move malloc_type_internal into malloc_type
According to code comments the original motivation was to allow for
malloc_type_internal changes without ABI breakage. This can be trivially
accomplished by providing spare fields and versioning the struct, as
implemented in the patch below.

The upshots are one less memory indirection on each alloc and disappearance
of mt_zone.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27104
2020-11-06 21:33:59 +00:00
Mateusz Guzik
16b971ed6d malloc: add a helper returning size allocated for given request
Sample usage: kernel modules can decide whether to stick to malloc or
create their own zone.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27097
2020-11-05 16:21:21 +00:00
Mateusz Guzik
e1b6a7f83f malloc: prefix zones with malloc-
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27038
2020-11-02 17:39:15 +00:00
Mateusz Guzik
828afdda17 malloc: export kernel zones instead of relying on them being power-of-2
Reviewed by:	markj (previous version)
Differential Revision:	https://reviews.freebsd.org/D27026
2020-11-02 17:38:08 +00:00
Mateusz Guzik
82c174a3b4 malloc: delegate M_EXEC handling to dedicacted routines
It is almost never needed and adds an avoidable branch.

While here do minior clean ups in preparation for larger changes.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27019
2020-10-30 20:02:32 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Vladimir Kondratyev
5d4bf0578f LinuxKPI: Implement ksize() function.
In Linux, ksize() gets the actual amount of memory allocated for a given
object. This commit adds malloc_usable_size() to FreeBSD KPI which does
the same. It also maps LinuxKPI ksize() to newly created function.

ksize() function is used by drm-kmod.

Reviewed by:	hselasky, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D26215
2020-08-29 19:26:31 +00:00
Mark Johnston
b21b022a81 Revert r364310.
Some of the resulting fallout in CAM does not appear straightforward to
fix, so simply revert the commit for now in the absence of a better
solution.

Discussed with:	mjg
Reported by:	dhw
2020-08-18 14:09:49 +00:00
Gleb Smirnoff
1921bb7b68 With INVARIANTS panic immediately if M_WAITOK is requested in a
non-sleepable context.  Previously only _sleep() would panic.
This will catch misuse of M_WAITOK at development stage rather
than at stress load stage.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D26027
2020-08-17 15:37:08 +00:00
Mark Johnston
96ad26eefb Remove free_domain() and uma_zfree_domain().
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches.  They no longer serve any
purpose since UMA now provides their functionality by default.  Remove
them to simplyify the kernel memory allocator interfaces a bit.

Reviewed by:	cem, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25937
2020-08-04 13:58:36 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Ryan Libby
eaa17d4291 sys/vm: quiet -Wwrite-strings
Discussed with:	kib
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23796
2020-02-23 03:32:04 +00:00
Matt Macy
45035becfe Add zfree to zero allocation before free
Key and cookie management typically wants to
avoid information leaks by explicitly zeroing
before free. This routine simplifies that by
permitting consumers to do so without carrying
the size around.

Reviewed by:	jeff@, jhb@
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D22790
2020-02-16 00:12:53 +00:00
Warner Losh
58aa35d429 Remove sparc64 kernel support
Remove all sparc64 specific files
Remove all sparc64 ifdefs
Removee indireeect sparc64 ifdefs
2020-02-03 17:35:11 +00:00
Mark Johnston
dc727127f1 Change malloc_domain() to return the allocation size to the caller.
Otherwise the malloc type accounting in malloc_domainset(9) is wrong
after r355203.

Reviewed by:	rlibby
Reported by:	kaktus
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23095
2020-01-09 15:02:48 +00:00
Jeff Roberson
6d6a03d7a8 Handle large mallocs by going directly to kmem. Taking a detour through
UMA does not provide any additional value.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22563
2019-11-29 03:14:10 +00:00
Jeff Roberson
b476ae7f52 Fix DEBUG_REDZONE build after r355169 2019-11-28 08:56:14 +00:00
Jeff Roberson
584061b480 Garbage collect the mostly unused us_keg field. Use appropriately named
union members in vm_page.h to store the zone and slab.  Remove some nearby
dead code.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22564
2019-11-28 07:49:25 +00:00
Gleb Smirnoff
5757b59f3e Merge td_epochnest with td_no_sleeping.
Epoch itself doesn't rely on the counter and it is provided
merely for sleeping subsystems to check it.

- In functions that sleep use THREAD_CAN_SLEEP() to assert
  correctness.  With EPOCH_TRACE compiled print epoch info.
- _sleep() was a wrong place to put the assertion for epoch,
  right place is sleepq_add(), as there ways to call the
  latter bypassing _sleep().
- Do not increase td_no_sleeping in non-preemptible epochs.
  The critical section would trigger all possible safeguards,
  no sleeping counter is extraneous.

Reviewed by:	kib
2019-10-29 17:28:25 +00:00
Gleb Smirnoff
4b25d1f2e3 Missing from r353596. 2019-10-15 21:32:38 +00:00
Gleb Smirnoff
bac060388f When assertion for a thread not being in an epoch fails also print all
entered epochs. Works with EPOCH_TRACE only.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D22017
2019-10-15 21:24:25 +00:00
Conrad Meyer
46d70077be ddb: Add CSV option, sorting to 'show (malloc|uma)'
Add /i option for machine-parseable CSV output.  This allows ready copy/
pasting into more sophisticated tooling outside of DDB.

Add total zone size ("Memory Use") as a new column for UMA.

For both, sort the displayed list on size (print the largest zones/types
first).  This is handy for quickly diagnosing "where has my memory gone?" at
a high level.

Submitted by:	Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version)
Sponsored by:	Dell EMC Isilon
2019-10-11 01:31:31 +00:00
Konstantin Belousov
28b740da38 Handle overflow in calculating max kmem size.
vm_kmem_size is u_long, and it might be not capable of holding page
count times PAGE_SIZE, even when scaled down by VM_KMEM_SIZE_SCALE.  As
bde reported, 12G PAE config ends up with zero for kmem size.

Explicitly check for overflow and clamp kmem size at vm_kmem_size_max.
If we end up at zero size because VM_KMEM_SIZE_MAX is not defined,
panic with clear explanation rather then failing in a way which is
hard to relate.

Reported by:	bde, pho
Tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18767
2019-01-14 07:31:19 +00:00
Mark Johnston
26e9d9b01f Fix DDB's "show malloc" after r338899.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-12-19 00:17:22 +00:00
Mark Johnston
9978bd996b Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain.  The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted.  Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with:	gallatin, jeff
Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17418
2018-10-30 18:26:34 +00:00
Mateusz Guzik
9afff6b1c0 Eliminate false sharing in malloc due to statistic collection
Currently stats are collected in a MAXCPU-sized array which is not
aligned and suffers enormous false-sharing. Fix the problem by
utilizing per-cpu allocation.

The counter(9) API is not used here as it is too incomplete and does
not provide a win over per-cpu zone sized for malloc stats struct. In
particular stats are being reported for each cpu separately by just
copying what is supposed to be an array element for given cpu.

This eliminates significant false-sharing during malloc-heavy tests
e.g. on Skylake. See the review for details.

Reviewed by:	markj
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17289
2018-09-23 19:00:06 +00:00
Alan Cox
49bfa624ac Eliminate the arena parameter to kmem_free(). Implicitly this corrects an
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().

Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions.  Use this flag to determine which
arena the kmem virtual addresses are returned to.

Eliminate UMA_SLAB_KRWX.  The introduction of VPO_KMEM_EXEC makes it
redundant.

Update the nearby comment for UMA_SLAB_KERNEL.

Reviewed by:	kib, markj
Discussed with:	jeff
Approved by:	re (marius)
Differential Revision:	https://reviews.freebsd.org/D16845
2018-08-25 19:38:08 +00:00
Alan Cox
44d0efb215 Eliminate kmem_alloc_contig()'s unused arena parameter.
Reviewed by:	hselasky, kib, markj
Discussed with:	jeff
Differential Revision:	https://reviews.freebsd.org/D16799
2018-08-20 15:57:27 +00:00
Jonathan T. Looney
0766f278d8 Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable.  There are a few exceptions.  For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9).  The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory.  This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory.  This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified.  After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices.  They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR:		228927
Reviewed by:	alc, kib, markj, jhb (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
Mateusz Guzik
34c538c356 malloc: try to use builtins for zeroing at the callsite
Plenty of allocation sites pass M_ZERO and sizes which are small and known
at compilation time. Handling them internally in malloc loses this information
and results in avoidable calls to memset.

Instead, let the compiler take the advantage of it whenever possible.

Discussed with:	jeff
2018-06-02 22:20:09 +00:00
Matt Macy
5072a5f465 malloc: avoid possibly returning stack garbage if MALLOC_DEBUG is defined 2018-05-19 04:43:49 +00:00
Matt Macy
06bf2a6aef Add simple preempt safe epoch API
Read locking is over used in the kernel to guarantee liveness. This API makes
it easy to provide livenes guarantees without atomics.

Includes epoch_test kernel module to stress test the API.

Documentation will follow initial use case.

Test case and improvements to preemption handling in response to discussion
with mjg@

Reviewed by:	imp@, shurd@
Approved by:	sbruno@
2018-05-10 17:55:24 +00:00
Mateusz Guzik
7cd794214a dtrace: depessimize dtmalloc when dtrace is active
Each malloc/free was testing dtrace_malloc_enabled and forcing
extra reads from the malloc type struct to see if perhaps a
dtmalloc probe was on.

Treat it like lockstat and sdt: have a global bolean.
2018-04-24 01:06:20 +00:00
Mateusz Guzik
c9e05ccd62 malloc: stop reading the subzone if MALLOC_DEBUG_MAXZONES == 1 (the default)
malloc was showing at the top of profile during while running microbenchmarks.

#define DTMALLOC_PROBE_MAX              2
struct malloc_type_internal {
        uint32_t        mti_probes[DTMALLOC_PROBE_MAX];
        u_char          mti_zone;
        struct malloc_type_stats        mti_stats[MAXCPU];
};

Reading mti_zone it wastes a cacheline to hold mti_probes + mti_zone
(which we know is 0) + part of malloc stats of the first cpu which on top
induces false-sharing.

In particular will-it-scale lock1_processes -t 128 -s 10:
before: average:45879692
after:  average:51655596

Note the counters can be padded but the right fix is to move them to
counter(9), leaving the struct read-only after creation (modulo dtrace
probes).
2018-04-23 22:28:49 +00:00
Gleb Smirnoff
f7d3578564 Fix boot_pages exhaustion on machines with many domains and cores, where
size of UMA zone allocation is greater than page size. In this case zone
of zones can not use UMA_MD_SMALL_ALLOC, and we  need to postpone switch
off of this zone from startup_alloc() until full launch of VM.

o Always supply number of VM zones to uma_startup_count(). On machines
  with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over
  a page. In the latter case account VM zones for number of allocations
  from the zone of zones.
o Rewrite startup_alloc() so that it will immediately switch off from
  itself any zone that is already capable of running real alloc.
  In worst case scenario we may leak a single page here. See comment
  in uma_startup_count().
o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some
  extra SYSINITs, e.g. vm_page_init() may sneak in before.
o While here, remove uma_boot_pages_mtx. With recent changes to boot
  pages calculation, we are guaranteed to use all of the boot_pages
  in the early single threaded stage.

Reported & tested by:	mav
2018-02-09 04:45:39 +00:00
Gleb Smirnoff
f4bef67c9c Followup on r302393 by cperciva, improving calculation of boot pages required
for UMA startup.

o Introduce another stage of UMA startup, which is entered after
  vm_page_startup() finishes. After this stage we don't yet enable buckets,
  but we can ask VM for pages. Rename stages to meaningful names while here.
  New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS,
  BOOT_RUNNING.
  Enabling page alloc earlier allows us to dramatically reduce number of
  boot pages required. What is more important number of zones becomes
  consistent across different machines, as no MD allocations are done before
  the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use
  startup_alloc(), however that may change, so vm_page_startup() provides
  its need for early zones as argument.
o Introduce uma_startup_count() function, to avoid code duplication. The
  functions calculates sizes of zones zone and kegs zone, and calculates how
  many pages UMA will need to bootstrap.
  It counts not only of zone structures, but also of kegs, slabs and hashes.
o Hide uma_startup_foo() declarations from public file.
o Provide several DIAGNOSTIC printfs on boot_pages usage.
o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of
  mp_ncpus. Use resulting number not only in the size argument to zone_ctor()
  but also as args.size.

Reviewed by:		imp, gallatin (earlier version)
Differential Revision:	https://reviews.freebsd.org/D14054
2018-02-06 04:16:00 +00:00
Li-Wen Hsu
5a70796a71 Fix build for architectures where size_t is not unsigned long
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D14045
2018-01-25 06:37:14 +00:00
Conrad Meyer
bd555da94b malloc(9): Change nominal size to size_t to match standard C
No functional change -- size_t matches unsigned long on all platforms.

Reported by:	bde
Discussed with:	jhb
Sponsored by:	Dell EMC Isilon
2018-01-24 19:37:18 +00:00
Jeff Roberson
ab3185d15e Implement NUMA support in uma(9) and malloc(9). Allocations from specific
domains can be done by the _domain() API variants.  UMA also supports a
first-touch policy via the NUMA zone flag.

The slab layer is now segregated by VM domains and is precise.  It handles
iteration for round-robin directly.  The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed.  Well
behaved clients can achieve perfect locality with no performance penalty.

The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.

Reviewed by:	Attilio (a slightly older version)
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2018-01-12 23:25:05 +00:00
Conrad Meyer
c02fc9607a mallocarray(9): panic if the requested allocation would overflow
Additionally, move the overflow check logic out to WOULD_OVERFLOW() for
consumers to have a common means of testing for overflowing allocations.
WOULD_OVERFLOW() should be a secondary check -- on 64-bit platforms, just
because an allocation won't overflow size_t does not mean it is a sane size
to request.  Callers should be imposing reasonable allocation limits far,
far, below overflow.

Discussed with:	emaste, jhb, kp
Sponsored by:	Dell EMC Isilon
2018-01-10 21:49:45 +00:00
Kristof Provost
fd91e076c1 Introduce mallocarray() in the kernel
Similar to calloc() the mallocarray() function checks for integer
overflows before allocating memory.
It does not zero memory, unless the M_ZERO flag is set.

Reviewed by:	pfg, vangyzen (previous version), imp (previous version)
Obtained from:	OpenBSD
Differential Revision:	https://reviews.freebsd.org/D13766
2018-01-07 13:21:01 +00:00