freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	1ac7c34486	malloc_aligned: roundup allocation size up to next power of two to make it use the right aligned zone. Reported by: melifaro Reviewed by: alc, markj (previous version) Discussed with: jrtc27 Tested by: pho (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28219	2021-01-21 23:34:10 +02:00
Konstantin Belousov	0781c79d48	Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE. UMA page_alloc() does not take an alignment, so UMA can only handle alignment less then page size. Noted by: alc Reviewed by: alc, markj (previous version) Discussed with: jrtc27 Tested by: pho (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28219	2021-01-21 23:34:10 +02:00
Konstantin Belousov	3b15beb30b	Implement malloc_domainset_aligned(9). Change the power-of-two malloc zones to require alignment equal to the size []. Current uma allocator already provides such alignment, so in fact this change does not change anything except providing future-proof setup. Suggested by: markj [] Reviewed by: andrew, jah, markj Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28147	2021-01-17 19:29:05 +02:00
Mateusz Guzik	89deca0a33	malloc: make malloc_large closer to standalone This moves entire large alloc handling out of all consumers, apart from deciding to go there. This is a step towards creating a fast path. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27198	2020-11-16 17:56:58 +00:00
Mateusz Guzik	9b9bb9ffa5	malloc: retire MALLOC_PROFILE The global array has prohibitive performance impact on multicore systems. The same data (and more) can be obtained with dtrace. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27199	2020-11-13 19:22:53 +00:00
Mateusz Guzik	9aa6d792b5	malloc: retire malloc_last_fail The routine does not serve any practical purpose. Memory can be allocated in many other ways and most consumers pass the M_WAITOK flag, making malloc not fail in the first place. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27143	2020-11-12 20:22:58 +00:00
Mateusz Guzik	f0c90a0931	malloc: provide 384 byte zone Total page count after buildworld on ZFS for 384 (if present) and 512 zones: before: 29713 after: 25946 per-zone page use: vm.uma.malloc_384.keg.domain.1.pages: 11621 vm.uma.malloc_384.keg.domain.0.pages: 11597 vm.uma.malloc_512.keg.domain.1.pages: 1280 vm.uma.malloc_512.keg.domain.0.pages: 1448 Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27145	2020-11-09 22:59:41 +00:00
Mateusz Guzik	8e6526e966	malloc: retire mt_stats_zone in favor of pcpu_zone_64 Reviewed by: markj, imp Differential Revision: https://reviews.freebsd.org/D27142	2020-11-09 22:58:29 +00:00
Mateusz Guzik	e25d8b67c3	malloc: tweak the version check in r367432 to include type name While here fix a whitespace problem.	2020-11-07 01:32:16 +00:00
Mateusz Guzik	bdcc222644	malloc: move malloc_type_internal into malloc_type According to code comments the original motivation was to allow for malloc_type_internal changes without ABI breakage. This can be trivially accomplished by providing spare fields and versioning the struct, as implemented in the patch below. The upshots are one less memory indirection on each alloc and disappearance of mt_zone. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27104	2020-11-06 21:33:59 +00:00
Mateusz Guzik	16b971ed6d	malloc: add a helper returning size allocated for given request Sample usage: kernel modules can decide whether to stick to malloc or create their own zone. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27097	2020-11-05 16:21:21 +00:00
Mateusz Guzik	e1b6a7f83f	malloc: prefix zones with malloc- Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27038	2020-11-02 17:39:15 +00:00
Mateusz Guzik	828afdda17	malloc: export kernel zones instead of relying on them being power-of-2 Reviewed by: markj (previous version) Differential Revision: https://reviews.freebsd.org/D27026	2020-11-02 17:38:08 +00:00
Mateusz Guzik	82c174a3b4	malloc: delegate M_EXEC handling to dedicacted routines It is almost never needed and adds an avoidable branch. While here do minior clean ups in preparation for larger changes. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27019	2020-10-30 20:02:32 +00:00
Mateusz Guzik	6fed89b179	kern: clean up empty lines in .c and .h files	2020-09-01 22:12:32 +00:00
Vladimir Kondratyev	5d4bf0578f	LinuxKPI: Implement ksize() function. In Linux, ksize() gets the actual amount of memory allocated for a given object. This commit adds malloc_usable_size() to FreeBSD KPI which does the same. It also maps LinuxKPI ksize() to newly created function. ksize() function is used by drm-kmod. Reviewed by: hselasky, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D26215	2020-08-29 19:26:31 +00:00
Mark Johnston	b21b022a81	Revert r364310. Some of the resulting fallout in CAM does not appear straightforward to fix, so simply revert the commit for now in the absence of a better solution. Discussed with: mjg Reported by: dhw	2020-08-18 14:09:49 +00:00
Gleb Smirnoff	1921bb7b68	With INVARIANTS panic immediately if M_WAITOK is requested in a non-sleepable context. Previously only _sleep() would panic. This will catch misuse of M_WAITOK at development stage rather than at stress load stage. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D26027	2020-08-17 15:37:08 +00:00
Mark Johnston	96ad26eefb	Remove free_domain() and uma_zfree_domain(). These functions were introduced before UMA started ensuring that freed memory gets placed in domain-local caches. They no longer serve any purpose since UMA now provides their functionality by default. Remove them to simplyify the kernel memory allocator interfaces a bit. Reviewed by: cem, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25937	2020-08-04 13:58:36 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Ryan Libby	eaa17d4291	sys/vm: quiet -Wwrite-strings Discussed with: kib Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23796	2020-02-23 03:32:04 +00:00
Matt Macy	45035becfe	Add zfree to zero allocation before free Key and cookie management typically wants to avoid information leaks by explicitly zeroing before free. This routine simplifies that by permitting consumers to do so without carrying the size around. Reviewed by: jeff@, jhb@ MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) Differential Revision: https://reviews.freebsd.org/D22790	2020-02-16 00:12:53 +00:00
Warner Losh	58aa35d429	Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Removee indireeect sparc64 ifdefs	2020-02-03 17:35:11 +00:00
Mark Johnston	dc727127f1	Change malloc_domain() to return the allocation size to the caller. Otherwise the malloc type accounting in malloc_domainset(9) is wrong after r355203. Reviewed by: rlibby Reported by: kaktus Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23095	2020-01-09 15:02:48 +00:00
Jeff Roberson	6d6a03d7a8	Handle large mallocs by going directly to kmem. Taking a detour through UMA does not provide any additional value. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22563	2019-11-29 03:14:10 +00:00
Jeff Roberson	b476ae7f52	Fix DEBUG_REDZONE build after r355169	2019-11-28 08:56:14 +00:00
Jeff Roberson	584061b480	Garbage collect the mostly unused us_keg field. Use appropriately named union members in vm_page.h to store the zone and slab. Remove some nearby dead code. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22564	2019-11-28 07:49:25 +00:00
Gleb Smirnoff	5757b59f3e	Merge td_epochnest with td_no_sleeping. Epoch itself doesn't rely on the counter and it is provided merely for sleeping subsystems to check it. - In functions that sleep use THREAD_CAN_SLEEP() to assert correctness. With EPOCH_TRACE compiled print epoch info. - _sleep() was a wrong place to put the assertion for epoch, right place is sleepq_add(), as there ways to call the latter bypassing _sleep(). - Do not increase td_no_sleeping in non-preemptible epochs. The critical section would trigger all possible safeguards, no sleeping counter is extraneous. Reviewed by: kib	2019-10-29 17:28:25 +00:00
Gleb Smirnoff	4b25d1f2e3	Missing from r353596.	2019-10-15 21:32:38 +00:00
Gleb Smirnoff	bac060388f	When assertion for a thread not being in an epoch fails also print all entered epochs. Works with EPOCH_TRACE only. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D22017	2019-10-15 21:24:25 +00:00
Conrad Meyer	46d70077be	ddb: Add CSV option, sorting to 'show (malloc\|uma)' Add /i option for machine-parseable CSV output. This allows ready copy/ pasting into more sophisticated tooling outside of DDB. Add total zone size ("Memory Use") as a new column for UMA. For both, sort the displayed list on size (print the largest zones/types first). This is handy for quickly diagnosing "where has my memory gone?" at a high level. Submitted by: Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version) Sponsored by: Dell EMC Isilon	2019-10-11 01:31:31 +00:00
Konstantin Belousov	28b740da38	Handle overflow in calculating max kmem size. vm_kmem_size is u_long, and it might be not capable of holding page count times PAGE_SIZE, even when scaled down by VM_KMEM_SIZE_SCALE. As bde reported, 12G PAE config ends up with zero for kmem size. Explicitly check for overflow and clamp kmem size at vm_kmem_size_max. If we end up at zero size because VM_KMEM_SIZE_MAX is not defined, panic with clear explanation rather then failing in a way which is hard to relate. Reported by: bde, pho Tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18767	2019-01-14 07:31:19 +00:00
Mark Johnston	26e9d9b01f	Fix DDB's "show malloc" after r338899. MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-12-19 00:17:22 +00:00
Mark Johnston	9978bd996b	Add malloc_domainset(9) and _domainset variants to other allocator KPIs. Remove malloc_domain(9) and most other _domain KPIs added in r327900. The new functions allow the caller to specify a general NUMA domain selection policy, rather than specifically requesting an allocation from a specific domain. The latter policy tends to interact poorly with M_WAITOK, resulting in situations where a caller is blocked indefinitely because the specified domain is depleted. Most existing consumers of the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy, in which we fall back to other domains to satisfy the allocation request. This change also defines a set of DOMAINSET_FIXED() policies, which only permit allocations from the specified domain. Discussed with: gallatin, jeff Reported and tested by: pho (previous version) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17418	2018-10-30 18:26:34 +00:00
Mateusz Guzik	9afff6b1c0	Eliminate false sharing in malloc due to statistic collection Currently stats are collected in a MAXCPU-sized array which is not aligned and suffers enormous false-sharing. Fix the problem by utilizing per-cpu allocation. The counter(9) API is not used here as it is too incomplete and does not provide a win over per-cpu zone sized for malloc stats struct. In particular stats are being reported for each cpu separately by just copying what is supposed to be an array element for given cpu. This eliminates significant false-sharing during malloc-heavy tests e.g. on Skylake. See the review for details. Reviewed by: markj Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17289	2018-09-23 19:00:06 +00:00
Alan Cox	49bfa624ac	Eliminate the arena parameter to kmem_free(). Implicitly this corrects an error in the function hypercall_memfree(), where the wrong arena was being passed to kmem_free(). Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are mapped in kmem with execute permissions. Use this flag to determine which arena the kmem virtual addresses are returned to. Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it redundant. Update the nearby comment for UMA_SLAB_KERNEL. Reviewed by: kib, markj Discussed with: jeff Approved by: re (marius) Differential Revision: https://reviews.freebsd.org/D16845	2018-08-25 19:38:08 +00:00
Alan Cox	44d0efb215	Eliminate kmem_alloc_contig()'s unused arena parameter. Reviewed by: hselasky, kib, markj Discussed with: jeff Differential Revision: https://reviews.freebsd.org/D16799	2018-08-20 15:57:27 +00:00
Jonathan T. Looney	0766f278d8	Make UMA and malloc(9) return non-executable memory in most cases. Most kernel memory that is allocated after boot does not need to be executable. There are a few exceptions. For example, kernel modules do need executable memory, but they don't use UMA or malloc(9). The BPF JIT compiler also needs executable memory and did use malloc(9) until r317072. (Note that a side effect of r316767 was that the "small allocation" path in UMA on amd64 already returned non-executable memory. This meant that some calls to malloc(9) or the UMA zone(9) allocator could return executable memory, while others could return non-executable memory. This change makes the behavior consistent.) This change makes malloc(9) return non-executable memory unless the new M_EXEC flag is specified. After this change, the UMA zone(9) allocator will always return non-executable memory, and a KASSERT will catch attempts to use the M_EXEC flag to allocate executable memory using uma_zalloc() or its variants. Allocations that do need executable memory have various choices. They may use the M_EXEC flag to malloc(9), or they may use a different VM interfact to obtain executable pages. Now that malloc(9) again allows executable allocations, this change also reverts most of r317072. PR: 228927 Reviewed by: alc, kib, markj, jhb (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15691	2018-06-13 17:04:41 +00:00
Mateusz Guzik	34c538c356	malloc: try to use builtins for zeroing at the callsite Plenty of allocation sites pass M_ZERO and sizes which are small and known at compilation time. Handling them internally in malloc loses this information and results in avoidable calls to memset. Instead, let the compiler take the advantage of it whenever possible. Discussed with: jeff	2018-06-02 22:20:09 +00:00
Matt Macy	5072a5f465	malloc: avoid possibly returning stack garbage if MALLOC_DEBUG is defined	2018-05-19 04:43:49 +00:00
Matt Macy	06bf2a6aef	Add simple preempt safe epoch API Read locking is over used in the kernel to guarantee liveness. This API makes it easy to provide livenes guarantees without atomics. Includes epoch_test kernel module to stress test the API. Documentation will follow initial use case. Test case and improvements to preemption handling in response to discussion with mjg@ Reviewed by: imp@, shurd@ Approved by: sbruno@	2018-05-10 17:55:24 +00:00
Mateusz Guzik	7cd794214a	dtrace: depessimize dtmalloc when dtrace is active Each malloc/free was testing dtrace_malloc_enabled and forcing extra reads from the malloc type struct to see if perhaps a dtmalloc probe was on. Treat it like lockstat and sdt: have a global bolean.	2018-04-24 01:06:20 +00:00
Mateusz Guzik	c9e05ccd62	malloc: stop reading the subzone if MALLOC_DEBUG_MAXZONES == 1 (the default) malloc was showing at the top of profile during while running microbenchmarks. #define DTMALLOC_PROBE_MAX 2 struct malloc_type_internal { uint32_t mti_probes[DTMALLOC_PROBE_MAX]; u_char mti_zone; struct malloc_type_stats mti_stats[MAXCPU]; }; Reading mti_zone it wastes a cacheline to hold mti_probes + mti_zone (which we know is 0) + part of malloc stats of the first cpu which on top induces false-sharing. In particular will-it-scale lock1_processes -t 128 -s 10: before: average:45879692 after: average:51655596 Note the counters can be padded but the right fix is to move them to counter(9), leaving the struct read-only after creation (modulo dtrace probes).	2018-04-23 22:28:49 +00:00
Gleb Smirnoff	f7d3578564	Fix boot_pages exhaustion on machines with many domains and cores, where size of UMA zone allocation is greater than page size. In this case zone of zones can not use UMA_MD_SMALL_ALLOC, and we need to postpone switch off of this zone from startup_alloc() until full launch of VM. o Always supply number of VM zones to uma_startup_count(). On machines with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over a page. In the latter case account VM zones for number of allocations from the zone of zones. o Rewrite startup_alloc() so that it will immediately switch off from itself any zone that is already capable of running real alloc. In worst case scenario we may leak a single page here. See comment in uma_startup_count(). o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some extra SYSINITs, e.g. vm_page_init() may sneak in before. o While here, remove uma_boot_pages_mtx. With recent changes to boot pages calculation, we are guaranteed to use all of the boot_pages in the early single threaded stage. Reported & tested by: mav	2018-02-09 04:45:39 +00:00
Gleb Smirnoff	f4bef67c9c	Followup on r302393 by cperciva, improving calculation of boot pages required for UMA startup. o Introduce another stage of UMA startup, which is entered after vm_page_startup() finishes. After this stage we don't yet enable buckets, but we can ask VM for pages. Rename stages to meaningful names while here. New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS, BOOT_RUNNING. Enabling page alloc earlier allows us to dramatically reduce number of boot pages required. What is more important number of zones becomes consistent across different machines, as no MD allocations are done before the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use startup_alloc(), however that may change, so vm_page_startup() provides its need for early zones as argument. o Introduce uma_startup_count() function, to avoid code duplication. The functions calculates sizes of zones zone and kegs zone, and calculates how many pages UMA will need to bootstrap. It counts not only of zone structures, but also of kegs, slabs and hashes. o Hide uma_startup_foo() declarations from public file. o Provide several DIAGNOSTIC printfs on boot_pages usage. o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of mp_ncpus. Use resulting number not only in the size argument to zone_ctor() but also as args.size. Reviewed by: imp, gallatin (earlier version) Differential Revision: https://reviews.freebsd.org/D14054	2018-02-06 04:16:00 +00:00
Li-Wen Hsu	5a70796a71	Fix build for architectures where size_t is not unsigned long Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D14045	2018-01-25 06:37:14 +00:00
Conrad Meyer	bd555da94b	malloc(9): Change nominal size to size_t to match standard C No functional change -- size_t matches unsigned long on all platforms. Reported by: bde Discussed with: jhb Sponsored by: Dell EMC Isilon	2018-01-24 19:37:18 +00:00
Jeff Roberson	ab3185d15e	Implement NUMA support in uma(9) and malloc(9). Allocations from specific domains can be done by the _domain() API variants. UMA also supports a first-touch policy via the NUMA zone flag. The slab layer is now segregated by VM domains and is precise. It handles iteration for round-robin directly. The per-cpu cache layer remains a mix of domains according to where memory is allocated and freed. Well behaved clients can achieve perfect locality with no performance penalty. The direct domain allocation functions have to visit the slab layer and so require per-zone locks which come at some expense. Reviewed by: Attilio (a slightly older version) Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2018-01-12 23:25:05 +00:00
Conrad Meyer	c02fc9607a	mallocarray(9): panic if the requested allocation would overflow Additionally, move the overflow check logic out to WOULD_OVERFLOW() for consumers to have a common means of testing for overflowing allocations. WOULD_OVERFLOW() should be a secondary check -- on 64-bit platforms, just because an allocation won't overflow size_t does not mean it is a sane size to request. Callers should be imposing reasonable allocation limits far, far, below overflow. Discussed with: emaste, jhb, kp Sponsored by: Dell EMC Isilon	2018-01-10 21:49:45 +00:00
Kristof Provost	fd91e076c1	Introduce mallocarray() in the kernel Similar to calloc() the mallocarray() function checks for integer overflows before allocating memory. It does not zero memory, unless the M_ZERO flag is set. Reviewed by: pfg, vangyzen (previous version), imp (previous version) Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D13766	2018-01-07 13:21:01 +00:00

1 2 3 4 5 ...

273 Commits