freebsd-skq

Author	SHA1	Message	Date
Mark Johnston	d2f1c44bc9	uma: Remove the MINBUCKET flag from the flag name list This should have been done in r368399 / commit `f8b6c51538`. Reported by: rlibby Sponsored by: The FreeBSD Foundation	2020-12-27 17:01:33 -05:00
Mateusz Guzik	c3aa3bf97c	vm: clean up empty lines in .c and .h files	2020-09-01 21:20:45 +00:00
Eric van Gyzen	a2e194654f	memstat_kvm_uma: fix reading of uma_zone_domain structures Coverity flagged the scaling by sizeof(uzd). That is the type of the pointer, so the scaling was already done by pointer arithmetic. However, this was also passing a stack frame pointer to kvm_read, so it was doubly wrong. Move ZDOM_GET into the !_KERNEL section and use it in libmemstat. Reported by: Coverity Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26213	2020-08-28 19:50:40 +00:00
Jeff Roberson	c8b0a88b8d	Clarify some language. Favor primary where both master and primary were used in conjunction with secondary.	2020-06-20 20:21:04 +00:00
Mark Johnston	54007ce8ae	Clean up uma_int.h a bit. This makes it easier to write libkvm programs that access UMA data structures. - Remove a couple of unused slab functions and make others local to uma_core.c. Similarly move SLAB_BITSETS, which affects the layout of slab structures, to uma_core.c. - Stop defining the slab structures under _KERNEL. There's no real reason they can't be visible to userspace like the rest of UMA's structures are. - Group KEG_ASSERT_COLD with other keg macros. - Convert an assertion about MAXMEMDOM to use _Static_assert. No functional change intended. Discussed with: jeff Reviewed by: rlibby Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23980	2020-03-07 15:37:23 +00:00
Jeff Roberson	c6fd3e23f7	Use per-domain locks for the bucket cache. This gives much better concurrency when there are a large number of cores per-domain and multiple domains. Avoid taking the lock entirely if it will not be productive. ROUNDROBIN domains will have mixed memory in each domain and will load balance to all domains. While here refactor the zone/domain separation and bucket limits to simplify callers. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23673	2020-02-19 18:48:46 +00:00
Mark Johnston	4ab3aee8fb	Reduce lock hold time in keg_drain(). Maintain a count of free slabs in the per-domain keg structure and use that to clear the free slab list in constant time for most cases. This helps minimize lock contention induced by reclamation, in preparation for proactive trimming of excesses of free memory. Reviewed by: jeff, rlibby Tested by: pho Differential Revision: https://reviews.freebsd.org/D23532	2020-02-11 20:06:33 +00:00
Ryan Libby	bae55c4aec	uma: remove UMA_ZFLAG_CACHEONLY flag UMA_ZFLAG_CACHEONLY was essentially the same thing as UMA_ZONE_VM, but with a more confusing name. Remove the flag, make UMA_ZONE_VM an inherit flag, and replace all references. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23516	2020-02-06 08:32:25 +00:00
Ryan Libby	ec0d828071	uma: add UMA_ZONE_CONTIG, and a default contig_alloc For now, copy the mbuf allocator. Reviewed by: jeff, markj (previous version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23237	2020-02-04 22:40:11 +00:00
Jeff Roberson	dc3915c8c6	Use STAILQ instead of TAILQ for bucket lists. We only need FIFO behavior and this is more space efficient. Stop queueing recently used buckets to the head of the list. If the bucket goes to a different processor the cache coherency will be more expensive. We already try to encourage cache-hot behavior in the per-cpu layer. Reviewed by: rlibby Differential Revision: https://reviews.freebsd.org/D23493	2020-02-04 02:41:24 +00:00
Jeff Roberson	d4665eaa66	Implement a safe memory reclamation feature that is tightly coupled with UMA. This is in the same family of algorithms as Epoch/QSBR/RCU/PARSEC but is a unique algorithm. This has 3x the performance of epoch in a write heavy workload with less than half of the read side cost. The memory overhead is significantly lessened by limiting the free-to-use latency. A synthetic test uses 1/20th of the memory vs Epoch. There is significant further discussion in the comments and code review. This code should be considered experimental. I will write a man page after it has settled. After further validation the VM will begin using this feature to permit lockless page lookups. Both markj and cperciva tested on arm64 at large core counts to verify fences on weaker ordering architectures. I will commit a stress testing tool in a follow-up. Reviewed by: mmacy, markj, rlibby, hselasky Discussed with: sbahara Differential Revision: https://reviews.freebsd.org/D22586	2020-01-31 00:49:51 +00:00
Ryan Libby	9b8db4d0a0	uma: split slabzone into two sizes By allowing more items per slab, we can improve memory efficiency for small allocs. If we were just to increase the bitmap size of the slabzone, we would then waste slabzone memory. So, split slabzone into two zones, one especially for 8-byte allocs (512 per slab). The practical effect should be reduced memory usage for counter(9). Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23149	2020-01-14 02:14:15 +00:00
Ryan Libby	4a8b575c6b	uma: unify layout paths and improve efficiency Unify the keg layout selection paths (keg_small_init, keg_large_init, keg_cachespread_init), and slightly improve memory efficiecy by: - using the padding of the final item to store the slab header, - not going OFFPAGE if we have a choice unless it improves efficiency. Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23048	2020-01-09 02:03:17 +00:00
Ryan Libby	54c5ae804f	uma: reorganize flags - Garbage collect UMA_ZONE_PAGEABLE & UMA_ZONE_STATIC. - Move flag VTOSLAB from public to private. - Introduce public NOTPAGE flag and make HASH private. - Introduce public NOTOUCH flag and make OFFPAGE private. - Update man page. The net effect of this should be to make the contract with clients more clear. Clients should choose constraints, UMA will figure out how to implement them. This also breaks the confusing double meaning of OFFPAGE. Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23016	2020-01-09 02:03:03 +00:00
Jeff Roberson	79c9f9429a	Fix uma boot pages calculations on NUMA machines that also don't have MD_UMA_SMALL_ALLOC. This is unusual but not impossible. Fix the alignemnt of zones while here. This was already correct because uz_cpu strongly aligned the zone structure but the specified alignment did not match reality and involved redundant defines. Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D23046	2020-01-06 02:51:19 +00:00
Jeff Roberson	31c251a046	Fix an assertion introduced in r356348. On architectures without UMA_MD_SMALL_ALLOC vmem has a more complicated startup sequence that violated the new assert. Resolve this by rewriting the COLD asserts to look at the per-cpu allocation counts for evidence of api activity. Discussed with: rlibby Reviewed by: markj Reported by: lwhsu	2020-01-04 19:29:25 +00:00
Jeff Roberson	dfe13344f5	UMA NUMA flag day. UMA_ZONE_NUMA was a source of confusion. Make the names more consistent with other NUMA features as UMA_ZONE_FIRSTTOUCH and UMA_ZONE_ROUNDROBIN. The system will now pick a select a default depending on kernel configuration. API users need only specify one if they want to override the default. Remove the UMA_XDOMAIN and UMA_FIRSTTOUCH kernel options and key only off of NUMA. XDOMAIN is now fast enough in all cases to enable whenever NUMA is. Reviewed by: markj Discussed with: rlibby Differential Revision: https://reviews.freebsd.org/D22831	2020-01-04 18:48:13 +00:00
Jeff Roberson	91d947bfbe	Sort cross-domain frees into per-domain buckets before inserting these onto their respective bucket lists. This is a several order of magnitude improvement in contention on the keg lock under heavy free traffic while requiring only an additional bucket per-domain worth of memory. Discussed with: markj, rlibby Differential Revision: https://reviews.freebsd.org/D22830	2020-01-04 07:56:28 +00:00
Jeff Roberson	8b987a7769	Use per-domain keg locks. This provides both a lock and separate space accounting for each NUMA domain. Independent keg domain locks are important with cross-domain frees. Hashed zones are non-numa and use a single keg lock to protect the hash table. Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D22829	2020-01-04 03:30:08 +00:00
Jeff Roberson	727c691857	Use a separate lock for the zone and keg. This provides concurrency between populating buckets from the slab layer and fetching full buckets from the zone layer. Eliminate some nonsense locking patterns where we lock to fetch a single variable. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22828	2020-01-04 03:15:34 +00:00
Jeff Roberson	4bd61e19a2	Use atomics for the zone limit and sleeper count. This relies on the sleepq to serialize sleepers. This patch retains the existing sleep/wakeup paradigm to limit 'thundering herd' wakeups. It resolves a missing wakeup in one case but otherwise should be bug for bug compatible. In particular, there are still various races surrounding adjusting the limit via sysctl that are now documented. Discussed with: markj Reviewed by: rlibby Differential Revision: https://reviews.freebsd.org/D22827	2020-01-04 03:04:46 +00:00
Jeff Roberson	cc7ce83ae0	Further reduce the cacheline footprint of fast allocations by duplicating the zone size and flags fields in the per-cpu caches. This allows fast alloctions to proceed only touching the single per-cpu cacheline and simplifies the common case when no ctor/dtor is specified. Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D22826	2019-12-25 20:57:24 +00:00
Jeff Roberson	376b1ba394	Optimize fast path allocations by storing bucket headers in the per-cpu cache area. This allows us to check on bucket space for all per-cpu buckets with a single cacheline access and fewer branches. Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D22825	2019-12-25 20:50:53 +00:00
Ryan Libby	815db20425	uma dbg: flexible size for slab debug bitset too Recently (r355315) the size of the struct uma_slab bitset field us_free became dynamic instead of conservative. Now, make the debug bitset size dynamic too. The debug bitset is INVARIANTS-only, so in fact we don't care too much about the space savings that results from this, but enabling minimally-sized slabs on INVARIANTS builds is still important in order to be able to test new slab layouts effectively. Reviewed by: jeff (previous version), markj (previous version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22759	2019-12-14 05:21:56 +00:00
Ryan Libby	d82c8ffb16	Revert r355706 & r355710 The quick fix didn't work. I'll sort it out tomorrow. Revert r355710: "libmemstat: unbreak build" Revert r355706: "uma dbg: flexible size for slab debug bitset too"	2019-12-13 11:21:28 +00:00
Ryan Libby	7508f15ff1	uma dbg: flexible size for slab debug bitset too Recently (r355315) the size of the struct uma_slab bitset field us_free became dynamic instead of conservative. Now, make the debug bitset size dynamic too. The debug bitset is INVARIANTS-only, so in fact we don't care too much about the space savings that results from this, but enabling minimally-sized slabs on INVARIANTS builds is still important in order to be able to test new slab layouts effectively. Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22759	2019-12-13 09:31:59 +00:00
Ryan Libby	6d204a6a0e	uma: pretty print zone flags sysctl Requested by: jeff Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22748	2019-12-11 06:50:55 +00:00
Jeff Roberson	1e0701e1e5	Use a variant slab structure for offpage zones. This saves space in embedded slabs but also is an opportunity to tidy up code and add accessor inlines. Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D22609	2019-12-08 01:15:06 +00:00
Jeff Roberson	9b78b1f433	Use a precise bit count for the slab free items in UMA. This significantly shrinks embedded slab structures. Reviewed by: markj, rlibby (prior version) Differential Revision: https://reviews.freebsd.org/D22584	2019-12-02 22:44:34 +00:00
Jeff Roberson	6d6a03d7a8	Handle large mallocs by going directly to kmem. Taking a detour through UMA does not provide any additional value. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22563	2019-11-29 03:14:10 +00:00
Jeff Roberson	584061b480	Garbage collect the mostly unused us_keg field. Use appropriately named union members in vm_page.h to store the zone and slab. Remove some nearby dead code. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22564	2019-11-28 07:49:25 +00:00
Jeff Roberson	20a4e15451	Implement a sysctl tree for uma zones to assist in debugging and provide more statistcs than are exported via the ABI stable vmstat interface. Rename uz_count to uz_bucket_size because even I was confused by the name after returning to the source years later. Reviewed by: rlibby Differential Revision: https://reviews.freebsd.org/D22554	2019-11-28 00:19:09 +00:00
Ryan Libby	ca293436d1	uma: trash memory when ctor/dtor supplied too On INVARIANTS kernels, UMA has a use-after-free detection mechanism. This mechanism previously required that all of the ctor/dtor/uminit/fini arguments to uma_zcreate() be NULL in order to function. Now, it only requires that uminit and fini be NULL; now, the trash ctor and dtor will be called in addition to any supplied ctor or dtor. Also do a little refactoring for readability of the resulting logic. This enables use-after-free detection for more zones, and will allow for simplification of some callers that worked around the previous restriction (see kern_mbuf.c). Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20722	2019-11-27 19:49:55 +00:00
Mark Johnston	08cfa56ea3	Extend uma_reclaim() to permit different reclamation targets. The page daemon periodically invokes uma_reclaim() to reclaim cached items from each zone when the system is under memory pressure. This is important since the size of these caches is unbounded by default. However it also results in bursts of high latency when allocating from heavily used zones as threads miss in the per-CPU caches and must access the keg in order to allocate new items. With r340405 we maintain an estimate of each zone's usage of its (per-NUMA domain) cache of full buckets. Start making use of this estimate to avoid reclaiming the entire cache when under memory pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only items in excess of the estimate are reclaimed. Draining a zone reclaims all of the cached full buckets (the previous behaviour of uma_reclaim()), and may further drain the per-CPU caches in extreme cases. Now, when under memory pressure, the page daemon will trim zones rather than draining them. As a result, heavily used zones do not incur bursts of bucket cache misses following reclamation, but large, unused caches will be reclaimed as before. Reviewed by: jeff Tested by: pho (an earlier version) MFC after: 2 months Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16667	2019-09-01 22:22:43 +00:00
Jeff Roberson	c168508655	Add two new kernel options to control memory locality on NUMA hardware. - UMA_XDOMAIN enables an additional per-cpu bucket for freed memory that was freed on a different domain from where it was allocated. This is only used for UMA_ZONE_NUMA (first-touch) zones. - UMA_FIRSTTOUCH sets the default UMA policy to be first-touch for all zones. This tries to maintain locality for kernel memory. Reviewed by: gallatin, alc, kib Tested by: pho, gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20929	2019-08-06 21:50:34 +00:00
Pedro F. Giffuni	6929b7d1ab	UMA: unsign some variables related to allocation in hash_alloc(). As a followup to r343673, unsign some variables related to allocation since the hashsize cannot be negative. This gives a bit more space to handle bigger allocations and avoid some implicit casting. While here also unsign uh_hashmask, it makes little sense to keep that signed. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19148	2019-02-12 04:33:05 +00:00
Gleb Smirnoff	ad66f95865	Now that there is only one way to allocate a slab, remove uz_slab method. Discussed with: jeff	2019-02-07 03:55:05 +00:00
Gleb Smirnoff	b68d692a3d	Whitespace.	2019-01-16 04:02:08 +00:00
Gleb Smirnoff	396694153f	Fix compilation failures on different arches that have vm_machdep.c not aware of counter_u64_t by including counter.h into uma_int.h. I'm not happy about this inclusion, but it fixes compilation ASAP.	2019-01-15 19:33:47 +00:00
Gleb Smirnoff	2efcc8cbca	Make uz_allocs, uz_frees and uz_fails counter(9). This removes some atomic updates and reduces amount of data protected by zone lock. During startup point these fields to EARLY_COUNTER. After startup allocate them for all early zones. Tested by: pho	2019-01-15 18:24:34 +00:00
Gleb Smirnoff	bb15d1c778	o Move zone limit from keg level up to zone level. This means that now two zones sharing a keg may have different limits. Now this is going to work: zone = uma_zcreate(); uma_zone_set_max(zone, limit); zone2 = uma_zsecond_create(zone); uma_zone_set_max(zone2, limit2); Kegs no longer have uk_maxpages field, but zones have uz_items. When set, it may be rounded up to minimum possible CPU bucket cache size. For small limits bucket cache can also be reconfigured to be smaller. Counter uz_items is updated whenever items transition from keg to a bucket cache or directly to a consumer. If zone has uz_maxitems set and it is reached, then we are going to sleep. o Since new limits don't play well with multi-keg zones, remove them. The idea of multi-keg zones was introduced exactly 10 years ago, and never have had a practical usage. In discussion with Jeff we came to a wild agreement that if we ever want to reintroduce the idea of a smart allocator that would be able to choose between two (or more) totally different backing stores, that choice should be made one level higher than UMA, e.g. in malloc(9) or in mget(), or whatever and choice should be controlled by the caller. o Sleeping code is improved to account number of sleepers and wake them one by one, to avoid thundering herd problem. o Flag UMA_ZONE_NOBUCKETCACHE removed, instead uma_zone_set_maxcache() KPI added. Having no bucket cache basically means setting maxcache to 0. o Now with many fields added and many removed (no multi-keg zones!) make sure that struct uma_zone is perfectly aligned. Reviewed by: markj, jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D17773	2019-01-15 00:02:06 +00:00
Gleb Smirnoff	3d5e3df73f	For not offpage zones the slab is placed at the end of page. Keg's uk_pgoff is calculated to guarantee that struct uma_slab is placed at pointer size alignment. Calculation of real struct uma_slab size is done in keg_ctor() and yet again in keg_large_init(), to check if we need an extra page. This calculation can actually be performed at compile time. - Add SIZEOF_UMA_SLAB macro to calculate size of struct uma_slab placed at an end of a page with alignment requirement. - Use SIZEOF_UMA_SLAB in keg_ctor() and in keg_large_init(). This is a not a functional change. - Use SIZEOF_UMA_SLAB in UMA_SLAB_SPACE definition and in keg_small_init(). This is a potential bugfix, but in reality I don't think there are any systems affected, since compiler aligns struct uma_slab anyway.	2018-11-28 19:17:27 +00:00
Mark Johnston	0f9b7bf37a	Add accounting to per-domain UMA full bucket caches. In particular, track the current size of the cache and maintain an estimate of its working set size. This will be used to decide how much to shrink various caches when the kernel attempts to reclaim pages. As a secondary effect, it makes statistics aggregation (done by, e.g., vmstat -z) cheaper since sysctl_vm_zone_stats() no longer needs to iterate over lists of cached buckets. Discussed with: alc, glebius, jeff Tested by: pho (previous version) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D16666	2018-11-13 19:44:40 +00:00
Mark Johnston	7571e24901	Add an #include required after r339686. X-MFC with: r339686 Sponsored by: The FreeBSD Foundation	2018-10-24 16:49:16 +00:00
Mark Johnston	194a979ee9	Use a vm_domainset iterator in keg_fetch_slab(). Previously, it used a hand-rolled round-robin iterator. This meant that the minskip logic in r338507 didn't apply to UMA allocations, and also meant that we would call vm_wait() for individual domains rather than permitting an allocation from any domain with sufficient free pages. Discussed with: jeff Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17420	2018-10-24 16:41:47 +00:00
Gleb Smirnoff	306abf0f35	Either "free" or "allocated" is misleading here, since an item in a bucket is free from perspective of UMA consumer, and it is allocated from perspective of keg. Discussed with: markj Approved by: re (kib)	2018-08-24 18:47:50 +00:00
Gleb Smirnoff	a307fb5b0c	Fix comment. The actual meaning of ub_cnt is the opposite.	2018-08-23 23:24:28 +00:00
Jeff Roberson	63b5557b2f	Sort uma_zone fields according to 64 byte cache line with adjacent line prefetch on 64bit architectures. Prior to this, two lines were needed for the fast path and each line may fetch an unused adjacent neighbor. - Move fields used by the fast path into a single line. - Move constants into the adjacent line which is mostly used for the spare bucket alloc 'medium path'. - Unpad the mtx which is only used by the fast path and place it in a line with rarely used data. This aligns the cachelines better and eliminates 128 bytes of wasted space. This gives a 45% improvement on a will-it-scale test on a 24 core machine. Reviewed by: mmacy	2018-06-23 08:10:09 +00:00
Justin Hibbits	12f691959f	Align UMA data to 128 byte cacheline size Suggested by: mjg	2018-06-04 15:44:17 +00:00
Mateusz Guzik	782e38aa48	uma: increase alignment to 128 bytes on amd64 Current UMA internals are not suited for efficient operation in multi-socket environments. In particular there is very common use of MAXCPU arrays and other fields which are not always properly aligned and are not local for target threads (apart from the first node of course). Turns out the existing UMA_ALIGN macro can be used to mostly work around the problem until the code get fixed. The current setting of 64 bytes runs into trouble when adjacent cache line prefetcher gets to work. An example 128-way benchmark doing a lot of malloc/frees has the following instruction samples: before: kernel`lf_advlockasync+0x43b 32940 kernel`malloc+0xe5 42380 kernel`bzero+0x19 47798 kernel`spinlock_exit+0x26 60423 kernel`0xffffffff80 78238 0x0 136947 kernel`uma_zfree_arg+0x46 159594 kernel`uma_zalloc_arg+0x672 180556 kernel`uma_zfree_arg+0x2a 459923 kernel`uma_zalloc_arg+0x5ec 489910 after: kernel`bzero+0xd 46115 kernel`lf_advlockasync+0x25f 46134 kernel`lf_advlockasync+0x38a 49078 kernel`fget_unlocked+0xd1 49942 kernel`lf_advlockasync+0x43b 55392 kernel`copyin+0x4a 56963 kernel`bzero+0x19 81983 kernel`spinlock_exit+0x26 91889 kernel`0xffffffff80 136357 0x0 239424 See the review for more details. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D15346	2018-05-11 07:04:57 +00:00

1 2 3

131 Commits