freebsd-nq

Author	SHA1	Message	Date
Eric van Gyzen	a2e194654f	memstat_kvm_uma: fix reading of uma_zone_domain structures Coverity flagged the scaling by sizeof(uzd). That is the type of the pointer, so the scaling was already done by pointer arithmetic. However, this was also passing a stack frame pointer to kvm_read, so it was doubly wrong. Move ZDOM_GET into the !_KERNEL section and use it in libmemstat. Reported by: Coverity Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26213	2020-08-28 19:50:40 +00:00
Jeff Roberson	c6fd3e23f7	Use per-domain locks for the bucket cache. This gives much better concurrency when there are a large number of cores per-domain and multiple domains. Avoid taking the lock entirely if it will not be productive. ROUNDROBIN domains will have mixed memory in each domain and will load balance to all domains. While here refactor the zone/domain separation and bucket limits to simplify callers. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23673	2020-02-19 18:48:46 +00:00
Mark Johnston	25aa4a3c07	libmemstat: Catch up with r357776. Reported by: O. Hartmann <ohartmann@walstatt.org>	2020-02-11 20:15:49 +00:00
Jeff Roberson	7f38506ff9	Fix libmemstat_uma build after r357485. Submitted by: cy	2020-02-04 05:27:45 +00:00
Jeff Roberson	8b987a7769	Use per-domain keg locks. This provides both a lock and separate space accounting for each NUMA domain. Independent keg domain locks are important with cross-domain frees. Hashed zones are non-numa and use a single keg lock to protect the hash table. Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D22829	2020-01-04 03:30:08 +00:00
Jeff Roberson	376b1ba394	Optimize fast path allocations by storing bucket headers in the per-cpu cache area. This allows us to check on bucket space for all per-cpu buckets with a single cacheline access and fewer branches. Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D22825	2019-12-25 20:50:53 +00:00
Ryan Libby	d82c8ffb16	Revert r355706 & r355710 The quick fix didn't work. I'll sort it out tomorrow. Revert r355710: "libmemstat: unbreak build" Revert r355706: "uma dbg: flexible size for slab debug bitset too"	2019-12-13 11:21:28 +00:00
Ryan Libby	80ee0f4a6b	libmemstat: unbreak build r355706 added an instance of offsetof() to the UMA private kernel header file uma_int.h. Userspace memstat_uma.c includes that header, and chokes on offsetof() because apparently the definition in sys/types.h is ifdef _KERNEL. Now, include sys/stddef.h which has an identical definition. Pointyhat to: rlibby Sponsored by: Dell EMC Isilon	2019-12-13 10:34:19 +00:00
Mark Johnston	08cfa56ea3	Extend uma_reclaim() to permit different reclamation targets. The page daemon periodically invokes uma_reclaim() to reclaim cached items from each zone when the system is under memory pressure. This is important since the size of these caches is unbounded by default. However it also results in bursts of high latency when allocating from heavily used zones as threads miss in the per-CPU caches and must access the keg in order to allocate new items. With r340405 we maintain an estimate of each zone's usage of its (per-NUMA domain) cache of full buckets. Start making use of this estimate to avoid reclaiming the entire cache when under memory pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only items in excess of the estimate are reclaimed. Draining a zone reclaims all of the cached full buckets (the previous behaviour of uma_reclaim()), and may further drain the per-CPU caches in extreme cases. Now, when under memory pressure, the page daemon will trim zones rather than draining them. As a result, heavily used zones do not incur bursts of bucket cache misses following reclamation, but large, unused caches will be reclaimed as before. Reviewed by: jeff Tested by: pho (an earlier version) MFC after: 2 months Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16667	2019-09-01 22:22:43 +00:00
Jeff Roberson	c168508655	Add two new kernel options to control memory locality on NUMA hardware. - UMA_XDOMAIN enables an additional per-cpu bucket for freed memory that was freed on a different domain from where it was allocated. This is only used for UMA_ZONE_NUMA (first-touch) zones. - UMA_FIRSTTOUCH sets the default UMA policy to be first-touch for all zones. This tries to maintain locality for kernel memory. Reviewed by: gallatin, alc, kib Tested by: pho, gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20929	2019-08-06 21:50:34 +00:00
Gleb Smirnoff	bec2d7e9a2	The KVM code also needs a fix similar to r344269. Reported by: pho	2019-05-29 03:14:46 +00:00
Gleb Smirnoff	aa43298309	With r343051 UMA switched from atomic counts to counter(9), and the kernel now reports snapshot counts of how much a zone allocated and how much it freed. The snapshot values may not match, e.g. alloced - freed < 0. Work around that in the memstat library. Reported by: pho	2019-02-18 21:27:13 +00:00
Gleb Smirnoff	cf64f5197b	This was missed in r343051: make uz_allocs, uz_frees and uz_fails counter(9).	2019-01-15 18:47:19 +00:00
Gleb Smirnoff	bb15d1c778	o Move zone limit from keg level up to zone level. This means that now two zones sharing a keg may have different limits. Now this is going to work: zone = uma_zcreate(); uma_zone_set_max(zone, limit); zone2 = uma_zsecond_create(zone); uma_zone_set_max(zone2, limit2); Kegs no longer have uk_maxpages field, but zones have uz_items. When set, it may be rounded up to minimum possible CPU bucket cache size. For small limits bucket cache can also be reconfigured to be smaller. Counter uz_items is updated whenever items transition from keg to a bucket cache or directly to a consumer. If zone has uz_maxitems set and it is reached, then we are going to sleep. o Since new limits don't play well with multi-keg zones, remove them. The idea of multi-keg zones was introduced exactly 10 years ago, and has never had a practical use. In discussion with Jeff we came to a wild agreement that if we ever want to reintroduce the idea of a smart allocator that would be able to choose between two (or more) totally different backing stores, that choice should be made one level higher than UMA, e.g. in malloc(9) or in mget(), or whatever and choice should be controlled by the caller. o Sleeping code is improved to account for the number of sleepers and wake them one by one, to avoid the thundering herd problem. o Flag UMA_ZONE_NOBUCKETCACHE removed, instead uma_zone_set_maxcache() KPI added. Having no bucket cache basically means setting maxcache to 0. o Now with many fields added and many removed (no multi-keg zones!) make sure that struct uma_zone is perfectly aligned. Reviewed by: markj, jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D17773	2019-01-15 00:02:06 +00:00
Jeff Roberson	ab3185d15e	Implement NUMA support in uma(9) and malloc(9). Allocations from specific domains can be done by the _domain() API variants. UMA also supports a first-touch policy via the NUMA zone flag. The slab layer is now segregated by VM domains and is precise. It handles iteration for round-robin directly. The per-cpu cache layer remains a mix of domains according to where memory is allocated and freed. Well behaved clients can achieve perfect locality with no performance penalty. The direct domain allocation functions have to visit the slab layer and so require per-zone locks which come at some expense. Reviewed by: Attilio (a slightly older version) Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2018-01-12 23:25:05 +00:00
Pedro F. Giffuni	5e53a4f90f	lib: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using mis-identified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-26 02:00:33 +00:00
Justin Hibbits	4026b44790	Fix buildworld for powerpc. vmpage requires struct pmap to exist and contain a pm_stats field. As of r308817, either AIM or BOOKE is required to be set in order to get their respective pmap structs. Rather than expose them both, or try to unify them unnecessarily, add a third option which contains only a pm_stats field, and change the two existing pmap structures to place the common fields at the beginning of the struct. This actually fixes the stats collection by libkvm on AIM hardware, because before it was accessing a possibly different offset, which would cause it to read garbage. Bump __FreeBSD_version to denote this ABI change, so that ports which depend on libkvm can be rebuilt.	2016-11-20 06:10:12 +00:00
Gleb Smirnoff	b28cc462ad	Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a requirement for uma_int.h. Suggested by: jhb	2016-02-09 20:22:35 +00:00
Gleb Smirnoff	9508a0e1fe	Fix build.	2016-02-04 00:23:21 +00:00
Gleb Smirnoff	345e3f4dd7	Expose real size of UMA allocations via libmemstat(3). Sponsored by: Nginx, Inc.	2014-02-10 20:09:10 +00:00
Jeff Roberson	fc03d22b17	Refine UMA bucket allocation to reduce space consumption and improve performance. - Always free to the alloc bucket if there is space. This gives LIFO allocation order to improve hot-cache performance. This also allows for zones with a single bucket per-cpu rather than a pair if the entire working set fits in one bucket. - Enable per-cpu caches of buckets. To prevent recursive bucket allocation one bucket zone still has per-cpu caches disabled. - Pick the initial bucket size based on a table driven maximum size per-bucket rather than the number of items per-page. This gives more sane initial sizes. - Only grow the bucket size when we face contention on the zone lock, this causes bucket sizes to grow more slowly. - Adjust the number of items per-bucket to account for the header space. This packs the buckets more efficiently per-page while making them not quite powers of two. - Eliminate the per-zone free bucket list. Always return buckets back to the bucket zone. This ensures that as zones grow into larger bucket sizes they eventually discard the smaller sizes. It persists fewer buckets in the system. The locking is slightly trickier. - Only switch buckets in zalloc, not zfree, this eliminates pathological cases where we ping-pong between two buckets. - Ensure that the thread that fills a new bucket gets to allocate from it to give a better upper bound on allocation time. Sponsored by: EMC / Isilon Storage Division	2013-06-18 04:50:20 +00:00
Matthew D Fleming	bb196eb480	Const-ify the zone name argument to uma_zcreate(9). MFC after: 3 days	2012-10-26 17:51:05 +00:00
Sergey Kandaurov	cfc9e655ba	Cosmetic cleanup: remove #define LIBMEMSTAT used to prevent a nested include of opt_vmpage.h from vm/vm_page.h. opt_vmpage.h was retired before 7.0 together with options PQ_NOOPT. Approved by: re (kib) MFC after: 3 days	2011-09-02 14:10:42 +00:00
Sergey Kandaurov	1882360b9b	Get rid of MAXCPU knowledge used for internal needs only. Switch to dynamic memory allocation to hold per-CPU memory types data (sized to mp_maxid for UMA, and to mp_maxcpus for malloc to match the kernel). That fixes libmemstat with arbitrary large MAXCPU values and therefore eliminates MEMSTAT_ERROR_TOOMANYCPUS error type. Reviewed by: jhb Approved by: re (kib)	2011-08-01 09:43:35 +00:00
Attilio Rao	1de471dfee	Revert r222363, as bde@ pointed out the initial solution was far more correct.	2011-05-31 20:59:53 +00:00
Attilio Rao	d361ed4b1c	Style fix: cast to size_t rather than u_long when comparing to sizeof() rets. Requested by: kib	2011-05-27 16:01:51 +00:00
Attilio Rao	be720a4061	Fix a mismerge.	2011-05-08 14:45:53 +00:00
Attilio Rao	34e4a6f408	Revert MAXCPU introduction. In userland it is always 1. Noted by: marcel	2011-05-08 14:29:25 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t, on the other hand, is implemented as an array of longs, and is easily extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - the size of cpuset_t objects differs between kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Attilio Rao	1d221389b2	Remove the redefinition of MEMSTAT_MAXCPU and just use MAXCPU for that. Reviewed by: sbruno	2011-05-02 17:13:40 +00:00
Sean Bruno	bf96595915	Add a new column to the output of vmstat -z to indicate the number of times the system was forced to sleep when requesting a new allocation. Expand the debugger hook, db_show_uma, to display these results as well. This has proven to be very useful in out of memory situations when it is not known why systems have become sluggish or fail in odd ways. Reviewed by: rwatson alc Approved by: scottl (mentor) peter Obtained from: Yahoo Inc.	2010-06-15 19:28:37 +00:00
Robert Watson	10b037c1d9	Update copyright for 2006. MFC after: 3 days	2006-02-11 19:21:39 +00:00
Robert Watson	1d90b80f28	The uma_zone data structure defines the size of its uz_cpu[] array as 1, but then sizes the containing data structure at run-time to make room for per-cpu cache data. Modify libmemstat to separately allocate a buffer to hold per-cpu cache data, sized based on the run-time mp_maxid variable when using libkvm to access UMA data. This avoids reading invalid cache data from beyond the end of the uma_zone data structure on the stack, which can result in invalid statistics and/or reads from invalid kernel addresses. Foot target practice by: ps MFC after: 3 days	2006-02-11 19:19:29 +00:00
Robert Watson	59e012a852	When reporting an error reading from UMA per-cpu cache pointers using KVM, return a KVM error rather than an out of memory error, so that the caller reports the KVM error state. This replaces a misleading error message with a more accurate although equally confusing one. MFC after: 3 days	2006-02-11 18:55:03 +00:00
Robert Watson	3f374960e6	Read all_cpus variable out of kmem, and validate CPUs against the all_cpus cpu mask before looking at the cache entries for the CPU. For systems with sparse CPU id arrays, this skips otherwise uninitialized cache structures. MFC after: 3 days	2006-02-11 18:44:37 +00:00
Robert Watson	ee4be9485c	Correct a typo in the extraction of zone information from UMA using kmem: bytes = allocated - freed, not bytes = allocated = freed. MFC after: 3 days	2006-02-11 16:54:00 +00:00
Robert Watson	c21f7757d2	Remove unnecessary and undesirable 'static' from function-local keg list, which could cause problems for multi-threaded applications using libmemstat to monitor UMA in more than one thread simultaneously. MFC after: 3 days	2006-01-16 00:37:20 +00:00
Robert Watson	2286854ff0	Define LIBMEMSTAT so that vm_page.h won't perform a nested include of opt_vmpage.h. Remove definition of _KERNEL, it is no longer required in order to include uma_int.h, as the sensitive parts of uma_int.h (a number of inlines depending on kernel-only constants) are now protected by _KERNEL.	2005-08-04 10:06:39 +00:00
Robert Watson	33c20d188c	Add memstat_kvm_uma(), an implementation of a libmemstat(3) query routine that knows how to extract UMA(9) allocator statistics from a core dump or live memory image using kvm(3). The caller is expected to provide the necessary kvm_t handle, which is then used by libmemstat(3). With these changes, it is straightforward to re-introduce vmstat -z support on core dumps, which was lost when UMA was introduced. In the short term, this requires including vm/ include files that are not intended for extra-kernel use, requiring in turn some ugliness.	2005-08-01 19:07:39 +00:00
Robert Watson	22247a2a38	Correct two libmemstat(3) bugs: - Move memory_type_list flushing logic from memstat_mtl_free() to _memstat_mtl_empty(), a libmemstat-internal function that can be called from other parts of the library. Invoke _memstat_mtl_empty() from memstat_mtl_free(), which also frees the containing list structure. Invoke _memstat_mtl_empty() instead of memstat_mtl_free() in various error cases in memstat_malloc.c and memstat_uma.c, which previously resulted in the list being freed prematurely. - Reverse the order of updating the mt_kegfree and mt_free fields of the memory_type in memstat_uma.c, otherwise keg free items won't be counted properly for non-secondary zones. MFC after: 3 days	2005-08-01 13:18:21 +00:00
Robert Watson	7f6e27372b	If a retrieved UMA zone is a secondary zone, don't report keg free items, as they actually belong to the primary zone, and may otherwise be reported more than once. MFC after: 1 day	2005-07-25 09:52:59 +00:00
Robert Watson	345628080d	Introduce more formal error handling for libmemstat(3): - Define a set of libmemstat(3) error constants, which are used by all libmemstat(3) methods except for memstat_mtl_alloc(), which allocates a memory type list and may return ENOMEM via errno. - Define a per-memory_type_list current error value, which is set when a call associated with a memory list fails. This requires wrapping a structure around the queue(9) list head data structure, but this change is not visible to libmemstat(3) consumers due to using access methods. - Add a new accessor method, memstat_mtl_geterror() to retrieve the error number. - Consistently set the error number in a number of failure modes where previously some combination of setting errno and printf'ing error descriptions was used. libmemstat(3) will now no longer print to stdio under any circumstances. Returns of NULL/-1 for errors remain the same. This avoids use of stdio, misuse of error numbers, and should make it easier to program a libmemstat(3) consumer able to print useful error messages. Currently, no error-to-string function is provided, as I'm unsure how to address internationalization concerns. MFC after: 1 day	2005-07-24 01:28:54 +00:00
Robert Watson	ddefbc898a	Prefix two non-static libmemstat(3) internal functions with '_' symbols, to try and discourage use outside the library. Remove duplicate declaration of memstat_mtl_free() from memstat_internal.h, as it's not internal, and the memstat.h definition suffices.	2005-07-23 21:17:15 +00:00
Robert Watson	ca108fe268	UMA supports "secondary" zones, in which a second zone can be layered on top of a primary zone, sharing the same allocation "keg". When reporting statistics for zones, do not report the free items in the keg as part of the free items in the zone, or those free items will be reported more than once: for the primary zone, and then any secondary zones off the primary zone. Separately record and maintain a kegfree statistic, and export via memstat_get_kegfree(), which is available for use if needed. Since items free'd back to the keg are not fully initialized, and hence may not actually be available (since secondary zone ctor-time initialization can fail), this makes some amount of sense. This change corrects a bug made visible in the libmemstat(3) modifications to netstat: mbufs freed back to the keg from the packet zone would be counted twice, resulting in negative values being printed in the mbuf free count. Some further refinement of reporting relating to secondary zones may still be required. Reported by: ssouhlal MFC after: 3 days	2005-07-20 09:17:40 +00:00
Robert Watson	d144359bde	Teach libmemstat(3) about UMA(9) failure statistics. Requested by: victor cruceru <victor dot cruceru at gmail dot com> MFC after: 1 week	2005-07-15 23:39:21 +00:00
Robert Watson	3ab4da680f	Re-spell wronge less wrongly as wrong. Submitted by: jkoshy MFC after: 1 week	2005-07-15 10:13:50 +00:00
Robert Watson	37b40e499e	Properly combine per-CPU UMA cache allocation and free counts with the global counters maintained in the zone. MFC after: 1 week	2005-07-14 20:01:04 +00:00
Robert Watson	0cddce4989	Add libmemstat(3), a library for use by debugging and monitoring applications in tracking kernel memory statistics. It provides an abstracted interface to uma(9) and malloc(9) statistics, wrapped around the recently added binary stream sysctls for the allocators. Using this interface, it is easy to build monitoring tools, query specific memory types for usage information, etc. Facilities are provided for binding caller-provided data to memory types, incremental updates of memory types, and queries that span multiple allocators. Support for additional allocators is (relatively) easy to add. The API for libmemstat(3) will probably change some over time as consumers are written, and requirements evolve. It is written to avoid encoding ABIs for data structure layout into consuming applications for this reason. MFC after: 1 week	2005-07-14 17:40:02 +00:00

48 Commits