freebsd-skq

Author	SHA1	Message	Date
Sean Bruno	bf96595915	Add a new column to the output of vmstat -z to indicate the number of times the system was forced to sleep when requesting a new allocation. Expand the debugger hook, db_show_uma, to display these results as well. This has proven to be very useful in out of memory situations when it is not known why systems have become sluggish or fail in odd ways. Reviewed by: rwatson alc Approved by: scottl (mentor) peter Obtained from: Yahoo Inc.	2010-06-15 19:28:37 +00:00
Kip Macy	1a23373ceb	- enable alignment on amd64 only - only align pcpu caches and the volatile portion of uma_zone	2010-03-22 22:39:32 +00:00
Kip Macy	6b4391d786	turn 205266 in to a no-op until the problem can be properly diagnosed	2010-03-18 20:30:25 +00:00
Kip Macy	5e4bb93cca	Cache line align various structures and move volatile counters to not share a cache line with (mostly) immutable state Reviewed by: jeff@ MFC after: 7 days	2010-03-17 21:18:28 +00:00
Antoine Brodin	4e2d83fc4a	Remove trailing ";" in UMA_HASH_INSERT and UMA_HASH_REMOVE macros. MFC after: 1 month	2009-12-05 17:45:56 +00:00
Jeff Roberson	e20a199fd5	- Make the keg abstraction more complete. Permit a zone to have multiple backend kegs so it may source compatible memory from multiple backends. This is useful for cases such as NUMA or different layouts for the same memory type. - Provide a new api for adding new backend kegs to secondary zones. - Provide a new flag for adjusting the layout of zones to stagger allocations better across cache lines. Sponsored by: Nokia	2009-01-25 09:11:24 +00:00
Robert Watson	6ab3b958fc	Update stale comment on protecting UMA per-CPU caches: we now use critical sections rather than mutexes.	2007-05-09 22:53:34 +00:00
Robert Watson	af17e9a922	Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be used from memstat_uma.c for the purposes of kvm access without lots of additional unsafe includes. MFC after: 3 days	2005-08-04 10:03:53 +00:00
Robert Watson	08ecce74bc	Improve canonicalization of copyrights. Order copyrights by order of assertion (jeff, bmilekic, rwatson). Suggested ages ago by: bde MFC after: 1 week	2005-07-16 09:51:52 +00:00
Mike Silbersack	2018f30c01	Increase the flags field for kegs from a 16 to a 32 bit value; we have exhausted all 16 flags.	2005-07-16 02:23:41 +00:00
Robert Watson	2019094a35	Track UMA(9) allocation failures by zone, and export via sysctl. Requested by: victor cruceru <victor dot cruceru at gmail dot com> MFC after: 1 week	2005-07-15 23:34:39 +00:00
Robert Watson	7a52a97eb3	Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator statistics via a binary structure stream: - Add structure 'uma_stream_header', which defines a stream version, definition of MAXCPUs used in the stream, and the number of zone records in the stream. - Add structure 'uma_type_header', which defines the name, alignment, size, resource allocation limits, current pages allocated, preferred bucket size, and central zone + keg statistics. - Add structure 'uma_percpu_stat', which, for each per-CPU cache, includes the number of allocations and frees, as well as the number of free items in the cache. - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPUs uma_percpu_stat structures holding per-CPU allocation information. Typical values of MAXCPU will be 1 (UP compiled kernel) and 16 (SMP compiled kernel). This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs. While here, also export the number of UMA zones as a sysctl vm.uma_count, in order to assist in sizing user swpace buffers to receive the stream. A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. This change directly supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats rather than separately maintained mbuf allocator statistics. MFC after: 1 week	2005-07-14 16:35:13 +00:00
Robert Watson	773df9ab16	In addition to tracking allocs in the zone, also track frees. Add a zone free counter, as well as a cache free counter. MFC after: 1 week	2005-07-14 16:17:21 +00:00
Alan Cox	eafc7b549a	Increase UMA_BOOT_PAGES to prevent a crash during initialization. See http://docs.FreeBSD.org/cgi/mid.cgi?42AD8270.8060906 for a detailed description of the crash. Reported by: Eric Anderson Approved by: re (scottl) MFC after: 3 days	2005-06-16 17:06:34 +00:00
Robert Watson	5d1ae027f0	Modify UMA to use critical sections to protect per-CPU caches, rather than mutexes, which offers lower overhead on both UP and SMP. When allocating from or freeing to the per-cpu cache, without INVARIANTS enabled, we now no longer perform any mutex operations, which offers a 1%-3% performance improvement in a variety of micro-benchmarks. We rely on critical sections to prevent (a) preemption resulting in reentrant access to UMA on a single CPU, and (b) migration of the thread during access. In the event we need to go back to the zone for a new bucket, we release the critical section to acquire the global zone mutex, and must re-acquire the critical section and re-evaluate which cache we are accessing in case migration has occured, or circumstances have changed in the current cache. Per-CPU cache statistics are now gathered lock-free by the sysctl, which can result in small races in statistics reporting for caches. Reviewed by: bmilekic, jeff (somewhat) Tested by: rwatson, kris, gnn, scottl, mike at sentex dot net, others	2005-04-29 18:56:36 +00:00
Bosko Milekic	8076cb5289	Well, it seems that I pre-maturely removed the "All rights reserved" statement from some files, so re-add it for the moment, until the related legalese is sorted out. This change affects: sys/kern/kern_mbuf.c sys/vm/memguard.c sys/vm/memguard.h sys/vm/uma.h sys/vm/uma_core.c sys/vm/uma_dbg.c sys/vm/uma_dbg.h sys/vm/uma_int.h	2005-02-16 21:45:59 +00:00
Warner Losh	60727d8b86	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
Bosko Milekic	7b8712053c	Add my copyright and update Jeff's copyright on UMA source files, as per his request. Discussed with: Jeffrey Roberson	2004-12-26 00:35:12 +00:00
Olivier Houchard	6fc96493ac	Remove useless casts.	2004-11-26 15:04:26 +00:00
Bosko Milekic	244f45548a	Rework the way slab header storage space is calculated in UMA. - zone_large_init() stays pretty much the same. - zone_small_init() will try to stash the slab header in the slab page being allocated if the amount of calculated wasted space is less than UMA_MAX_WASTE (for both the UMA_ZONE_REFCNT case and regular case). If the amount of wasted space is >= UMA_MAX_WASTE, then UMA_ZONE_OFFPAGE will be set and the slab header will be allocated separately for better use of space. - uma_startup() calculates the maximum ipers required in offpage slabs (so that the offpage slab header zone(s) can be sized accordingly). The algorithm used to calculate this replaces the old calculation (which only happened to work coincidentally). We now iterate over possible object sizes, starting from the smallest one, until we determine that wastedspace calculated in zone_small_init() might end up being greater than UMA_MAX_WASTE, at which point we use the found object size to compute the maximum possible ipers. The reason this works is because: - wastedspace versus objectsize is a see-saw function with local minima all equal to zero and local maxima growing directly proportioned to objectsize. This implies that for objects up to or equal a certain objectsize, the see-saw remains entirely below UMA_MAX_WASTE, so for those objectsizes it is impossible to ever go OFFPAGE for slab headers. - ipers (items-per-slab) versus objectsize is an inversely proportional function which falls off very quickly (very large for small objectsizes). - To determine the maximum ipers we'll ever need from OFFPAGE slab headers we first find the largest objectsize for which we are guaranteed to not go offpage for and use it to compute ipers (as though we were offpage). Since the only objectsizes allowed to go offpage are bigger than the found objectsize, and since ipers vs objectsize is inversely proportional (and monotonically decreasing), then we are guaranteed that the ipers computed is always >= what we will ever need in offpage slab headers. - Define UMA_FRITM_SZ and UMA_FRITMREF_SZ to be the actual (possibly padded) size of each freelist index so that offset calculations are fixed. This might fix weird data corruption problems and certainly allows ARM to now boot to at least single-user (via simulator). Tested on i386 UP by me. Tested on sparc64 SMP by fenner. Tested on ARM simulator to single-user by cognet.	2004-07-29 15:25:40 +00:00
Bosko Milekic	099a0e588c	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)	2004-05-31 21:46:06 +00:00
Alan Cox	c19aa3402b	Increase UMA_BOOT_PAGES because of changes to pv entry initialization in revision 1.457 of i386/i386/pmap.c.	2004-01-18 05:51:06 +00:00
Alan Cox	925692caa5	- Significantly reduce the number of preallocated pv entries in pmap_init(). Such a large preallocation is unnecessary and wastes nearly eight megabytes of kernel virtual address space per gigabyte of managed physical memory. - Increase UMA_BOOT_PAGES by two. This enables the removal of pmap_pv_allocf(). (Note: this function was only used during initialization, specifically, after pmap_init() but before pmap_init2(). During pmap_init2(), a new allocator is installed.)	2003-12-22 01:01:32 +00:00
Jeff Roberson	9643769a3a	- Remove the working-set algorithm. Instead, use the per cpu buckets as the working set cache. This has several advantages. Firstly, we never touch the per cpu queues now in the timeout handler. This removes one more reason for having per cpu locks. Secondly, it reduces the size of the zone by 8 bytes, bringing it under 200 bytes for a single proc x86 box. This tidies up other logic as well. - The 'destroy' flag no longer needs to be passed to zone_drain() since it always frees everything in the zone's slabs. - cache_drain() is now only called from zone_dtor() and so it destroys by default. It also does not need the destroy parameter now.	2003-09-19 23:27:46 +00:00
Jeff Roberson	3e0cab95c0	- Remove the cache colorization code. We can't use it due to all of the broken consumers of the malloc interface who assume that the allocated address will be an even multiple of the size. - Remove disabled time delay code on uma_reclaim(). The comment there said it all. It was not an effective strategy and it should not be left in #if 0'd for all eternity.	2003-09-19 23:04:44 +00:00
Jeff Roberson	b60f5b794e	- Fix the silly flag situation in UMA. Remove redundant ZFLAG/ZONE flags by accepting the user supplied flags directly. Previously this was not done so that flags for the same field would not be defined in two different files. Add comments in each header instructing future developers on how now to shoot their feet. - Fix a test for !OFFPAGE which should have been a test for HASH. This would have caused a panic if we had ever destructed a malloc zone. This also opens up the possibility that other zones could use the vsetobj() method rather than a hash.	2003-09-19 08:37:44 +00:00
Jeff Roberson	cae33c1429	- Initialize a pool of bucket zones so that we waste less space on zones that don't cache as many items. - Introduce the bucket_alloc(), bucket_free() functions to wrap bucket allocation. These functions select the appropriate bucket zone to allocate from or free to. - Rename ub_ptr to ub_cnt to reflect a change in its use. ub_cnt now reflects the count of free items in the bucket. This gets rid of many unnatural subtractions by 1 throughout the code. - Add ub_entries which reflects the number of entries possibly held in a bucket.	2003-09-19 06:26:45 +00:00
Bosko Milekic	20e8e865bd	- When deciding whether to init the zone with small_init or large_init, compare the zone element size (+1 for the byte of linkage) against UMA_SLAB_SIZE - sizeof(struct uma_slab), and not just UMA_SLAB_SIZE. Add a KASSERT in zone_small_init to make sure that the computed ipers (items per slab) for the zone is not zero, despite the addition of the check, just to be sure (this part submitted by: silby) - UMA_ZONE_VM used to imply BUCKETCACHE. Now it implies CACHEONLY instead. CACHEONLY is like BUCKETCACHE in the case of bucket allocations, but in addition to that also ensures that we don't setup the zone with OFFPAGE slab headers allocated from the slabzone. This means that we're not allowed to have a UMA_ZONE_VM zone initialized for large items (zone_large_init) because it would require the slab headers to be allocated from slabzone, and hence kmem_map. Some of the zones init'd with UMA_ZONE_VM are so init'd before kmem_map is suballoc'd from kernel_map, which is why this change is necessary.	2003-08-11 19:39:45 +00:00
Jeff Roberson	f828e5bedb	- Get rid of the ill-conceived uz_cachefree member of uma_zone. - In sysctl_vm_zone use the per cpu locks to read the current cache statistics this makes them more accurate while under heavy load. Submitted by: tegge	2003-07-30 05:59:17 +00:00
Bosko Milekic	d88797c2ba	Move the pcpu lock out of the uma_cache and instead have a single set of pcpu locks. This makes uma_zone somewhat smaller (by (LOCKNAME_LEN * sizeof(char) + sizeof(struct mtx) * maxcpu) bytes, to be exact). No Objections from jeff.	2003-06-25 20:49:48 +00:00
Poul-Henning Kamp	c5d771b807	Prepend _ to internal union members to avoid ambiguity. Found by: FlexeLint	2003-05-31 19:52:15 +00:00
Jeff Roberson	48eea37508	- Add support for machine dependant page allocation routines. MD code may define UMA_MD_SMALL_ALLOC to make use of this feature. Reviewed by: peter, jake	2002-11-01 01:01:27 +00:00
Jeff Roberson	f461cf2297	- Use my freebsd email alias in the copyright. - Remove redundant instances of my email alias in the file summary.	2002-09-19 06:05:32 +00:00
Jeff Roberson	99571dc345	- Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH. - Remove all instances of the mallochash. - Stash the slab pointer in the vm page's object pointer when allocating from the kmem_obj. - Use the overloaded object pointer to find slabs for malloced memory.	2002-09-18 08:26:30 +00:00
Julian Elischer	e602ba25fd	Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..	2002-06-29 17:26:22 +00:00
Jeff Roberson	18aa2de5a7	- Introduce the new M_NOVM option which tells uma to only check the currently allocated slabs and bucket caches for free items. It will not go ask the vm for pages. This differs from M_NOWAIT in that it not only doesn't block, it doesn't even ask. - Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This tells uma that it should only allocate buckets out of the bucket cache, and not from the VM. It does this by using the M_NOVM option to zalloc when getting a new bucket. This is so that the VM doesn't recursively enter itself while trying to allocate buckets for vm_map_entry zones. If there are already allocated buckets when we get here we'll still use them but otherwise we'll skip it. - Use the ZONE_VM flag on vm map entries and pv entries on x86.	2002-06-17 22:02:41 +00:00
Jeff Roberson	28bc44195c	Add a new zone flag UMA_ZONE_MTXCLASS. This puts the zone in it's own mutex class. Currently this is only used for kmapentzone because kmapents are are potentially allocated when freeing memory. This is not dangerous though because no other allocations will be done while holding the kmapentzone lock.	2002-04-29 23:45:41 +00:00
Jeff Roberson	af7f9b97b6	Fix the calculation that determines uz_maxpages. It was off for large zones. Fortunately we have no large zones with maximums specified yet, so it wasn't breaking anything. Implement blocking when a zone exceeds the maximum and M_WAITOK is specified. Previously this just failed like the old zone allocator did. The old zone allocator didn't support WAITOK/NOWAIT though so we should do what we advertise. While I was in there I cleaned up some more zalloc logic to further simplify that code path and reduce redundant code. This was needed to make the blocking work properly anyway.	2002-04-14 01:56:25 +00:00
Jeff Roberson	1d4cb54ba8	Quiet witness warnings about acquiring several zone locks. In the case that this happens it is OK.	2002-04-08 21:08:17 +00:00
Jeff Roberson	a553d4b8eb	Rework most of the bucket allocation and free code so that per cpu locks are never held across blocking operations. Also, fix two other lock order reversals that were exposed by jhb's witness change. The free path previously had a bug that would cause it to skip the free bucket list in some cases and go straight to allocating a new bucket. This has been fixed as well. These changes made the bucket handling code much cleaner and removed quite a few lock operations. This should be marginally faster now. It is now possible to call malloc w/o Giant and avoid any witness warnings. This still isn't entirely safe though because malloc_type statistics are not protected by any lock.	2002-04-08 02:42:55 +00:00
Jeff Roberson	c235bfa551	Spelling correction; s/seperate/separate/g Submitted by: eric	2002-04-07 22:56:48 +00:00
John Baldwin	6008862bc2	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
Jeff Roberson	f22a4b62f5	Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks with this flag. Remove the dup_list and dup_ok code from subr_witness. Now we just check for the flag instead of doing string compares. Also, switch the process lock, process group lock, and uma per cpu locks over to this interface. The original mechanism did not work well for uma because per cpu lock names are unique to each zone. Approved by: jhb	2002-03-27 09:23:41 +00:00
Jeff Roberson	8355f576a9	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@	2002-03-19 09:11:49 +00:00

44 Commits