freebsd-skq

Author	SHA1	Message	Date
rwatson	83343a94ec	Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator statistics via a binary structure stream: - Add structure 'uma_stream_header', which defines a stream version, definition of MAXCPUs used in the stream, and the number of zone records in the stream. - Add structure 'uma_type_header', which defines the name, alignment, size, resource allocation limits, current pages allocated, preferred bucket size, and central zone + keg statistics. - Add structure 'uma_percpu_stat', which, for each per-CPU cache, includes the number of allocations and frees, as well as the number of free items in the cache. - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPUs uma_percpu_stat structures holding per-CPU allocation information. Typical values of MAXCPU will be 1 (UP compiled kernel) and 16 (SMP compiled kernel). This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs. While here, also export the number of UMA zones as a sysctl vm.uma_count, in order to assist in sizing user space buffers to receive the stream. A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. This change directly supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats rather than separately maintained mbuf allocator statistics. MFC after: 1 week	2005-07-14 16:35:13 +00:00
rwatson	c24543fa50	In addition to tracking allocs in the zone, also track frees. Add a zone free counter, as well as a cache free counter. MFC after: 1 week	2005-07-14 16:17:21 +00:00
rwatson	3f3682a4b8	In an earlier world order, UMA would flush per-CPU statistics to the zone whenever it was moving buckets between the zone and the cache, or when coalescing statistics across the CPU. Remove flushing of statistics to the zone when coalescing statistics as part of sysctl, as we won't be running on the right CPU to write to the cache statistics. Add a missed gathering of statistics: when uma_zalloc_internal() does a special case allocation of a single item, make sure to update the zone statistics to represent this. Previously this case wasn't accounted for in user-visible statistics. MFC after: 1 week	2005-07-14 16:13:46 +00:00
rwatson	bb1e0b257a	Modify UMA to use critical sections to protect per-CPU caches, rather than mutexes, which offers lower overhead on both UP and SMP. When allocating from or freeing to the per-cpu cache, without INVARIANTS enabled, we now no longer perform any mutex operations, which offers a 1%-3% performance improvement in a variety of micro-benchmarks. We rely on critical sections to prevent (a) preemption resulting in reentrant access to UMA on a single CPU, and (b) migration of the thread during access. In the event we need to go back to the zone for a new bucket, we release the critical section to acquire the global zone mutex, and must re-acquire the critical section and re-evaluate which cache we are accessing in case migration has occurred, or circumstances have changed in the current cache. Per-CPU cache statistics are now gathered lock-free by the sysctl, which can result in small races in statistics reporting for caches. Reviewed by: bmilekic, jeff (somewhat) Tested by: rwatson, kris, gnn, scottl, mike at sentex dot net, others	2005-04-29 18:56:36 +00:00
alc	2b424cf256	Revert the first part of revision 1.114 and modify the second part. On architectures implementing uma_small_alloc() pages do not necessarily belong to the kmem object.	2005-02-24 06:13:01 +00:00
bmilekic	f9dded75d0	Well, it seems that I prematurely removed the "All rights reserved" statement from some files, so re-add it for the moment, until the related legalese is sorted out. This change affects: sys/kern/kern_mbuf.c sys/vm/memguard.c sys/vm/memguard.h sys/vm/uma.h sys/vm/uma_core.c sys/vm/uma_dbg.c sys/vm/uma_dbg.h sys/vm/uma_int.h	2005-02-16 21:45:59 +00:00
bmilekic	8fa4f6f9a4	Make UMA set the overloaded page->object back to kmem_object for UMA_ZONE_REFCNT and UMA_ZONE_MALLOC zones, as the page(s) undoubtedly came from kmem_map for those two. Previously it would set it back to NULL for UMA_ZONE_REFCNT zones and although this was probably not fatal, it added MORE code for no reason.	2005-02-16 20:06:11 +00:00
bmilekic	bc2ae8f1d2	While we want the recursion protection for the bucket zones so that recursion from the VM is handled (and the calling code that allocates buckets knows how to deal with it), we do not want to prevent allocation from the slab header zones (slabzone and slabrefzone) if uk_recurse is not zero for them. The reason is that it could lead to NULL being returned for the slab header allocations even in the M_WAITOK case, and the caller can't handle that (this is also explained in a comment with this commit). The problem analysis is documented in our mailing lists: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153445+0+archive/2004/freebsd-current/20041231.freebsd-current (see entire thread for proper context). Crash dump data provided by: Peter Holm <peter@holm.cc>	2005-01-11 03:33:09 +00:00
stefanf	bc3ec4dbb0	ISO C requires at least one element in an initialiser list.	2005-01-10 20:30:04 +00:00
imp	f0bf889d0d	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
bmilekic	764e80eed7	Add my copyright and update Jeff's copyright on UMA source files, as per his request. Discussed with: Jeffrey Roberson	2004-12-26 00:35:12 +00:00
rwatson	2b775a8633	Abstract the logic to look up the uma_bucket_zone given a desired number of entries into bucket_zone_lookup(), which helps make more clear the logic of consumers of bucket zones. Annotate the behavior of bucket_init() with a comment indicating how the various data structures, including the bucket lookup tables, are initialized.	2004-11-06 11:43:30 +00:00
rwatson	69064711c1	Annotate what bucket_size[] array does; staticize since it's used only in uma_core.c.	2004-11-06 11:24:40 +00:00
bmilekic	13ebdd218a	Fix a INVARIANTS-only bug introduced in Revision 1.104: IF INVARIANTS is defined, and in the rare case that we have allocated some objects from the slab and at least one initializer on at least one of those objects failed, and we need to fail the allocation and push the uninitialized items back into the slab caches -- in that scenario, we would fail to [re]set the bucket cache's ub_bucket item references to NULL, which would eventually trigger a KASSERT.	2004-10-27 21:19:35 +00:00
green	76d153d5ca	In the previous revision, I did not intend to change the default value of "nosleepwithlocks." Submitted by: ru	2004-10-09 18:51:32 +00:00
green	9128ff1ce9	Fix critical stability problems that can cause UMA mbuf cluster state management corruption, mbuf leaks, general mbuf corruption, and at least on i386 a first level splash damage radius that encompasses up to about half a megabyte of the memory after an mbuf cluster's allocation slab. In short, this has caused instability nightmares anywhere the right kind of network traffic is present. When the polymorphic refcount slabs were added to UMA, the new types were not used pervasively. In particular, the slab management structure was turned into one for refcounts, and one for non-refcounts (supposed to be mostly like the old slab management structure), but the latter was almost always used throughout. In general, every access to zones with UMA_ZONE_REFCNT turned on corrupted the "next free" slab offset and the refcount with each other and with other allocations (on i386, 2 mbuf clusters per 4096 byte slab). Fix things so that the right type is used to access refcounted zones where it was not before. There are additional errors in gross overestimation of padding, it seems, that would cause large kegs (nee zones) to be allocated when small ones would do. Unless I have analyzed this incorrectly, it is not directly harmful.	2004-10-08 20:19:29 +00:00
rwatson	b9c212d7f8	Generate KTR trace records for uma_zalloc_arg() and uma_zfree_arg(). This doesn't trace every event of interest in UMA, but provides enough basic information to explain lock traces and sleep patterns.	2004-08-06 21:52:38 +00:00
green	9532ab7116	* Add a "how" argument to uma_zone constructors and initialization functions so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialization functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default. Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.	2004-08-02 00:18:36 +00:00
bmilekic	c998cc2027	Rework the way slab header storage space is calculated in UMA. - zone_large_init() stays pretty much the same. - zone_small_init() will try to stash the slab header in the slab page being allocated if the amount of calculated wasted space is less than UMA_MAX_WASTE (for both the UMA_ZONE_REFCNT case and regular case). If the amount of wasted space is >= UMA_MAX_WASTE, then UMA_ZONE_OFFPAGE will be set and the slab header will be allocated separately for better use of space. - uma_startup() calculates the maximum ipers required in offpage slabs (so that the offpage slab header zone(s) can be sized accordingly). The algorithm used to calculate this replaces the old calculation (which only happened to work coincidentally). We now iterate over possible object sizes, starting from the smallest one, until we determine that wastedspace calculated in zone_small_init() might end up being greater than UMA_MAX_WASTE, at which point we use the found object size to compute the maximum possible ipers. The reason this works is because: - wastedspace versus objectsize is a see-saw function with local minima all equal to zero and local maxima growing directly proportioned to objectsize. This implies that for objects up to or equal a certain objectsize, the see-saw remains entirely below UMA_MAX_WASTE, so for those objectsizes it is impossible to ever go OFFPAGE for slab headers. - ipers (items-per-slab) versus objectsize is an inversely proportional function which falls off very quickly (very large for small objectsizes). - To determine the maximum ipers we'll ever need from OFFPAGE slab headers we first find the largest objectsize for which we are guaranteed to not go offpage for and use it to compute ipers (as though we were offpage). 
Since the only objectsizes allowed to go offpage are bigger than the found objectsize, and since ipers vs objectsize is inversely proportional (and monotonically decreasing), then we are guaranteed that the ipers computed is always >= what we will ever need in offpage slab headers. - Define UMA_FRITM_SZ and UMA_FRITMREF_SZ to be the actual (possibly padded) size of each freelist index so that offset calculations are fixed. This might fix weird data corruption problems and certainly allows ARM to now boot to at least single-user (via simulator). Tested on i386 UP by me. Tested on sparc64 SMP by fenner. Tested on ARM simulator to single-user by cognet.	2004-07-29 15:25:40 +00:00
alc	c7df7afd46	- Change uma_zone_set_obj() to call kmem_alloc_nofault() instead of kmem_alloc_pageable(). The difference between these is that an errant memory access to the zone will be detected sooner with kmem_alloc_nofault(). The following changes serve to eliminate the following lock-order reversal reported by witness: 1st 0xc1a3c084 vm object (vm object) @ vm/swap_pager.c:1311 2nd 0xc07acb00 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1797 3rd 0xc1804bdc vm object (vm object) @ vm/uma_core.c:931 There is no potential deadlock in this case. However, witness is unable to recognize this because vm objects used by UMA have the same type as ordinary vm objects. To remedy this, we make the following changes: - Add a mutex type argument to VM_OBJECT_LOCK_INIT(). - Use the mutex type argument to assign distinct types to special vm objects such as the kernel object, kmem object, and UMA objects. - Define a static swap zone object for use by UMA. (Only static objects are assigned a special mutex type.)	2004-07-22 19:44:49 +00:00
green	3b66ac9138	Since breakage of malloc(9)/uma_zalloc(9) is totally non-optional in GENERIC/for WITNESS users, make sure the sysctl to disable the behavior is read-only and always enabled.	2004-07-19 15:05:24 +00:00
bmilekic	df84cdbe06	Introduce debug.nosleepwithlocks sysctl, 0 by default. If set to 1 and WITNESS is not built, then force all M_WAITOK allocations to M_NOWAIT behavior (transparently). This is to be used temporarily if weird deadlocks are reported because we still have code paths that perform M_WAITOK allocations with lock(s) held, which can lead to deadlock. If WITNESS is compiled, then the sysctl is ignored and we ask witness to tell us whether we have locks held, converting to M_NOWAIT behavior only if it tells us that we do. Note this removes the previous mbuf.h inclusion as well (only needed by last revision), and cleans up unneeded [artificial] comparisons to just the mbuf zones. The problem described above has nothing to do with previous mbuf wait behavior; it is a general problem.	2004-07-04 16:07:44 +00:00
green	77ef401fc6	Reextend the M_WAITOK-disabling-hack to all three of the mbuf-related zones, and do it by direct comparison of uma_zone_t instead of strcmp. The mbuf subsystem used to provide M_TRYWAIT/M_DONTWAIT semantics, but this is mostly no longer the case. M_WAITOK has taken over the spot M_TRYWAIT used to have, and for mbuf things, still may return NULL if the code path is incorrectly holding a mutex going into mbuf allocation functions. The M_WAITOK/M_NOWAIT semantics are absolute; though it may deadlock the system to try to malloc or uma_zalloc something with a mutex held and M_WAITOK specified, it is absolutely required to not return NULL and will result in instability and/or security breaches otherwise. There is still room to add the WITNESS_WARN() to all cases so that we are notified of the possibility of deadlocks, but it cannot change the value of the "badness" variable and allow allocation to actually fail except for the specialized cases which used to be M_TRYWAIT.	2004-07-04 15:59:25 +00:00
green	b003469f2d	Limit mbuma damage. Suddenly ALL allocations with M_WAITOK are subject to failing -- that is, allocations via malloc(M_WAITOK) that are required to never fail -- if WITNESS is not defined. While everyone should be running WITNESS, in any case, zone "Mbuf" allocations are really the only ones that should be screwed with by this hack. This hack is crashing people, and would continue to do so with or without WITNESS. Things shouldn't be allocating with M_WAITOK with locks held, but it's not okay just to always remove M_WAITOK when !WITNESS. Reported by: Bernd Walter <ticso@cicely5.cicely.de>	2004-07-03 18:11:41 +00:00
bmilekic	7a6a2d65d4	Make uma_mtx MTX_RECURSE. Here's why: The general UMA lock is a recursion-allowed lock because there is a code path where, while we're still configured to use startup_alloc() for backend page allocations, we may end up in uma_reclaim() which calls zone_foreach(zone_drain), which grabs uma_mtx, only to later call into startup_alloc() because while freeing we needed to allocate a bucket. Since startup_alloc() also takes uma_mtx, we need to be able to recurse on it. This exact explanation also added as comment above mtx_init(). Trace showing recursion reported by: Peter Holm <peter-at-holm.cc>	2004-06-23 21:59:03 +00:00
bmilekic	ea4a8a094f	Backout previous change, I think Julian has a better solution which does not require type-stable refcnts here.	2004-06-09 20:50:08 +00:00
bmilekic	1edc23feaa	Make the slabrefzone, the zone from which we allocated slabs with internal reference counters, UMA_ZONE_NOFREE. This way, those slabs (with their ref counts) will be effectively type-stable, then using a trick like this on the refcount is no longer dangerous: MEXT_REM_REF(m); if (atomic_cmpset_int(m->m_ext.ref_cnt, 0, 1)) { if (m->m_ext.ext_type == EXT_PACKET) { uma_zfree(zone_pack, m); return; } else if (m->m_ext.ext_type == EXT_CLUSTER) { uma_zfree(zone_clust, m->m_ext.ext_buf); m->m_ext.ext_buf = NULL; } else { (*(m->m_ext.ext_free))(m->m_ext.ext_buf, m->m_ext.ext_args); if (m->m_ext.ext_type != EXT_EXTREF) free(m->m_ext.ref_cnt, M_MBUF); } } uma_zfree(zone_mbuf, m); Previously, a second thread hitting the above cmpset might actually read the refcnt AFTER it has already been freed. A very rare occurrence. Now we'll know that it won't be freed, though. Spotted by: julian, pjd	2004-06-09 19:18:50 +00:00
bmilekic	f7574a2276	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. 
Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)	2004-05-31 21:46:06 +00:00
alc	dbdc402421	- Make the acquisition of Giant in vm_fault_unwire() conditional on the pmap. For the kernel pmap, Giant is not required. In general, for other pmaps, Giant is required by i386's pmap_pte() implementation. Specifically, the use of PMAP2/PADDR2 is synchronized by Giant. Note: In principle, updates to the kernel pmap's wired count could be lost without Giant. However, in practice, we never use the kernel pmap's wired count. This will be resolved when pmap locking appears. - With the above change, cpu_thread_clean() and uma_large_free() need not acquire Giant. (The first case is simply the revival of i386/i386/vm_machdep.c's revision 1.226 by peter.)	2004-03-10 04:44:43 +00:00
rwatson	fa59040dad	Mark uma_callout as CALLOUT_MPSAFE, as uma_timeout can run MPSAFE. Reviewed by: jeff	2004-03-07 07:00:46 +00:00
jeff	8b93703f2c	- Fix a problem where we did not drain the cache of buckets in the zone when uma_reclaim() was called. This was introduced when the zone working-set algorithm was removed in favor of using the per cpu caches as the working set.	2004-02-01 06:15:17 +00:00
des	40b179743a	Mechanical whitespace cleanup.	2004-01-30 16:26:29 +00:00
jhb	4b61439e79	Fix all users of mp_maxid to use the same semantics, namely: 1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1. 2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid. Approved by: re (scottl) Tested on: i386, amd64, alpha	2003-12-03 14:57:26 +00:00
jeff	d26b674d39	- Unbreak UP. mp_maxid is not defined on uni-processor machines, although I believe it and the other MP variables should be. For now, just define it here and wait for jhb to clean it up later. Approved by: re (rwatson)	2003-11-30 22:18:14 +00:00
jeff	80dcf38c3a	- Replace the local maxcpu with mp_maxid. Previously, if mp_maxid was equal to MAXCPU, we would overrun the pcpu_mtx array because maxcpu was calculated incorrectly. - Add some more debugging code so that memory leaks at the time of uma_zdestroy() are more easily diagnosed. Approved by: re (rwatson)	2003-11-30 08:04:01 +00:00
alc	48c9756047	- Remove use of Giant from uma_zone_set_obj().	2003-11-14 17:49:07 +00:00
jeff	9982722580	- Fix MD_SMALL_ALLOC on architectures that support it. Define a new alloc function, startup_alloc(), that is used for single page allocations prior to the VM starting up. If it is used after the VM starts up, it replaces the zone's allocf pointer with either page_alloc() or uma_small_alloc() where appropriate. Pointy hat to: me Tested by: phk/amd64, me/x86	2003-09-21 07:39:16 +00:00
peter	bfb0c45b8f	Bad Jeffr! No cookie! Temporarily disable the UMA_MD_SMALL_ALLOC stuff since recent commits break sparc64, amd64, ia64 and alpha. It appears only i386 and maybe powerpc were not broken.	2003-09-20 23:35:33 +00:00
jeff	accdfbd626	- Remove the working-set algorithm. Instead, use the per cpu buckets as the working set cache. This has several advantages. Firstly, we never touch the per cpu queues now in the timeout handler. This removes one more reason for having per cpu locks. Secondly, it reduces the size of the zone by 8 bytes, bringing it under 200 bytes for a single proc x86 box. This tidies up other logic as well. - The 'destroy' flag no longer needs to be passed to zone_drain() since it always frees everything in the zone's slabs. - cache_drain() is now only called from zone_dtor() and so it destroys by default. It also does not need the destroy parameter now.	2003-09-19 23:27:46 +00:00
jeff	a234ab2fa7	- Remove the cache colorization code. We can't use it due to all of the broken consumers of the malloc interface who assume that the allocated address will be an even multiple of the size. - Remove disabled time delay code on uma_reclaim(). The comment there said it all. It was not an effective strategy and it should not be left in #if 0'd for all eternity.	2003-09-19 23:04:44 +00:00
jeff	1abaac476b	- There is an endless stream of style(9) errors in this file. Fix a few. Also catch some spelling errors.	2003-09-19 22:31:45 +00:00
jeff	82c0b53020	- Don't inspect the zone in page_alloc(). It may be NULL. - Don't cache more items than the zone would like in uma_zalloc_bucket().	2003-09-19 09:22:04 +00:00
jeff	2d4c121a6d	- Move the logic for dealing with the uma_boot_pages cache into the page_alloc() function from the slab_zalloc() function. This allows us to unconditionally call uz_allocf(). - In page_alloc() cleanup the boot_pages logic some. Previously memory from this cache that was not used by the time the system started was left in the cache and never used. Typically this wasn't more than a few pages, but now we will use this cache so long as memory is available.	2003-09-19 08:53:33 +00:00
jeff	30f275bb51	- Fix the silly flag situation in UMA. Remove redundant ZFLAG/ZONE flags by accepting the user supplied flags directly. Previously this was not done so that flags for the same field would not be defined in two different files. Add comments in each header instructing future developers on how not to shoot their feet. - Fix a test for !OFFPAGE which should have been a test for HASH. This would have caused a panic if we had ever destructed a malloc zone. This also opens up the possibility that other zones could use the vsetobj() method rather than a hash.	2003-09-19 08:37:44 +00:00
jeff	620ea1ef61	- Don't abuse M_DEVBUF, define a tag for UMA hashes.	2003-09-19 07:23:50 +00:00
jeff	b6dd0c8bfb	- Eliminate a pair of unnecessary variables.	2003-09-19 06:41:06 +00:00
jeff	b8696d32c3	- Initialize a pool of bucket zones so that we waste less space on zones that don't cache as many items. - Introduce the bucket_alloc(), bucket_free() functions to wrap bucket allocation. These functions select the appropriate bucket zone to allocate from or free to. - Rename ub_ptr to ub_cnt to reflect a change in its use. ub_cnt now reflects the count of free items in the bucket. This gets rid of many unnatural subtractions by 1 throughout the code. - Add ub_entries which reflects the number of entries possibly held in a bucket.	2003-09-19 06:26:45 +00:00
bmilekic	f0a28c0844	In sysctl_vm_zone, do not calculate per-cpu cache stats on UMA_ZFLAG_INTERNAL zones at all. Apparently, Wilko's alpha was crashing while entering multi-user because, I think, we were calculating the garbage cachefree for pcpu caches that essentially don't exist for at least the 'zones' zone and it so happened that we were reading from an unmapped location. Confirmed to fix crash: wilko Helped debug: wilko, gallatin	2003-08-20 18:22:06 +00:00
bmilekic	77b7809eb0	- When deciding whether to init the zone with small_init or large_init, compare the zone element size (+1 for the byte of linkage) against UMA_SLAB_SIZE - sizeof(struct uma_slab), and not just UMA_SLAB_SIZE. Add a KASSERT in zone_small_init to make sure that the computed ipers (items per slab) for the zone is not zero, despite the addition of the check, just to be sure (this part submitted by: silby) - UMA_ZONE_VM used to imply BUCKETCACHE. Now it implies CACHEONLY instead. CACHEONLY is like BUCKETCACHE in the case of bucket allocations, but in addition to that also ensures that we don't setup the zone with OFFPAGE slab headers allocated from the slabzone. This means that we're not allowed to have a UMA_ZONE_VM zone initialized for large items (zone_large_init) because it would require the slab headers to be allocated from slabzone, and hence kmem_map. Some of the zones init'd with UMA_ZONE_VM are so init'd before kmem_map is suballoc'd from kernel_map, which is why this change is necessary.	2003-08-11 19:39:45 +00:00
alc	52878a6770	Revise obj_alloc(). Most notably, use the object's lock to prevent two concurrent invocations from acquiring the same address(es). Also, in case of an incomplete allocation, free any allocated pages. In collaboration with: tegge	2003-08-03 06:08:48 +00:00
bmilekic	2a8e0c5c0a	When INVARIANTS is on and we're in uma_zalloc_free(), we need to make sure that uma_dbg_free() is called if we're about to call uma_zfree_internal() but we're asking it to skip the dtor and uma_dbg_free() call itself. So, if we're about to call uma_zfree_internal() from uma_zfree_arg() and skip == 1, call uma_dbg_free() ourselves.	2003-08-02 22:40:27 +00:00
bmilekic	9caa205e5b	Only free the pcpu cache buckets if they are non-NULL. Crashed this person's machine: harti Pointy-hat to: me	2003-08-01 17:42:27 +00:00
bmilekic	7c379c85d8	Plug a race and a leak in UMA. 1) The race has to do with zone destruction. From the zone destructor we would lock the zone, set the working set size to 0, then unlock the zone, drain it, and then free the structure. Within the window following the working-set-size set to 0 and unlocking of the zone and the point where in zone_drain we re-acquire the zone lock, the uma timer routine could have fired off and changed the working set size to something non-zero, thereby potentially preventing us from completely freeing slabs before destroying the zone (and thus leaking them). 2) The leak has to do with zone destruction as well. When destroying a zone we would take care to free all the buckets cached in the zone, but although we would drain the pcpu cache buckets, we would not free them. This resulted in leaking a couple of bucket structures (512 bytes each) per cpu on SMP during zone destruction. While I'm here, also silence GCC warnings by turning uma_slab_alloc() from inline to real function. It's too big to be an inline. Reviewed by: JeffR	2003-07-30 18:55:15 +00:00
bmilekic	260d19ed7e	When generating the zone stats make sure to handle the master zone ("UMA Zone") carefully, because it does not have pcpu caches allocated at all. In the UP case, we did not catch this because one pcpu cache is always allocated with the zone, but for the MP case, we were getting bogus stats for this zone. Tested by: Lukas Ertl <le@univie.ac.at>	2003-07-30 15:22:37 +00:00
phk	213f4e3d07	Remove the disabling of buckets workaround. Thanks to: jeffr	2003-07-30 07:50:19 +00:00
jeff	8512070a52	- Get rid of the ill-conceived uz_cachefree member of uma_zone. - In sysctl_vm_zone use the per cpu locks to read the current cache statistics this makes them more accurate while under heavy load. Submitted by: tegge	2003-07-30 05:59:17 +00:00
jeff	50d6e1a822	- Check to see if we need a slab prior to allocating one. Failure to do so not only wastes memory but it can also cause a leak in zones that will be destroyed later. The problem is that the slab allocation code places newly created slabs on the partially allocated list because it assumes that the caller will actually allocate some memory from it. Failure to do so places an otherwise free slab on the partial slab list where we won't find it later in zone_drain(). Continuously prodded to fix by: phk (Thanks)	2003-07-30 05:42:55 +00:00
phk	70398bc9a3	Temporary workaround: Always disable buckets, there is a bug there somewhere. JeffR will look at this as soon as he has time. OK'ed by: jeffr	2003-07-29 22:07:10 +00:00
alc	79bbf9b702	None of the "alloc" functions used by UMA assume that Giant is held any longer. (If they still need it, e.g., contigmalloc(), they acquire it themselves.) Therefore, we need not acquire Giant in slab_zalloc().	2003-07-28 02:29:07 +00:00
alc	0cffd21856	Gulp ... call kmem_malloc() without Giant.	2003-07-26 03:55:32 +00:00
harti	de9698a4f7	When INVARIANTS is defined make sure that uma_zalloc_arg (and hence uma_zalloc) is called with exactly one of either M_WAITOK or M_NOWAIT and that it is called with neither M_TRYWAIT nor M_DONTWAIT. Print a warning if anything is wrong. Default to M_WAITOK if no flag is given. This is the same test as in malloc(9).	2003-07-18 16:04:36 +00:00
bmilekic	bf27dce79b	Move the pcpu lock out of the uma_cache and instead have a single set of pcpu locks. This makes uma_zone somewhat smaller (by (LOCKNAME_LEN * sizeof(char) + sizeof(struct mtx) * maxcpu) bytes, to be exact). No Objections from jeff.	2003-06-25 20:49:48 +00:00
bmilekic	2763308887	Make sure that the zone destructor doesn't get called twice in certain free paths.	2003-06-25 17:25:45 +00:00
obrien	b0678d7a44	Use __FBSDID().	2003-06-11 23:50:51 +00:00
phk	07cd5a08f3	Revert last commit, I have no idea what happened.	2003-06-09 22:51:36 +00:00
phk	5411462454	A white-space nit I noticed.	2003-06-09 19:40:34 +00:00
alc	482e860dda	uma_zone_set_obj() must perform VM_OBJECT_LOCK_INIT() if the caller provides storage for the vm_object.	2003-04-28 06:11:32 +00:00
alc	f882712248	Remove an XXX comment. It is no longer a problem.	2003-04-26 05:00:56 +00:00
alc	0086837cd4	Lock the vm_object in obj_alloc().	2003-04-19 00:30:36 +00:00
gallatin	e2c2f3b862	Don't grab Giant in slab_zalloc() if M_NOWAIT is specified. This should allow the use of INTR_MPSAFE network drivers. Tested by: njl Glanced at by: jeff	2003-04-18 13:02:29 +00:00
tegge	5a1c871266	Obtain Giant before calling kmem_alloc without M_NOWAIT and before calling kmem_free if Giant isn't already held.	2003-03-26 18:44:53 +00:00
jhb	e4bcd25517	Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().	2003-03-04 21:03:05 +00:00
imp	cf874b345d	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
phk	fcaa2a49c4	Change a printf to also tell how many items were left in the zone.	2003-02-04 08:23:18 +00:00
alfred	bf8e8a6e8f	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
jeff	b584d46930	- M_WAITOK is 0 and not a real flag. Test for this properly. Submitted by: tmm Pointy hat to: jeff	2003-01-20 01:32:56 +00:00
schweikh	d3367c5f5d	Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.	2003-01-01 18:49:04 +00:00
jeff	11d7ff5fa8	- Wakeup the correct address when a zone is no longer full. Spotted by: jake	2002-11-18 08:27:14 +00:00
jeff	d925843eaa	- Don't forget the flags value when using boot pages. Reported by: grehan	2002-11-16 20:57:41 +00:00
mjacob	a58a13572c	atomic_set_8 isn't MI. Instead, follow Jake's suggestions about ZONE_LOCK.	2002-11-11 11:50:03 +00:00
jeff	b7d774ad97	- Add support for machine-dependent page allocation routines. MD code may define UMA_MD_SMALL_ALLOC to make use of this feature. Reviewed by: peter, jake	2002-11-01 01:01:27 +00:00
jeff	ccacc0c953	- Now that uma_zalloc_internal is not the fast path don't be so fussy about extra function calls. Refactor uma_zalloc_internal into separate functions for finding the most appropriate slab, filling buckets, allocating single items, and pulling items off of slabs. This makes the code significantly cleaner. - This also fixes the "Returning an empty bucket." panic that a few people have seen. Tested On: alpha, x86	2002-10-24 07:59:03 +00:00
jeff	203e2484f5	- Move the destructor calls so that they are not called with the zone lock held. This avoids a lock order reversal when destroying zones. Unfortunately, this also means that the free checks are not done before the destructor is called. Reported by: phk	2002-10-24 06:17:30 +00:00
phk	1dfc2c167f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
jeff	f3621a00cf	- Use my freebsd email alias in the copyright. - Remove redundant instances of my email alias in the file summary.	2002-09-19 06:05:32 +00:00
jeff	aeb98331c1	- Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH. - Remove all instances of the mallochash. - Stash the slab pointer in the vm page's object pointer when allocating from the kmem_obj. - Use the overloaded object pointer to find slabs for malloced memory.	2002-09-18 08:26:30 +00:00
archie	5ea3052c0e	Don't use "NULL" when "0" is really meant.	2002-08-21 23:39:52 +00:00
jeff	28043f7b73	Fix a lock order reversal in uma_zdestroy. The uma_mtx needs to be held across calls to zone_drain(). Noticed by: scottl	2002-07-05 21:39:52 +00:00
jeff	7b0eebbe58	Remove unnecessary includes.	2002-07-05 05:16:19 +00:00
jeff	2faa149982	Actually use the fini callback. Pointy hat to: me :-( Noticed By: Julian	2002-07-03 00:30:51 +00:00
jeff	e9c6c8e0fd	Reduce the amount of code that runs with the zone lock held in slab_zalloc(). This allows us to run the zone initialization functions without any locks held.	2002-06-25 21:04:50 +00:00
jeff	4df8a5cb05	- Remove bogus use of kmem_alloc that was inherited from the old zone allocator. - Properly set M_ZERO when talking to the back end page allocators for non malloc zones. This forces us to zero fill pages when they are first brought into a cache. - Properly handle M_ZERO in uma_zalloc_internal. This fixes a problem where per cpu buckets weren't always getting zeroed.	2002-06-19 20:49:44 +00:00
jeff	c6ac4e0b64	Honor the BUCKETCACHE flag on free as well.	2002-06-17 23:53:58 +00:00
jeff	030d3fdb72	- Introduce the new M_NOVM option which tells uma to only check the currently allocated slabs and bucket caches for free items. It will not go ask the vm for pages. This differs from M_NOWAIT in that it not only doesn't block, it doesn't even ask. - Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This tells uma that it should only allocate buckets out of the bucket cache, and not from the VM. It does this by using the M_NOVM option to zalloc when getting a new bucket. This is so that the VM doesn't recursively enter itself while trying to allocate buckets for vm_map_entry zones. If there are already allocated buckets when we get here we'll still use them but otherwise we'll skip it. - Use the ZONE_VM flag on vm map entries and pv entries on x86.	2002-06-17 22:02:41 +00:00
iedowse	02040b5ae2	Correct the logic for determining whether the per-CPU locks need to be destroyed. This fixes a problem where destroying a UMA zone would fail to destroy all zone mutexes. Reviewed by: jeff	2002-06-10 03:25:23 +00:00
jeff	d9ab0c8dbc	Add a comment describing a resource leak that occurs during a failure case in obj_alloc.	2002-06-03 22:59:19 +00:00
jhb	d53ecb9f84	In uma_zalloc_arg(), if we are performing a M_WAITOK allocation, ensure that td_intr_nesting_level is 0 (like malloc() does). Since malloc() calls uma we can probably remove the check in malloc() for this now. Also, perform an extra witness check in that case to make sure we don't hold any locks when performing a M_WAITOK allocation.	2002-05-20 17:54:48 +00:00
jeff	7b96796a72	Don't call the uz free function while the zone lock is held. This can lead to lock order reversals. uma_reclaim now builds a list of freeable slabs and then unlocks the zones to do all of the frees.	2002-05-13 05:08:18 +00:00
jeff	9020efbab0	Remove the hash_free() lock order reversal. This could have happened for several reasons before. Fixing it involved restructuring the generic hash code to require calling code to handle locking, unlocking, and freeing hashes on error conditions.	2002-05-13 04:39:28 +00:00
jeff	926e98b719	Use pages instead of uz_maxpages, which has not been initialized yet, when creating the vm_object. This was broken after the code was rearranged to grab giant itself. Spotted by: alc	2002-05-04 21:49:29 +00:00

171 Commits