Commit Graph

118 Commits

Author SHA1 Message Date
Robert Watson
5d1ae027f0 Modify UMA to use critical sections to protect per-CPU caches, rather than
mutexes, which offers lower overhead on both UP and SMP.  When allocating
from or freeing to the per-cpu cache, without INVARIANTS enabled, we now
no longer perform any mutex operations, which offers a 1%-3% performance
improvement in a variety of micro-benchmarks.  We rely on critical
sections to prevent (a) preemption resulting in reentrant access to UMA on
a single CPU, and (b) migration of the thread during access.  In the event
we need to go back to the zone for a new bucket, we release the critical
section to acquire the global zone mutex, and must re-acquire the critical
section and re-evaluate which cache we are accessing in case migration has
occurred, or circumstances have changed in the current cache.

Per-CPU cache statistics are now gathered lock-free by the sysctl, which
can result in small races in statistics reporting for caches.

Reviewed by:	bmilekic, jeff (somewhat)
Tested by:	rwatson, kris, gnn, scottl, mike at sentex dot net, others
2005-04-29 18:56:36 +00:00
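
A minimal sketch of the lock-free fast path described in the commit above. The names (uz_cpu, uc_allocbucket, ub_cnt, ZONE_LOCK) are modeled on UMA's structures, but the fragment is simplified and is not the committed code:

        critical_enter();                       /* block preemption and migration */
        cache = &zone->uz_cpu[curcpu];
        bucket = cache->uc_allocbucket;
        if (bucket != NULL && bucket->ub_cnt > 0) {
                item = bucket->ub_bucket[--bucket->ub_cnt];
                critical_exit();                /* no mutex touched on this path */
                return (item);
        }
        /*
         * Need a fresh bucket: drop the critical section before taking
         * the zone mutex, then re-enter and re-evaluate the cache in case
         * the thread migrated to another CPU in the meantime.
         */
        critical_exit();
        ZONE_LOCK(zone);
        critical_enter();
        cache = &zone->uz_cpu[curcpu];
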
Alan Cox
b70458aec3 Revert the first part of revision 1.114 and modify the second part. On
architectures implementing uma_small_alloc() pages do not necessarily
belong to the kmem object.
2005-02-24 06:13:01 +00:00
Bosko Milekic
8076cb5289 Well, it seems that I prematurely removed the "All rights reserved"
statement from some files, so re-add it for the moment, until the
related legalese is sorted out.  This change affects:

sys/kern/kern_mbuf.c
sys/vm/memguard.c
sys/vm/memguard.h
sys/vm/uma.h
sys/vm/uma_core.c
sys/vm/uma_dbg.c
sys/vm/uma_dbg.h
sys/vm/uma_int.h
2005-02-16 21:45:59 +00:00
Bosko Milekic
500f29d06e Make UMA set the overloaded page->object back to kmem_object for
UMA_ZONE_REFCNT and UMA_ZONE_MALLOC zones, as the page(s) undoubtedly
came from kmem_map for those two.  Previously it would set it back
to NULL for UMA_ZONE_REFCNT zones and although this was probably not
fatal, it added MORE code for no reason.
2005-02-16 20:06:11 +00:00
Bosko Milekic
c5c1b16ec5 While we want the recursion protection for the bucket zones so that
recursion from the VM is handled (and the calling code that allocates
buckets knows how to deal with it), we do not want to prevent allocation
from the slab header zones (slabzone and slabrefzone) if uk_recurse is
not zero for them.  The reason is that it could lead to NULL being
returned for the slab header allocations even in the M_WAITOK
case, and the caller can't handle that (this is also explained in a
comment with this commit).

The problem analysis is documented in our mailing lists:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153445+0+archive/2004/freebsd-current/20041231.freebsd-current

(see entire thread for proper context).

Crash dump data provided by: Peter Holm <peter@holm.cc>
2005-01-11 03:33:09 +00:00
Stefan Farfeleder
1e183df21e ISO C requires at least one element in an initialiser list. 2005-01-10 20:30:04 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Bosko Milekic
7b8712053c Add my copyright and update Jeff's copyright on UMA source files,
as per his request.

Discussed with: Jeffrey Roberson
2004-12-26 00:35:12 +00:00
Robert Watson
dc2c7965c0 Abstract the logic to look up the uma_bucket_zone given a desired
number of entries into bucket_zone_lookup(), which helps make more
clear the logic of consumers of bucket zones.

Annotate the behavior of bucket_init() with a comment indicating
how the various data structures, including the bucket lookup tables,
are initialized.
2004-11-06 11:43:30 +00:00
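
A hedged sketch of the lookup helper described above; the howmany()/BUCKET_SHIFT arithmetic is an assumption patterned on the bucket_size[] table rather than a copy of the committed code:

        static struct uma_bucket_zone *
        bucket_zone_lookup(int entries)
        {
                int idx;

                /* bucket_size[] maps a rounded-up entry count to a bucket zone. */
                idx = howmany(entries, 1 << BUCKET_SHIFT);
                return (&bucket_zones[bucket_size[idx]]);
        }
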
Robert Watson
f9d27e7524 Annotate what bucket_size[] array does; staticize since it's used only
in uma_core.c.
2004-11-06 11:24:40 +00:00
Bosko Milekic
a5a262c6db Fix an INVARIANTS-only bug introduced in Revision 1.104:
IF INVARIANTS is defined, and in the rare case that we have
allocated some objects from the slab and at least one initializer
on at least one of those objects failed, and we need to fail the
allocation and push the uninitialized items back into the slab
caches -- in that scenario, we would fail to [re]set the
bucket cache's ub_bucket item references to NULL, which would
eventually trigger a KASSERT.
2004-10-27 21:19:35 +00:00
Brian Feldman
55fc8c1146 In the previous revision, I did not intend to change the default value
of "nosleepwithlocks."

Submitted by:	ru
2004-10-09 18:51:32 +00:00
Brian Feldman
ab14a3f7aa Fix critical stability problems that can cause UMA mbuf cluster
state management corruption, mbuf leaks, general mbuf corruption,
and at least on i386 a first level splash damage radius that
encompasses up to about half a megabyte of the memory after
an mbuf cluster's allocation slab.  In short, this has caused
instability nightmares anywhere the right kind of network traffic
is present.

When the polymorphic refcount slabs were added to UMA, the new types
were not used pervasively.  In particular, the slab management
structure was turned into one for refcounts, and one for non-refcounts
(supposed to be mostly like the old slab management structure),
but the latter was almost always used throughout.  In general, every
access to zones with UMA_ZONE_REFCNT turned on corrupted the
"next free" slab offset and the refcount with each other and
with other allocations (on i386, 2 mbuf clusters per 4096 byte slab).

Fix things so that the right type is used to access refcounted zones
where it was not before.  There are additional errors in gross
overestimation of padding, it seems, that would cause large kegs
(nee zones) to be allocated when small ones would do.  Unless I have
analyzed this incorrectly, it is not directly harmful.
2004-10-08 20:19:29 +00:00
Robert Watson
3659f747f1 Generate KTR trace records for uma_zalloc_arg() and uma_zfree_arg().
This doesn't trace every event of interest in UMA, but provides
enough basic information to explain lock traces and sleep patterns.
2004-08-06 21:52:38 +00:00
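
The trace points are of roughly this shape; a sketch only, with format strings and arguments chosen for illustration rather than taken from the commit:

        /* On the allocation path. */
        CTR3(KTR_UMA, "uma_zalloc_arg: zone %s(%p) flags %d",
            zone->uz_name, zone, flags);

        /* On the free path. */
        CTR2(KTR_UMA, "uma_zfree_arg: item %p from zone %p", item, zone);
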
Brian Feldman
b23f72e98a * Add a "how" argument to uma_zone constructors and initialization functions
so that they know whether the allocation is supposed to be able to sleep
  or not.
* Allow uma_zone constructors and initialization functions to return either
  success or error.  Almost all of the ones in the tree currently return
  success unconditionally, but mbuf is a notable exception: the packet
  zone constructor wants to be able to fail if it cannot suballocate an
  mbuf cluster, and the mbuf allocators want to be able to fail in general
  in a MAC kernel if the MAC mbuf initializer fails.  This fixes the
  panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
  the default.

Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.
2004-08-02 00:18:36 +00:00
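
A sketch of what a failable constructor with the new "how"/flags argument might look like; the packet-zone details below are illustrative, not the committed mbuf code:

        static int
        mb_ctor_pack(void *mem, int size, void *arg, int how)
        {
                struct mbuf *m = mem;

                /* The cluster suballocation may fail when "how" is M_NOWAIT. */
                m->m_ext.ext_buf = uma_zalloc(zone_clust, how);
                if (m->m_ext.ext_buf == NULL)
                        return (ENOMEM);        /* caller unwinds the allocation */
                return (0);
        }
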
Bosko Milekic
244f45548a Rework the way slab header storage space is calculated in UMA.
- zone_large_init() stays pretty much the same.
- zone_small_init() will try to stash the slab header in the slab page
  being allocated if the amount of calculated wasted space is less
  than UMA_MAX_WASTE (for both the UMA_ZONE_REFCNT case and regular
  case).  If the amount of wasted space is >= UMA_MAX_WASTE, then
  UMA_ZONE_OFFPAGE will be set and the slab header will be allocated
  separately for better use of space.
- uma_startup() calculates the maximum ipers required in offpage slabs
  (so that the offpage slab header zone(s) can be sized accordingly).
  The algorithm used to calculate this replaces the old calculation
  (which only happened to work coincidentally).  We now iterate over
  possible object sizes, starting from the smallest one, until we
  determine that wastedspace calculated in zone_small_init() might
  end up being greater than UMA_MAX_WASTE, at which point we use the
  found object size to compute the maximum possible ipers.  The
  reason this works is because:
      - wastedspace versus objectsize is a see-saw function with
        local minima all equal to zero and local maxima growing
        in direct proportion to objectsize.  This implies that
        for objects up to or equal to a certain objectsize, the see-saw
        remains entirely below UMA_MAX_WASTE, so for those objectsizes
        it is impossible to ever go OFFPAGE for slab headers.
      - ipers (items-per-slab) versus objectsize is an inversely
        proportional function which falls off very quickly (very large
        for small objectsizes).
      - To determine the maximum ipers we'll ever need from OFFPAGE
        slab headers we first find the largest objectsize for which
        we are guaranteed not to go offpage, and use it to compute
        ipers (as though we were offpage).  Since the only objectsizes
        allowed to go offpage are bigger than the found objectsize,
        and since ipers vs objectsize is inversely proportional (and
        monotonically decreasing), then we are guaranteed that the
        ipers computed is always >= what we will ever need in offpage
        slab headers.
- Define UMA_FRITM_SZ and UMA_FRITMREF_SZ to be the actual (possibly
  padded) size of each freelist index so that offset calculations are
  fixed.

This might fix weird data corruption problems and certainly allows
ARM to now boot to at least single-user (via simulator).

Tested on i386 UP by me.
Tested on sparc64 SMP by fenner.
Tested on ARM simulator to single-user by cognet.
2004-07-29 15:25:40 +00:00
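
The see-saw between wasted space and object size can be made concrete with a small sketch of the test zone_small_init() performs; constants and field names here are simplified assumptions:

        /* Items that fit when the slab header lives inside the page. */
        ipers = (UMA_SLAB_SIZE - sizeof(struct uma_slab)) / objsize;
        wastedspace = UMA_SLAB_SIZE - sizeof(struct uma_slab) - ipers * objsize;

        if (wastedspace >= UMA_MAX_WASTE) {
                /* Too much waste: keep the header off-page and reclaim the space. */
                keg->uk_flags |= UMA_ZONE_OFFPAGE;
                ipers = UMA_SLAB_SIZE / objsize;
        }
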
Alan Cox
5285558ac2 - Change uma_zone_set_obj() to call kmem_alloc_nofault() instead of
kmem_alloc_pageable().  The difference between these is that an errant
   memory access to the zone will be detected sooner with
   kmem_alloc_nofault().

The following changes serve to eliminate the following lock-order
reversal reported by witness:

 1st 0xc1a3c084 vm object (vm object) @ vm/swap_pager.c:1311
 2nd 0xc07acb00 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1797
 3rd 0xc1804bdc vm object (vm object) @ vm/uma_core.c:931

There is no potential deadlock in this case.  However, witness is unable
to recognize this because vm objects used by UMA have the same type as
ordinary vm objects.  To remedy this, we make the following changes:

 - Add a mutex type argument to VM_OBJECT_LOCK_INIT().
 - Use the mutex type argument to assign distinct types to special
   vm objects such as the kernel object, kmem object, and UMA objects.
 - Define a static swap zone object for use by UMA.  (Only static
   objects are assigned a special mutex type.)
2004-07-22 19:44:49 +00:00
Brian Feldman
0c3c862e21 Since breakage of malloc(9)/uma_zalloc(9) is totally non-optional in
GENERIC/for WITNESS users, make sure the sysctl to disable the behavior
is read-only and always enabled.
2004-07-19 15:05:24 +00:00
Bosko Milekic
0d0837ee6d Introduce debug.nosleepwithlocks sysctl, 0 by default. If set to 1
and WITNESS is not built, then force all M_WAITOK allocations to
M_NOWAIT behavior (transparently).  This is to be used temporarily
if weird deadlocks are reported because we still have code paths
that perform M_WAITOK allocations with lock(s) held, which can
lead to deadlock.  If WITNESS is compiled, then the sysctl is ignored
and we ask witness to tell us whether we have locks held, converting
to M_NOWAIT behavior only if it tells us that we do.

Note this removes the previous mbuf.h inclusion as well (only needed
by last revision), and cleans up unneeded [artificial] comparisons
to just the mbuf zones.  The problem described above has nothing to
do with previous mbuf wait behavior; it is a general problem.
2004-07-04 16:07:44 +00:00
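
A hedged sketch of the conversion described above; the WITNESS_CHECK() call and variable names are assumptions standing in for whatever the commit actually uses:

        if (flags & M_WAITOK) {
        #ifdef WITNESS
                /* Convert only if witness reports that locks are held. */
                if (WITNESS_CHECK(WARN_GIANTOK | WARN_SLEEPOK, NULL,
                    "uma_zalloc(M_WAITOK) with locks held"))
                        flags = (flags & ~M_WAITOK) | M_NOWAIT;
        #else
                if (nosleepwithlocks)
                        flags = (flags & ~M_WAITOK) | M_NOWAIT;
        #endif
        }
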
Brian Feldman
7a708c3626 Reextend the M_WAITOK-disabling-hack to all three of the mbuf-related
zones, and do it by direct comparison of uma_zone_t instead of strcmp.

The mbuf subsystem used to provide M_TRYWAIT/M_DONTWAIT semantics, but
this is mostly no longer the case.  M_WAITOK has taken over the spot
M_TRYWAIT used to have, and for mbuf things, still may return NULL if
the code path is incorrectly holding a mutex going into mbuf allocation
functions.

The M_WAITOK/M_NOWAIT semantics are absolute; though it may deadlock
the system to try to malloc or uma_zalloc something with a mutex held
and M_WAITOK specified, it is absolutely required to not return NULL
and will result in instability and/or security breaches otherwise.
There is still room to add the WITNESS_WARN() to all cases so that
we are notified of the possibility of deadlocks, but it cannot change
the value of the "badness" variable and allow allocation to actually
fail except for the specialized cases which used to be M_TRYWAIT.
2004-07-04 15:59:25 +00:00
Brian Feldman
cf107c1d1a Limit mbuma damage. Suddenly ALL allocations with M_WAITOK are subject
to failing -- that is, allocations via malloc(M_WAITOK) that are required
to never fail -- if WITNESS is not defined.  While everyone should be
running WITNESS, in any case, zone "Mbuf" allocations are really the only
ones that should be screwed with by this hack.

This hack is crashing people, and would continue to do so with or without
WITNESS.  Things shouldn't be allocating with M_WAITOK with locks held,
but it's not okay just to always remove M_WAITOK when !WITNESS.

Reported by:	Bernd Walter <ticso@cicely5.cicely.de>
2004-07-03 18:11:41 +00:00
Bosko Milekic
cc822cb53e Make uma_mtx MTX_RECURSE. Here's why:
The general UMA lock is a recursion-allowed lock because
there is a code path where, while we're still configured
to use startup_alloc() for backend page allocations, we
may end up in uma_reclaim() which calls zone_foreach(zone_drain),
which grabs uma_mtx, only to later call into startup_alloc()
because while freeing we needed to allocate a bucket.  Since
startup_alloc() also takes uma_mtx, we need to be able to
recurse on it.

This exact explanation also added as comment above mtx_init().

Trace showing recursion reported by: Peter Holm <peter-at-holm.cc>
2004-06-23 21:59:03 +00:00
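
A short sketch of the recursion-allowed initialization this describes; the lock name string is a guess:

        /*
         * Recursion is allowed because, while startup_alloc() is still the
         * backend page allocator, zone_foreach(zone_drain) can re-enter
         * startup_alloc(), which also takes uma_mtx.
         */
        mtx_init(&uma_mtx, "UMA lock", NULL, MTX_DEF | MTX_RECURSE);
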
Bosko Milekic
7fd8788213 Backout previous change, I think Julian has a better solution which
does not require type-stable refcnts here.
2004-06-09 20:50:08 +00:00
Bosko Milekic
e66468ea7a Make the slabrefzone, the zone from which we allocate slabs with
internal reference counters, UMA_ZONE_NOFREE.  This way, those slabs
(with their ref counts) will be effectively type-stable, so using
a trick like this on the refcount is no longer dangerous:

        MEXT_REM_REF(m);
        if (atomic_cmpset_int(m->m_ext.ref_cnt, 0, 1)) {
                if (m->m_ext.ext_type == EXT_PACKET) {
                        uma_zfree(zone_pack, m);
                        return;
                } else if (m->m_ext.ext_type == EXT_CLUSTER) {
                        uma_zfree(zone_clust, m->m_ext.ext_buf);
                        m->m_ext.ext_buf = NULL;
                } else {
                        (*(m->m_ext.ext_free))(m->m_ext.ext_buf,
                            m->m_ext.ext_args);
                        if (m->m_ext.ext_type != EXT_EXTREF)
                                free(m->m_ext.ref_cnt, M_MBUF);
                }
        }
        uma_zfree(zone_mbuf, m);

Previously, a second thread hitting the above cmpset might
actually read the refcnt AFTER it has already been freed.  A very
rare occurrence.  Now we'll know that it won't be freed, though.

Spotted by: julian, pjd
2004-06-09 19:18:50 +00:00
Bosko Milekic
099a0e588c Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
    care of and it becomes clearer how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
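
A minimal sketch of a refcounted cluster zone using the UMA_ZONE_REFCNT and uma_find_refcnt() facilities noted above; the zone name, size, and constructor/destructor names are illustrative:

        u_int *refcnt;

        zone_clust = uma_zcreate("mbuf_cluster", MCLBYTES,
            mb_ctor_clust, mb_dtor_clust, NULL, NULL,
            UMA_ALIGN_PTR, UMA_ZONE_REFCNT);

        /* Later: fetch the counter stashed at the end of the item's slab. */
        refcnt = uma_find_refcnt(zone_clust, mem);
        atomic_add_int(refcnt, 1);
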
Alan Cox
5d328ed44b - Make the acquisition of Giant in vm_fault_unwire() conditional on the
pmap.  For the kernel pmap, Giant is not required.  In general, for
   other pmaps, Giant is required by i386's pmap_pte() implementation.
   Specifically, the use of PMAP2/PADDR2 is synchronized by Giant.
   Note: In principle, updates to the kernel pmap's wired count could be
   lost without Giant.  However, in practice, we never use the kernel
   pmap's wired count.  This will be resolved when pmap locking appears.
 - With the above change, cpu_thread_clean() and uma_large_free() need
   not acquire Giant.  (The first case is simply the revival of
   i386/i386/vm_machdep.c's revision 1.226 by peter.)
2004-03-10 04:44:43 +00:00
Robert Watson
a3c0761103 Mark uma_callout as CALLOUT_MPSAFE, as uma_timeout can run MPSAFE.
Reviewed by:	jeff
2004-03-07 07:00:46 +00:00
Jeff Roberson
aaa8bb1604 - Fix a problem where we did not drain the cache of buckets in the zone
when uma_reclaim() was called.  This was introduced when the zone
   working-set algorithm was removed in favor of using the per cpu caches
   as the working set.
2004-02-01 06:15:17 +00:00
Dag-Erling Smørgrav
e726bc0e6c Mechanical whitespace cleanup. 2004-01-30 16:26:29 +00:00
John Baldwin
b6c71225a9 Fix all users of mp_maxid to use the same semantics, namely:
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by:	re (scottl)
Tested on:	i386, amd64, alpha
2003-12-03 14:57:26 +00:00
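
Under the semantics spelled out above, a per-CPU walk looks roughly like this sketch; the CPU_ABSENT() check is the usual companion and is an assumption here, not part of the commit:

        int cpu;

        for (cpu = 0; cpu <= mp_maxid; cpu++) {
                if (CPU_ABSENT(cpu))
                        continue;               /* skip holes in the CPU ID space */
                cache = &zone->uz_cpu[cpu];
                /* ... per-CPU work ... */
        }
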
Jeff Roberson
e30b97c5f9 - Unbreak UP. mp_maxid is not defined on uni-processor machines, although
I believe it and the other MP variables should be.  For now, just define it
   here and wait for jhb to clean it up later.

Approved by:	re (rwatson)
2003-11-30 22:18:14 +00:00
Jeff Roberson
504d5de3a8 - Replace the local maxcpu with mp_maxid. Previously, if mp_maxid
was equal to MAXCPU, we would overrun the pcpu_mtx array because maxcpu
   was calculated incorrectly.
 - Add some more debugging code so that memory leaks at the time of
   uma_zdestroy() are more easily diagnosed.

Approved by:	re (rwatson)
2003-11-30 08:04:01 +00:00
Alan Cox
d1f42ac2ee - Remove use of Giant from uma_zone_set_obj(). 2003-11-14 17:49:07 +00:00
Jeff Roberson
009b6fcb03 - Fix MD_SMALL_ALLOC on architectures that support it. Define a new alloc
function, startup_alloc(), that is used for single page allocations prior
   to the VM starting up.  If it is used after the VM starts up, it
   replaces the zone's allocf pointer with either page_alloc() or
   uma_small_alloc() where appropriate.

Pointy hat to:	me
Tested by:	phk/amd64, me/x86
2003-09-21 07:39:16 +00:00
Peter Wemm
c43ab0b5a1 Bad Jeffr! No cookie!
Temporarily disable the UMA_MD_SMALL_ALLOC stuff since recent commits
break sparc64, amd64, ia64 and alpha.  It appears only i386 and maybe
powerpc were not broken.
2003-09-20 23:35:33 +00:00
Jeff Roberson
9643769a3a - Remove the working-set algorithm. Instead, use the per cpu buckets as the
working set cache.  This has several advantages.  Firstly, we never touch
   the per cpu queues now in the timeout handler.  This removes one more
   reason for having per cpu locks.  Secondly, it reduces the size of the zone
   by 8 bytes, bringing it under 200 bytes for a single proc x86 box.  This
   tidies up other logic as well.
 - The 'destroy' flag no longer needs to be passed to zone_drain() since it
   always frees everything in the zone's slabs.
 - cache_drain() is now only called from zone_dtor() and so it destroys by
   default.  It also does not need the destroy parameter now.
2003-09-19 23:27:46 +00:00
Jeff Roberson
3e0cab95c0 - Remove the cache colorization code. We can't use it due to all of the
broken consumers of the malloc interface who assume that the allocated
   address will be an even multiple of the size.
 - Remove disabled time delay code on uma_reclaim().  The comment there said
   it all.  It was not an effective strategy and it should not be left in
   #if 0'd for all eternity.
2003-09-19 23:04:44 +00:00
Jeff Roberson
64f051e99a - There are an endless stream of style(9) errors in this file. Fix a few.
Also catch some spelling errors.
2003-09-19 22:31:45 +00:00
Jeff Roberson
44eca34adb - Don't inspect the zone in page_alloc(). It may be NULL.
- Don't cache more items than the zone would like in uma_zalloc_bucket().
2003-09-19 09:22:04 +00:00
Jeff Roberson
45bf76f0f8 - Move the logic for dealing with the uma_boot_pages cache into the
page_alloc() function from the slab_zalloc() function.  This allows us
   to unconditionally call uz_allocf().
 - In page_alloc() cleanup the boot_pages logic some.  Previously memory from
   this cache that was not used by the time the system started was left in
   the cache and never used.  Typically this wasn't more than a few pages,
   but now we will use this cache so long as memory is available.
2003-09-19 08:53:33 +00:00
Jeff Roberson
b60f5b794e - Fix the silly flag situation in UMA. Remove redundant ZFLAG/ZONE flags
by accepting the user supplied flags directly.  Previously this was not
   done so that flags for the same field would not be defined in two
   different files.  Add comments in each header instructing future
   developers on how not to shoot their feet.
 - Fix a test for !OFFPAGE which should have been a test for HASH.  This would
   have caused a panic if we had ever destructed a malloc zone.  This also
   opens up the possibility that other zones could use the vsetobj() method
   rather than a hash.
2003-09-19 08:37:44 +00:00
Jeff Roberson
961647dfd0 - Don't abuse M_DEVBUF, define a tag for UMA hashes. 2003-09-19 07:23:50 +00:00
Jeff Roberson
b983089a05 - Eliminate a pair of unnecessary variables. 2003-09-19 06:41:06 +00:00
Jeff Roberson
cae33c1429 - Initialize a pool of bucket zones so that we waste less space on zones that
don't cache as many items.
 - Introduce the bucket_alloc(), bucket_free() functions to wrap bucket
   allocation.  These functions select the appropriate bucket zone to
   allocate from or free to.
 - Rename ub_ptr to ub_cnt to reflect a change in its use.  ub_cnt now reflects
   the count of free items in the bucket.  This gets rid of many unnatural
   subtractions by 1 throughout the code.
 - Add ub_entries which reflects the number of entries possibly held in a
   bucket.
2003-09-19 06:26:45 +00:00
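
A sketch of the bucket bookkeeping after the rename; the field layout is simplified from uma_int.h, and the push/pop arithmetic shows why the off-by-one subtractions disappear when ub_cnt holds a count rather than an index:

        struct uma_bucket {
                LIST_ENTRY(uma_bucket)  ub_link;        /* next bucket in list */
                int16_t                 ub_cnt;         /* items currently held */
                int16_t                 ub_entries;     /* capacity of ub_bucket[] */
                void                    *ub_bucket[];   /* item pointers */
        };

        /* Pop: ub_cnt is a count, so pre-decrement indexes the last item. */
        item = bucket->ub_bucket[--bucket->ub_cnt];

        /* Push. */
        bucket->ub_bucket[bucket->ub_cnt++] = item;
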
Bosko Milekic
1c35e213f1 In sysctl_vm_zone, do not calculate per-cpu cache stats on
UMA_ZFLAG_INTERNAL zones at all.  Apparently, Wilko's alpha
was crashing while entering multi-user because, I think, we
were calculating the garbage cachefree for pcpu caches that
essentially don't exist for at least the 'zones' zone and it so
happened that we were reading from an unmapped location.

Confirmed to fix crash: wilko
Helped debug: wilko, gallatin
2003-08-20 18:22:06 +00:00
Bosko Milekic
20e8e865bd - When deciding whether to init the zone with small_init or large_init,
compare the zone element size (+1 for the byte of linkage) against
  UMA_SLAB_SIZE - sizeof(struct uma_slab), and not just UMA_SLAB_SIZE.
  Add a KASSERT in zone_small_init to make sure that the computed
  ipers (items per slab) for the zone is not zero, despite the addition
  of the check, just to be sure (this part submitted by: silby)

- UMA_ZONE_VM used to imply BUCKETCACHE.  Now it implies
  CACHEONLY instead.  CACHEONLY is like BUCKETCACHE in the
  case of bucket allocations, but in addition to that also ensures that
  we don't setup the zone with OFFPAGE slab headers allocated from the
  slabzone.  This means that we're not allowed to have a UMA_ZONE_VM
  zone initialized for large items (zone_large_init) because it would
  require the slab headers to be allocated from slabzone, and hence
  kmem_map.  Some of the zones init'd with UMA_ZONE_VM are so init'd
  before kmem_map is suballoc'd from kernel_map, which is why this
  change is necessary.
2003-08-11 19:39:45 +00:00
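
The size comparison and the added KASSERT can be sketched roughly as follows (simplified; the real zone_small_init() computes considerably more):

        /* One byte of freelist linkage per item rides inside the slab. */
        if (zone->uz_size + 1 > UMA_SLAB_SIZE - sizeof(struct uma_slab))
                zone_large_init(zone);
        else
                zone_small_init(zone);

        /* Inside zone_small_init(): */
        ipers = (UMA_SLAB_SIZE - sizeof(struct uma_slab)) / (zone->uz_size + 1);
        KASSERT(ipers != 0, ("zone_small_init: ipers is 0"));
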
Alan Cox
b245ac95cf Revise obj_alloc(). Most notably, use the object's lock to prevent two
concurrent invocations from acquiring the same address(es).  Also, in case
of an incomplete allocation, free any allocated pages.

In collaboration with:	tegge
2003-08-03 06:08:48 +00:00
Bosko Milekic
48bf87258f When INVARIANTS is on and we're in uma_zalloc_free(), we need to make
sure that uma_dbg_free() is called if we're about to call
uma_zfree_internal() but we're asking it to skip the dtor and
uma_dbg_free() call itself.  So, if we're about to call
uma_zfree_internal() from uma_zfree_arg() and skip == 1, call
uma_dbg_free() ourselves.
2003-08-02 22:40:27 +00:00
Bosko Milekic
174ab4501e Only free the pcpu cache buckets if they are non-NULL.
Crashed this person's machine: harti
Pointy-hat to: me
2003-08-01 17:42:27 +00:00
Bosko Milekic
d56368d779 Plug a race and a leak in UMA.
1) The race has to do with zone destruction.  From the zone destructor we
   would lock the zone, set the working set size to 0, then unlock the zone,
   drain it, and then free the structure.  Within the window following the
   working-set-size set to 0 and unlocking of the zone and the point where
   in zone_drain we re-acquire the zone lock, the uma timer routine could
   have fired off and changed the working set size to something non-zero,
   thereby potentially preventing us from completely freeing slabs before
   destroying the zone (and thus leaking them).

2) The leak has to do with zone destruction as well.  When destroying a
   zone we would take care to free all the buckets cached in the zone, but
   although we would drain the pcpu cache buckets, we would not free them.
   This resulted in leaking a couple of bucket structures (512 bytes each)
   per cpu on SMP during zone destruction.

While I'm here, also silence GCC warnings by turning uma_slab_alloc()
from inline to real function.  It's too big to be an inline.

Reviewed by: JeffR
2003-07-30 18:55:15 +00:00