- Remove all instances of the mallochash.
- Stash the slab pointer in the vm page's object pointer when allocating from
the kmem_obj.
- Use the overloaded object pointer to find slabs for malloced memory.
mallochash. Mallochash is going to go away as soon as I introduce the
kfree/kmalloc api and partially overhaul the malloc wrapper. This can't happen
until all users of the malloc api that expect memory to be aligned on the size
of the allocation are fixed.
Implement the following checks on freed memory in the bucket path:
- Slab membership
- Alignment
- Duplicate free
This previously was only done if we skipped the buckets. This code will slow
down INVARIANTS a bit, but it is smp safe. The checks were moved out of the
normal path and into hooks supplied in uma_dbg.
0xdeadc0de and then check for it just before memory is handed off as part
of a new request. This will catch any post free/pre alloc modification of
memory, as well as introduce errors for anything that tries to dereference
it as a pointer.
This code takes the form of special init, fini, ctor and dtor routines that
are specificly used by malloc. It is in a seperate file because additional
debugging aids will want to live here as well.
malloc profiling) also modified the set of pre-defined buckets for the
memory allocator. For reasons unknown to me, this resulted in extensive
memory corruption in the kernel, in particular on SMP boxes, so I'm
committing this work-around until Jeff gets a chance to debug it
properly. David Wolfskill pointed me at this commit as the one that
might be a problem; I've been running this code on two dual-processor
burn-in boxes for about 12 hours now, and the rate of panics due to
memory corruption has dropped to zero (from one every five minutes).
Hopefully not treading on the toes of: jeff
information related to bucket size effeciency. Three things are printed on
each row:
Size is the size the user actually asked for rounded to 16 bytes.
Requests is the number of times this size was asked for.
Real Size is the size we actually handed out.
At the end the total memory used and total waste is displayed. Currently my
system displays about 33% wasted memory.
The intent of this code is to gather statistics for tuning the malloc bucket
sizes. It is not intended to be run with INVARIANTS and it is not entirely
mp safe. It can be enabled via 'options MALLOC_PROFILE' which was commited
earlier.
Updated the kmemzones logic such that the ks_size bitmap can be used as an
index into it to report the size of the zone used.
Create the kern.malloc sysctl which replaces the kvm mechanism to report
similar data. This will provide an easy place for statistics aggregation if
malloc_type statistics become per cpu data.
Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing
the malloc buckets.
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
- Callers of asleep() and await() have been converted to calling tsleep().
The only caller outside of M_ASLEEP was the ata driver, which called both
asleep() and await() with spl-raised, so there was no need for the
asleep() and await() pair. M_ASLEEP was unused.
Reviewed by: jasone, peter
order to avoid namespace collision with subr_mchain.c's mb_init(). This
wasn't "fatal" as the mbuf initialization routine mb_init() was local to
subr_mbuf.c which in turn didn't pull in subr_mchain.c's mb_init()
declaration, but it should deffinately be changed now before it creates
headache.
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:
o Reduce contention for SMP by offering per-CPU pools and locks.
o Better use of data cache due to per-CPU pools.
o Much less code cache pollution due to excessively large allocation macros.
o Framework for `grouping' objects from same page together so as to be able
to possibly free wired-down pages back to the system if they are no longer
needed by the network stacks.
Additional things changed with this addition:
- Moved some mbuf specific declarations and initializations from
sys/conf/param.c into mbuf-specific code where they belong.
- m_getclr() has been renamed to m_get_clrd() because the old name is really
confusing. m_getclr() HAS been preserved though and is defined to the new
name. No tree sweep has been done "to change the interface," as the old
name will continue to be supported and is not depracated. The change was
merely done because m_getclr() sounds too much like "m_get a cluster."
- TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
systat(1) (see TODO below).
- Fixed systat(1) to display number of "free mbufs" based on new per-CPU
stat structures.
- Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
per-CPU stat structures. All infos are fetched via sysctl.
TODO (in order of priority):
- Re-enable mbtypes statistics in both netstat(1) and systat(1) after
introducing an SMP friendly way to collect the mbtypes stats under the
already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
seems too costly for a mere stat update, especially when other locks are
already present).
- Optionally have systat(1) display not only "total free mbufs" but also
"total free mbufs per CPU pool."
- Fix minor length-fetching issues in netstat(1) related to recently
re-enabled option to read mbuf stats from a core file.
- Move reference counters at least for mbuf clusters into an unused portion
of the cluster itself, to save space and need to allocate a counter.
- Look into introducing resource freeing possibly from a kproc.
Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/
around, use a common function for looking up and extracting the tunables
from the kernel environment. This saves duplicating the same function
over and over again. This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
we also reserve _adequate_ space for the mb_map submap; i.e. we need
space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to
rounddown, and not roundup, so that we are consistent.
Pointed out by: bde
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
With this flag set malloc() will panic if memory allocation failed.
This usable only in critical places where failed allocation is fatal.
Reviewed by: peter
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.
Reviewed By: peter
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.
This change is necessary in order to avoid some circular bootstrapping
dependencies.
Instead of:
foo = malloc(sizeof(foo), M_WAIT);
bzero(foo, sizeof(foo));
You can now (and please do) use:
foo = malloc(sizeof(foo), M_WAIT | M_ZERO);
In the future this will enable us to do idle-time pre-zeroing of
malloc-space.
On unload, remove references from freelist to memory type defined by module.
Print a warning if module defines and allocate its own memory type, but
didn't free it all on unload.
Reviewed by: peter
Order the SYSINIT() for MALLOC_DEFINE() correctly so that malloc()
doesn't have to waste time initializing itself. The
(SI_SUB_KMEM, SI_ORDER_ANY) order was shared with syscons' SYSINIT()
for scmeminit(), and scmeminit() calls malloc(), so malloc()
initialization was not always complete on the first call to malloc().
kern/kern_malloc.c:
- Removed self-initialization in malloc().
- Removed half-baked sanity check in free(). Trust MALLOC_DEFINE().
run out of KVM through a mmap()/fork() bomb that allocates hundreds
of thousands of vm_map_entry structures.
Add panic to make null-pointer dereference crash a little more verbose.
Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number
of mmap()'d spaces (discrete vm_map_entry's in the process). The value
defaults to around 9000 for a 128MB machine. The test is scaled for the
number of processes sharing a vmspace (aka linux threads). Setting
the value to 0 disables the feature.
PR: kern/16573
Approved by: jkh
malloc region (kmem_map) to be wrong and semi-random on systems with more
than 1GB of RAM. This is not a complete fix, but is sufficient for
machines with 4GB or less of memory. A complete fix will require some
changes to the getenv stuff so that 64bit values can be passed around.
NOT FIXED: machines with more than 4GB of RAM (e.g. some large Alphas)
since we're still using ints to hold some of the values.
Reviewed by: bde