Commit Graph

210 Commits

Author SHA1 Message Date
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Robert Watson
e4e80aa713 Remove #if 0'd check for 0-size allocations, which if enabled, called
kdb_enter().
2007-05-27 13:13:46 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
Stephane E. Potvin
0e5179e441 Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
Robert Watson
24076d138e Increase usefulness of "show malloc" by moving from displaying the basic
counters of allocs/frees/use for each malloc type to calculating InUse,
MemUse, and Requests as displayed by the userspace vmstat -m.  This is
more useful when debugging malloc(9)-related memory leaks, where the
count of allocs/frees may not usefully reflect that current memory
allocation (i.e., when highly variable size allocations occur with the
same malloc type, such as with contigmalloc).

MFC after:			3 days
Limitations observed by:	scottl
2006-10-26 10:17:13 +00:00
Robert Watson
4b19d603c4 Remove old kern.malloc sysctl, which generated a text representation of
the kernel malloc(9) state for vmstat -m.  libmemstat is now used to
generate a machine-readable version which is converged by vmstat -m
into a human-readable version.

Not for MFC.
2006-07-23 19:55:41 +00:00
Robert Watson
0ce3f16dbb Expand comments for malloc(9) to better describe the design and
statistics / memory types model.
2006-07-23 19:51:39 +00:00
Paul Saab
45d48bdad5 Fix bug in malloc_uninit():
Releasing items from the mt_zone can not be done by a simple
uma_zfree() call since mt_zone is allocated with the UMA_ZONE_MALLOC
flag. Use uma_zfree_arg instead and supply the slab.

This bug caused panics in low memory situations on unloading kernel
modules containing MALLOC_DEFINE(..) statements.

Submitted by:	ups
2006-03-03 22:36:52 +00:00
Pawel Jakub Dawidek
847a2a1716 Add buffer corruption protection (RedZone) for kernel's malloc(9).
It detects both: buffer underflows and buffer overflows bugs at runtime
(on free(9) and realloc(9)) and prints backtraces from where memory was
allocated and from where it was freed.

Tested by:	kris
2006-01-31 11:09:21 +00:00
Pawel Jakub Dawidek
d362c40d3a Improve memguard a bit:
- Provide tunable vm.memguard.desc, so one can specify memory type without
  changing the code and recompiling the kernel.
- Allow to use memguard for kernel modules by providing sysctl
  vm.memguard.desc, which can be changed to short description of memory
  type before module is loaded.
- Move as much memguard code as possible to memguard.c.
- Add sysctl node vm.memguard. and move memguard-specific sysctl there.
- Add malloc_desc2type() function for finding memory type based on its
  short description (ks_shortdesc field).
- Memory type can be changed (via vm.memguard.desc sysctl) only if it
  doesn't exist (will be loaded later) or when no memory is allocated yet.
  If there is allocated memory for the given memory type, return EBUSY.
- Implement two ways of memory types comparsion and make safer/slower the
  default.
2005-12-30 11:45:07 +00:00
Pawel Jakub Dawidek
619f284195 In realloc(9), determine size of the original block based on
UMA_SLAB_MALLOC flag.
In some circumstances (I observed it when I was doing a lot of reallocs)
UMA_SLAB_MALLOC can be set even if us_keg != NULL.

If this is the case we have wonderful, silent data corruption, because less
data is copied to the newly allocated region than should be.

I'm not sure when this bug was introduced, it could be there undetected
for years now, as we don't have a lot of realloc(9) consumers and it was
hard to reproduce it...
...but what I know for sure, is that I don't want to know who introduce
the bug:) It took me two/three days to track it down (of course most of
the time I was looking for the bug in my own code).
2005-12-28 01:53:13 +00:00
Pawel Jakub Dawidek
2a143d5bf5 Detect memory leaks when memory type is being destroyed.
This is very helpful for detecting kernel modules memory leaks on unload.

Discussed and reviewed by:	rwatson
2005-11-03 13:48:59 +00:00
Robert Watson
64a266f9e8 Change format string for u_int64_t to %ju from %llu, in order to use the
correct format string on 64-bit systems.

Pointed out by:	pjd
2005-10-20 21:28:31 +00:00
Robert Watson
909ed16c2b Add a "show malloc" command to DDB, which prints out the current stats for
available kernel malloc types.  Quite useful for post-mortem debugging of
memory leaks without a dump device configured on a panicked box.

MFC after:	2 weeks
2005-10-20 17:41:47 +00:00
Ruslan Ermilov
2319835713 Long overdue, keep up with mbuf.h,v 1.148. 2005-08-02 20:03:23 +00:00
Pawel Jakub Dawidek
73864adbd4 Fix the way how "InUse" column in 'vmstat -m' output works:
- increase number of allocations count only on successfull malloc(9),
  so it doesn't confuse people;
- because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;'
  under the check as well, as for size=0 it is of course a no-op;
- avoid critical_enter()/critical_exit() in case of failure in
  malloc_type_allocated() as there will be nothing to do.

OK'ed by:	rwatson
MFC after:	2 days
2005-07-27 23:17:31 +00:00
Robert Watson
4f8721d2a9 Correct build on 64-bit: cast u_int64_t to (unsigned long long) before
printfing as (unsigned long long).  32-bit build on i386 didn't notice
this.  Whoops.

Reported by:	arved
Tested by:	sledge
2005-07-14 15:21:18 +00:00
Robert Watson
cd814b2692 Introduce a new sysctl, kern.malloc_stats, which exports kernel malloc
statistics via a binary structure stream:

- Add structure 'malloc_type_stream_header', which defines a stream
  version, definition of MAXCPUS used in the stream, and a number of
  malloc_type records in the stream.

- Add structure 'malloc_type_header', which defines the name of the
  malloc type being reported on.

- When the sysctl is queried, return a stream header, followed by a
  series of type descriptions, each consisting of a type header
  followed by a series of MAXCPUS malloc_type_stats structures holding
  per-CPU allocation information.  Typical values of MAXCPUS will be 1
  (UP compiled kernel) and 16 (SMP compiled kernel).

This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.

While here:

- Bump statistics width to uint64_t, and hard code using fixed-width
  type in order to be more sure about structure layout in the stream.
  We allocate and free a lot of memory.

- Add kmemcount, a counter of the number of registered malloc types,
  in order to avoid excessive manual counting of types.  Export via a
  new sysctl to allow user-space code to better size buffers.

- De-XXX comment on no longer maintaining the high watermark in old
  sysctl monitoring code.

A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days.  Likewise, similar changes
to UMA.
2005-07-14 11:52:06 +00:00
Ken Smith
c0cac8dc20 Remove a variable that became unused as a result of changes made
in v1.139.  This was only exposed if MALLOC_PROFILE was defined.

Submitted by:	Gary Jennejohn
Pointy hat:	rwatson
Approved by:	re (scottl)
2005-06-16 16:01:46 +00:00
Joseph Koshy
8c61b21927 Fix typo.
Reviewed by:	rwatson, sam
2005-06-10 18:06:59 +00:00
Robert Watson
63a7e0a3f9 Kernel malloc layers malloc_type allocation over one of two underlying
allocators: a set of power-of-two UMA zones for small allocations, and the
VM page allocator for large allocations.  In order to maintain unified
statistics for specific malloc types, kernel malloc maintains a separate
per-type statistics pool, which can be monitored using vmstat -m.  Prior
to this commit, each pool of per-type statistics was protected using a
per-type mutex associated with the malloc type.

This change modifies kernel malloc to maintain per-CPU statistics pools
for each malloc type, and protects writing those statistics using critical
sections.  It also moves to unsynchronized reads of per-CPU statistics
when generating coalesced statistics.  To do this, several changes are
implemented:

- In the previous world order, the statistics memory was allocated by
  the owner of the malloc type structure, allocated statically using
  MALLOC_DEFINE().  This embedded the definition of the malloc_type
  structure into all kernel modules.  Move to a model in which a pointer
  within struct malloc_type points at a UMA-allocated
  malloc_type_internal data structure owned and maintained by
  kern_malloc.c, and not part of the exported ABI/API to the rest of
  the kernel.  For the purposes of easing a possible MFC, re-use an
  existing pointer in 'struct malloc_type', and maintain the current
  malloc_type structure size, as well as layout with respect to the
  fields reused outside of the malloc subsystem (such as ks_shortdesc).
  There are several unused fields as a result of no longer requiring
  the mutex in malloc_type.

- Struct malloc_type_internal contains an array of malloc_type_stats,
  of size MAXCPU.  The structure defined above avoids hard-coding a
  kernel compile-time value of MAXCPU into kernel modules that interact
  with malloc.

- When accessing per-cpu statistics for a malloc type, surround read -
  modify - update requests with critical_enter()/critical_exit() in
  order to avoid races during write.  The per-CPU fields are written
  only from the CPU that owns them.

- Per-CPU stats now maintained "allocated" and "freed" counters for
  number of allocations/frees and bytes allocated/freed, since there is
  no longer a coherent global notion of the totals.  When coalescing
  malloc stats, accept a slight race between reading stats across CPUs,
  and avoid showing the user a negative allocation count for the type
  in the event of a race.  The global high watermark is no longer
  maintained for a malloc type, as there is no global notion of the
  number of allocations.

- While tearing up the sysctl() path, also switch to using sbufs.  The
  current "export as text" sysctl format is retained with the same
  syntax.  We may want to change this in the future to export more
  per-CPU information, such as how allocations and frees are balanced
  across CPUs.

This change results in a substantial speedup of kernel malloc and free
paths on SMP, as critical sections (where usable) out-perform mutexes
due to avoiding atomic/bus-locked operations.  There is also a minor
improvement on UP due to the slightly lower cost of critical sections
there.  The cost of the change to this approach is the loss of a
continuous notion of total allocations that can be exploited to track
per-type high watermarks, as well as increased complexity when
monitoring statistics.

Due to carefully avoiding changing the ABI, as well as hardening the ABI
against future changes, it is not necessary to recompile kernel modules
for this change.  However, MFC'ing this change to RELENG_5 will require
also MFC'ing optimizations for soft critical sections, which may modify
exposed kernel ABIs.  The internal malloc API is changed, and
modifications to vmstat in order to restore "vmstat -m" on core dumps will
follow shortly.

Several improvements from:		bde
Statistics approach discussed with:	ups
Tested by:				scottl, others
2005-05-29 13:38:07 +00:00
Robert Watson
87efd4d58a Consistently style function declarations in kern_malloc.c.
MFC after:	3 days
2005-04-12 23:54:34 +00:00
Bosko Milekic
e4eb384b47 Bring in MemGuard, a very simple and small replacement allocator
designed to help detect tamper-after-free scenarios, a problem more
and more common and likely with multithreaded kernels where race
conditions are more prevalent.

Currently MemGuard can only take over malloc()/realloc()/free() for
particular (a) malloc type(s) and the code brought in with this
change manually instruments it to take over M_SUBPROC allocations
as an example.  If you are planning to use it, for now you must:

	1) Put "options DEBUG_MEMGUARD" in your kernel config.
	2) Edit src/sys/kern/kern_malloc.c manually, look for
	   "XXX CHANGEME" and replace the M_SUBPROC comparison with
	   the appropriate malloc type (this might require additional
	   but small/simple code modification if, say, the malloc type
	   is declared out of scope).
	3) Build and install your kernel.  Tune vm.memguard_divisor
	   boot-time tunable which is used to scale how much of kmem_map
	   you want to allott for MemGuard's use.  The default is 10,
	   so kmem_size/10.

ToDo:
	1) Bring in a memguard(9) man page.
	2) Better instrumentation (e.g., boot-time) of MemGuard taking
	   over malloc types.
	3) Teach UMA about MemGuard to allow MemGuard to override zone
	   allocations too.
	4) Improve MemGuard if necessary.

This work is partly based on some old patches from Ian Dowse.
2005-01-21 18:09:17 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Dag-Erling Smørgrav
479439b4fe Turn VM_KMEM_SIZE_MAX and VM_KMEM_SIZE_SCALE into tunables.
MFC after:	3 days
2004-09-29 14:21:40 +00:00
Brian Feldman
4362fada8f Reimplement contigmalloc(9) with an algorithm which stands a greatly-
improved chance of working despite pressure from running programs.
Instead of trying to throw a bunch of pages out to swap and hope for
the best, only a range that can potentially fulfill contigmalloc(9)'s
request will have its contents paged out (potentially, not forcibly)
at a time.

The new contigmalloc operation still operates in three passes, but it
could potentially be tuned to more or less.  The first pass only looks
at pages in the cache and free pages, so they would be thrown out
without having to block.  If this is not enough, the subsequent passes
page out any unwired memory.  To combat memory pressure refragmenting
the section of memory being laundered, each page is removed from the
systems' free memory queue once it has been freed so that blocking
later doesn't cause the memory laundered so far to get reallocated.

The page-out operations are now blocking, as it would make little sense
to try to push out a page, then get its status immediately afterward
to remove it from the available free pages queue, if it's unlikely to
have been freed.  Another change is that if KVA allocation fails, the
allocated memory segment will be freed and not leaked.

There is a sysctl/tunable, defaulting to on, which causes the old
contigmalloc() algorithm to be used.  Nonetheless, I have been using
vm.old_contigmalloc=0 for over a month.  It is safe to switch at
run-time to see the difference it makes.

A new interface has been used which does not require mapping the
allocated pages into KVA: vm_page.h functions vm_page_alloc_contig()
and vm_page_release_contig().  These are what vm.old_contigmalloc=0
uses internally, so the sysctl/tunable does not affect their operation.

When using the contigmalloc(9) and contigfree(9) interfaces, memory
is now tracked with malloc(9) stats.  Several functions have been
exported from kern_malloc.c to allow other subsystems to use these
statistics, as well.  This invalidates the BUGS section of the
contigmalloc(9) manpage.
2004-07-19 06:21:27 +00:00
Marcel Moolenaar
2d50560abc Update for the KDB framework:
o  Make debugging code conditional upon KDB instead of DDB.
o  Call kdb_enter() instead of Debugger().
o  Call kdb_backtrace() instead of db_print_backtrace() or backtrace().

kern_mutex.c:
o  Replace checks for db_active with checks for kdb_active and make
   them unconditional.

kern_shutdown.c:
o  s/DDB_UNATTENDED/KDB_UNATTENDED/g
o  s/DDB_TRACE/KDB_TRACE/g
o  Save the TID of the thread doing the kernel dump so the debugger
   knows which thread to select as the current when debugging the
   kernel core file.
o  Clear kdb_active instead of db_active and do so unconditionally.
o  Remove backtrace() implementation.

kern_synch.c:
o  Call kdb_reenter() instead of db_error().
2004-07-10 21:36:01 +00:00
Bosko Milekic
099a0e588c Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
    care of and it becomes clearer to how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Dag-Erling Smørgrav
84344f9fbf Rename the kern.vm.kmem.size tunable to the more logical vm.kmem_size. To
assure backward compatibility (conditional on !BURN_BRIDGES), look it up
by its old name first, and log a warning (but accept the setting) if it
was found.  If both the old and new name are defined, the new name takes
precedence.

Also export vm.kmem_size as a read-only sysctl variable; I find it hard to
tune a parameter when I don't know its default value, especially when that
default value is computed at boot time.
2004-01-27 15:59:38 +00:00
Jeff Roberson
9fb535dec5 - Only use UMA to cache malloc requests up to PAGE_SIZE. Values larger than
this are requested very infrequently and waste memory when we cache
   spares.
2003-09-19 04:39:08 +00:00
Poul-Henning Kamp
68f2d20b70 Revert stuff which accidentally ended up in the previous commit. 2003-07-22 10:36:36 +00:00
Poul-Henning Kamp
55d1d7034f Don't attempt to inline large functions mb_alloc() and mb_free(),
it more than doubles the text size of this file.

GCC has wisely ignored us on this previously
2003-07-22 10:24:41 +00:00
Mike Silbersack
347194c172 Add init_param3() to subr_param. This function is called
immediately after the kernel map has been sized, and is
the optimal place for the autosizing of memory allocations
which occur within the kernel map to occur.

Suggested by:	bde
2003-07-11 00:01:03 +00:00
Paul Saab
1795d0cdec Don't overflow when calculating vm_kmem_size. This fixes kmem_map
too small panics on PAE machines which have odd > 4GB sizes (4.5 gig
would render a 20MB of KVA for kmem_map instead of 200MB).

Submitted by:	John Cagle <john.cagle@hp.com>, jeff
Reviewed by:	jeff, peter, scottl, lots of USENIX folks
2003-06-11 05:18:59 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Poul-Henning Kamp
1282e9acea Don't pass NULL pointer to memset if we are compiled with DIAGNOSTIC
Approved by:	re/rwatson
2003-05-12 05:09:56 +00:00
Poul-Henning Kamp
8cb72d6174 Add two KASSERTS which trigger if free(9) would drag the "memuse" statistic
for a malloc bucket under zero.  This typically happens if you malloc(9)
from one bucket and free to another.
2003-05-05 08:32:53 +00:00
Poul-Henning Kamp
3f6ee876c1 Update the "last malloc failure timestamp" also for simulated
malloc errors.
2003-04-25 21:49:24 +00:00
Robert Watson
f2538508f6 Permit debug.malloc.failure_rate to be specified using a tunable so
that the feature can be enabled during the boot process.  Note the
continued limitation that FreeBSD fails so rapidly with this setting
enabled that it's hard to narrow down particular failures for
correction; we really need per-malloc type failure rates.
2003-03-26 20:44:29 +00:00
Robert Watson
eae870cdb4 Add a new kernel option, MALLOC_MAKE_FAILURES, which compiles
in a debugging feature causing M_NOWAIT allocations to fail at
a specified rate.  This can be useful for detecting poor
handling of M_NOWAIT: the most frequent problems I've bumped
into are unconditional deference of the pointer even though
it's NULL, and hangs as a result of a lost event where memory
for the event couldn't be allocated.  Two sysctls are added:

debug.malloc.failure_rate

  How often to generate a failure: if set to 0 (default), this
  feature is disabled.  Otherwise, the frequency of failures --
  I've been using 10 (one in ten mallocs fails), but other
  popular settings might be much lower or much higher.

debug.malloc.failure_count

  Number of times a coerced malloc failure has occurred as a
  result of this feature.  Useful for tracking what might have
  happened and whether failures are being generated.

Useful possible additions: tying failure rate to malloc type,
printfs indicating the thread that experienced the coerced
failure.

Reviewed by:	jeffr, jhb
2003-03-26 20:18:40 +00:00
Poul-Henning Kamp
194a0abf73 PHCC[1]:
I had commented the #ifdef INVARIANTS checks out to make sure I ran this
code in all kernels and forgot to comment the #ifdefs back in before I
committed.

Spotted by:	bmilekic

[1] PHCC = Pointy Hat Correction Commit
2003-03-10 20:24:54 +00:00
Poul-Henning Kamp
d3c11994e1 Make malloc and mbuf allocation mode flags nonoverlapping.
Under INVARIANTS whine if we get incompatible flags.

Submitted by:   imp
2003-03-10 19:39:53 +00:00
Bosko Milekic
025b4be197 o Allow "buckets" in mb_alloc to be differently sized (according to
compile-time constants).  That is, a "bucket" now is not necessarily
  a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ
  worth of mbufs, clusters.
o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce
  {mbuf,clust}_lowm, which currently has no effect but will be used
  to set the low watermarks.
o Fix netstat so that it can deal with the differently-sized buckets
  and teach it about the low watermarks too.
o Make sure the per-cpu stats for an absent CPU has mb_active set to 0,
  explicitly.
o Get rid of the allocate refcounts from mbuf map mess.  Instead,
  just malloc() the refcounts in one shot from mbuf_init()
o Clean up / update comments in subr_mbuf.c
2003-02-20 04:26:58 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Poul-Henning Kamp
4db4f5c87f Under #ifdef DIAGNOSTIC, fill malloc(9) allocations which do not have
M_ZERO specified with 0x70.  (malloc_flags=J for the kernel :-)
2003-02-01 10:07:49 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Poul-Henning Kamp
1fb14a47a1 Introduce malloc_last_fail() which returns the number of seconds since
malloc(9) failed last time.  This is intended to help code adjust
memory usage to the current circumstances.

A typical use could be:
	if (malloc_last_fail() < 60)
		reduce_cache_by_one();
2002-11-01 18:58:12 +00:00
Jeff Roberson
99571dc345 - Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH.
- Remove all instances of the mallochash.
 - Stash the slab pointer in the vm page's object pointer when allocating from
   the kmem_obj.
 - Use the overloaded object pointer to find slabs for malloced memory.
2002-09-18 08:26:30 +00:00
Robert Drehmel
280759e75e - Replace the bandaid introduced in revision 1.110 with
a better solution.
 - Add braces for a ``for'' statement containing a single
   multi-line statement.
2002-05-31 09:41:09 +00:00
Jake Burkholder
45eefe7176 Add a bandaid so that sysctl kern.malloc works on sparc64. 2002-05-20 18:29:37 +00:00
John Baldwin
42e498655d Fix the td_intr_nesting_level check to work ok if a flag like M_ZERO is
passed in with M_WAITOK to malloc().
2002-05-20 17:46:57 +00:00
Jeff Roberson
8f70816cf2 Hide a pointer to the malloc_type bucket at the end of the freed memory. If
this memory is modified after it has been freed we can now report it's
previous owner.
2002-05-02 09:07:04 +00:00
Jeff Roberson
5a34a9f089 malloc/free(9) no longer require Giant. Use the malloc_mtx to protect the
mallochash.  Mallochash is going to go away as soon as I introduce the
kfree/kmalloc api and partially overhaul the malloc wrapper.  This can't happen
until all users of the malloc api that expect memory to be aligned on the size
of the allocation are fixed.
2002-05-02 07:22:19 +00:00
Jeff Roberson
639c9550fb Remove the temporary alignment check in free().
Implement the following checks on freed memory in the bucket path:
	- Slab membership
	- Alignment
	- Duplicate free

This previously was only done if we skipped the buckets.  This code will slow
down INVARIANTS a bit, but it is smp safe.  The checks were moved out of the
normal path and into hooks supplied in uma_dbg.
2002-05-02 02:08:48 +00:00
Jeff Roberson
289f207c81 Convert longs to u_longs in stats. This will hold off wrap arounds for a
while longer.
2002-04-30 22:39:32 +00:00
Jeff Roberson
8efc4eff00 Add a new UMA debugging facility. This will overwrite freed memory with
0xdeadc0de and then check for it just before memory is handed off as part
of a new request.  This will catch any post free/pre alloc modification of
memory, as well as introduce errors for anything that tries to dereference
it as a pointer.

This code takes the form of special init, fini, ctor and dtor routines that
are specificly used by malloc.  It is in a seperate file because additional
debugging aids will want to live here as well.
2002-04-30 07:54:25 +00:00
Jeff Roberson
2cc35ff9c6 Move the implementation of M_ZERO into UMA so that it can be passed to
uma_zalloc and friends.  Remove this functionality from the malloc wrapper.

Document this change in uma.h and adjust variable names in uma_core.
2002-04-30 04:26:34 +00:00
Robert Watson
43a7c4e919 Re-add the 16384 bucket also.
Submitted by:	green
2002-04-29 17:53:23 +00:00
Robert Watson
bd796eb25f Revert a portion of kern_malloc.c:1.99, which (in addition to adding
malloc profiling) also modified the set of pre-defined buckets for the
memory allocator.  For reasons unknown to me, this resulted in extensive
memory corruption in the kernel, in particular on SMP boxes, so I'm
committing this work-around until Jeff gets a chance to debug it
properly.  David Wolfskill pointed me at this commit as the one that
might be a problem; I've been running this code on two dual-processor
burn-in boxes for about 12 hours now, and the rate of panics due to
memory corruption has dropped to zero (from one every five minutes).

Hopefully not treading on the toes of:	jeff
2002-04-29 17:12:02 +00:00
Poul-Henning Kamp
708da94ef2 Add a basic sanity check on pointers passed to free(9).
Should be improved by:	jeff
2002-04-23 18:50:25 +00:00
Jeff Roberson
5e914b96b9 Finish adding support code for sysctl kern.mprof. This dumps some malloc
information related to bucket size effeciency.  Three things are printed on
each row:

Size is the size the user actually asked for rounded to 16 bytes.
Requests is the number of times this size was asked for.
Real Size is the size we actually handed out.

At the end the total memory used and total waste is displayed.  Currently my
system displays about 33% wasted memory.

The intent of this code is to gather statistics for tuning the malloc bucket
sizes.  It is not intended to be run with INVARIANTS and it is not entirely
mp safe.  It can be enabled via 'options MALLOC_PROFILE' which was commited
earlier.
2002-04-15 05:24:01 +00:00
Jeff Roberson
6f2671750e Remove malloc_type's ks_limit.
Updated the kmemzones logic such that the ks_size bitmap can be used as an
index into it to report the size of the zone used.

Create the kern.malloc sysctl which replaces the kvm mechanism to report
similar data.  This will provide an easy place for statistics aggregation if
malloc_type statistics become per cpu data.

Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing
the malloc buckets.
2002-04-15 04:05:53 +00:00
John Baldwin
6008862bc2 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Jeff Roberson
8355f576a9 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
Archie Cobbs
44a8ff315e Add realloc() and reallocf(), and make free(NULL, ...) acceptable.
Reviewed by:	alfred
2002-03-13 01:42:33 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
John Baldwin
c4a448100c - Remove asleep(), await(), and M_ASLEEP.
- Callers of asleep() and await() have been converted to calling tsleep().
  The only caller outside of M_ASLEEP was the ata driver, which called both
  asleep() and await() with spl-raised, so there was no need for the
  asleep() and await() pair.  M_ASLEEP was unused.

Reviewed by:	jasone, peter
2001-08-10 06:45:43 +00:00
Bosko Milekic
ba3e88262e Rename mb_init() mbuf subsystem initialization routine to mbuf_init(), in
order to avoid namespace collision with subr_mchain.c's mb_init(). This
wasn't "fatal" as the mbuf initialization routine mb_init() was local to
subr_mbuf.c which in turn didn't pull in subr_mchain.c's mb_init()
declaration, but it should deffinately be changed now before it creates
headache.
2001-08-03 05:05:32 +00:00
Jake Burkholder
f74250ca46 Remove some code that appears to have endian problems with INVARIANTS.
This is #if BIG_ENDIAN, but is only necessary if malloc types are shorts,
not struct malloc_type * like they are now.
2001-08-03 03:31:45 +00:00
Bosko Milekic
08442f8a82 Introduce numerous SMP friendly changes to the mbuf allocator. Namely,
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:

 o Reduce contention for SMP by offering per-CPU pools and locks.
 o Better use of data cache due to per-CPU pools.
 o Much less code cache pollution due to excessively large allocation macros.
 o Framework for `grouping' objects from same page together so as to be able
   to possibly free wired-down pages back to the system if they are no longer
   needed by the network stacks.

 Additional things changed with this addition:

  - Moved some mbuf specific declarations and initializations from
    sys/conf/param.c into mbuf-specific code where they belong.
  - m_getclr() has been renamed to m_get_clrd() because the old name is really
    confusing. m_getclr() HAS been preserved though and is defined to the new
    name. No tree sweep has been done "to change the interface," as the old
    name will continue to be supported and is not depracated. The change was
    merely done because m_getclr() sounds too much like "m_get a cluster."
  - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
    systat(1) (see TODO below).
  - Fixed systat(1) to display number of "free mbufs" based on new per-CPU
    stat structures.
  - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
    per-CPU stat structures. All infos are fetched via sysctl.

 TODO (in order of priority):

  - Re-enable mbtypes statistics in both netstat(1) and systat(1) after
    introducing an SMP friendly way to collect the mbtypes stats under the
    already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
    seems too costly for a mere stat update, especially when other locks are
    already present).
  - Optionally have systat(1) display not only "total free mbufs" but also
    "total free mbufs per CPU pool."
  - Fix minor length-fetching issues in netstat(1) related to recently
    re-enabled option to read mbuf stats from a core file.
  - Move reference counters at least for mbuf clusters into an unused portion
    of the cluster itself, to save space and need to allocate a counter.
  - Look into introducing resource freeing possibly from a kproc.

Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/
2001-06-22 06:35:32 +00:00
Peter Wemm
0978669829 "Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment.  This saves duplicating the same function
over and over again.  This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
2001-06-08 05:24:21 +00:00
Peter Wemm
4422746fdf Back out part of my previous commit. This was a last minute change
and I botched testing.  This is a perfect example of how NOT to do
this sort of thing. :-(
2001-06-07 03:17:26 +00:00
Peter Wemm
81930014ef Make the TUNABLE_*() macros look and behave more consistantly like the
SYSCTL_*() macros.  TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.
2001-06-06 22:17:08 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Bosko Milekic
d04d50d1f7 Fix inconsistency in setup of kernel_map: we need to make sure that
we also reserve _adequate_ space for the mb_map submap; i.e. we need
space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to
rounddown, and not roundup, so that we are consistent.

Pointed out by: bde
2001-04-18 23:54:13 +00:00
Bosko Milekic
9ed346bab0 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
Boris Popov
1707240d2a Let M_PANIC go back to the private tree as its intention isn't understood well
for now.
2001-01-31 04:50:20 +00:00
Boris Popov
9211b0b657 Add M_PANIC flag to the list of available flags passed to malloc().
With this flag set malloc() will panic if memory allocation failed.
This usable only in critical places where failed allocation is fatal.

Reviewed by:	peter
2001-01-29 12:48:37 +00:00
Peter Wemm
0fee3d3550 p->p_intr_nesting_level is MI now and initialized to 0 in kern_fork.c,
so it should be save to KASSERT() on it even on an arch that may not
use it.
2001-01-27 06:32:20 +00:00
John Baldwin
0d6d6aa373 Don't grab Giant when calling kmem_alloc/kmem_free as this is just
encouraging other people to follow the same practice.  If this is going
to be done, then it should be done inside of those two functions instead.
2001-01-24 00:36:03 +00:00
Jake Burkholder
a448b62ac9 Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context.  This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By:	peter
2001-01-21 19:25:07 +00:00
Jason Evans
d1c1b8413e Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.

This change is necessary in order to avoid some circular bootstrapping
dependencies.
2001-01-21 07:52:20 +00:00
Jake Burkholder
ef73ae4b0c Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.
2001-01-10 04:43:51 +00:00
Poul-Henning Kamp
1921a06d6a Introduce the M_ZERO flag to malloc(9)
Instead of:

        foo = malloc(sizeof(foo), M_WAIT);
        bzero(foo, sizeof(foo));

You can now (and please do) use:

        foo = malloc(sizeof(foo), M_WAIT | M_ZERO);

In the future this will enable us to do idle-time pre-zeroing of
malloc-space.
2000-10-20 17:54:55 +00:00
John Baldwin
eec258d257 - machine/mutex.h -> sys/mutex.h
- Use MUTEX_DECLARE() and MTX_COLD for the malloc_mtx mutex
2000-10-20 07:29:16 +00:00
Jason Evans
9a02e8c68f Don't #include <sys/proc.h>, since machine/mutex.h does it now. 2000-09-23 00:01:37 +00:00
John Baldwin
606f8eb27a Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just
use struct mtx, struct witness, and struct witness_blessed.

Requested by:	bde
2000-09-14 20:15:16 +00:00
Jason Evans
69ef67f983 Add malloc_mtx to protect malloc and friends, so that they're thread-safe.
Reviewed by:	peter
2000-09-11 02:32:30 +00:00
Jason Evans
28d4c2dde3 Back out the addition of malloc_mtx. It was incompletely conceived, and
will be done correctly in the future.
2000-09-10 01:54:15 +00:00
Jason Evans
9360d3ebdd Add a mutex to the malloc interfaces so that it can safely be called
without owning the Giant lock.
2000-09-09 22:27:35 +00:00
Boris Popov
5badeabaca Move #ifdef to the right place. 2000-06-29 09:26:26 +00:00
Boris Popov
99063cf89e If kernel compiled with INVARIANTS:
On unload, remove references from freelist to memory type defined by module.
Print a warning if module defines and allocate its own memory type, but
didn't free it all on unload.

Reviewed by:	peter
2000-06-29 03:41:30 +00:00
Bruce Evans
8e8cac5555 sys/malloc.h:
Order the SYSINIT() for MALLOC_DEFINE() correctly so that malloc()
doesn't have to waste time initializing itself.  The
(SI_SUB_KMEM, SI_ORDER_ANY) order was shared with syscons' SYSINIT()
for scmeminit(), and scmeminit() calls malloc(), so malloc()
initialization was not always complete on the first call to malloc().

kern/kern_malloc.c:
- Removed self-initialization in malloc().
- Removed half-baked sanity check in free().  Trust MALLOC_DEFINE().
2000-06-14 18:31:42 +00:00
Jun Kuriyama
dc76063419 Print "previous type" correctly when INVARIANTS is defined.
Reviewed by:	current@FreeBSD.org
2000-03-14 14:58:04 +00:00
Matthew Dillon
1f6889a1eb Fix null-pointer dereference crash when the system is intentionally
run out of KVM through a mmap()/fork() bomb that allocates hundreds
    of thousands of vm_map_entry structures.

    Add panic to make null-pointer dereference crash a little more verbose.

    Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number
    of mmap()'d spaces (discrete vm_map_entry's in the process).  The value
    defaults to around 9000 for a 128MB machine.  The test is scaled for the
    number of processes sharing a vmspace (aka linux threads).  Setting
    the value to 0 disables the feature.

PR: kern/16573
Approved by: jkh
2000-02-16 21:11:33 +00:00
David Greenman
27b8623f21 Fixed sign and overflow bugs that caused the allocation size of the kernel
malloc region (kmem_map) to be wrong and semi-random on systems with more
than 1GB of RAM. This is not a complete fix, but is sufficient for
machines with 4GB or less of memory. A complete fix will require some
changes to the getenv stuff so that 64bit values can be passed around.

NOT FIXED: machines with more than 4GB of RAM (e.g. some large Alphas)
since we're still using ints to hold some of the values.

Reviewed by:	bde
2000-01-28 04:04:58 +00:00
Yoshinobu Inoue
82cd038d51 KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
1999-11-22 02:45:11 +00:00
Poul-Henning Kamp
3b6fb88590 Before we start to mess with the VFS name-cache clean things up a little bit:
Isolate the namecache in its own file, and give it a dedicated malloc type.
1999-10-03 12:18:29 +00:00