designed to help detect tamper-after-free scenarios, a problem that is
increasingly common and likely with multithreaded kernels, where race
conditions are more prevalent.
Currently MemGuard can only take over malloc()/realloc()/free() for a
particular malloc type (or types), and the code brought in with this
change manually instruments it to take over M_SUBPROC allocations
as an example. If you are planning to use it, for now you must:
1) Put "options DEBUG_MEMGUARD" in your kernel config.
2) Edit src/sys/kern/kern_malloc.c manually: look for
"XXX CHANGEME" and replace the M_SUBPROC comparison with
the appropriate malloc type (this might require additional
but small/simple code modification if, say, the malloc type
is declared out of scope). See the sketch following this list.
3) Build and install your kernel. Tune the vm.memguard_divisor
boot-time tunable, which scales how much of kmem_map
you want to allot for MemGuard's use. The default is 10,
i.e. kmem_size/10.
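A minimal sketch of what the step-2 hook could look like (the placement
and variable names are illustrative; only memguard_alloc()/memguard_free()
are part of the interface added here):

	/* In malloc(): divert allocations of the guarded type to MemGuard. */
	if (type == M_SUBPROC)			/* XXX CHANGEME */
		return (memguard_alloc(size, flags));

	/* In free(): divert the matching frees as well. */
	if (type == M_SUBPROC) {		/* XXX CHANGEME */
		memguard_free(addr);
		return;
	}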
ToDo:
1) Bring in a memguard(9) man page.
2) Better instrumentation (e.g., boot-time) of MemGuard taking
over malloc types.
3) Teach UMA about MemGuard to allow MemGuard to override zone
allocations too.
4) Improve MemGuard if necessary.
This work is partly based on some old patches from Ian Dowse.
improved chance of working despite pressure from running programs.
Instead of trying to throw a bunch of pages out to swap and hope for
the best, only a range that can potentially fulfill contigmalloc(9)'s
request will have its contents paged out (potentially, not forcibly)
at a time.
The new contigmalloc operation still works in three passes, though it
could potentially be tuned to use more or fewer. The first pass only looks
at pages in the cache and free pages, so they can be thrown out
without blocking. If this is not enough, the subsequent passes
page out any unwired memory. To keep memory pressure from refragmenting
the section of memory being laundered, each page is removed from the
system's free memory queue once it has been freed, so that blocking
later doesn't cause the memory laundered so far to be reallocated.
The page-out operations are now blocking, as it would make little sense
to try to push out a page, then get its status immediately afterward
to remove it from the available free pages queue, if it's unlikely to
have been freed. Another change is that if KVA allocation fails, the
allocated memory segment will be freed and not leaked.
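Very roughly, and with entirely made-up helper names, the pass structure
described above looks like this (a sketch, not the committed code):

	static vm_page_t
	contig_try_passes(int npages)
	{
		vm_page_t m, start;
		int i, pass;

		for (pass = 0; pass < 3; pass++) {
			/* Pass 0 accepts only free/cached pages; later passes launder. */
			start = find_contig_candidate(npages, pass);
			if (start == NULL)
				continue;
			for (i = 0; i < npages; i++) {
				m = &start[i];
				if (pass > 0 && page_needs_laundering(m))
					pageout_and_wait(m);	/* blocking page-out */
				/* Keep laundered pages off the free queues. */
				remove_from_free_queues(m);
			}
			return (start);
		}
		return (NULL);
	}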
There is a sysctl/tunable, defaulting to on, which causes the old
contigmalloc() algorithm to be used. Nonetheless, I have been using
vm.old_contigmalloc=0 for over a month. It is safe to switch at
run-time to see the difference it makes.
A new interface has been used which does not require mapping the
allocated pages into KVA: vm_page.h functions vm_page_alloc_contig()
and vm_page_release_contig(). These are what vm.old_contigmalloc=0
uses internally, so the sysctl/tunable does not affect their operation.
When using the contigmalloc(9) and contigfree(9) interfaces, memory
is now tracked with malloc(9) stats. Several functions have been
exported from kern_malloc.c to allow other subsystems to use these
statistics, as well. This invalidates the BUGS section of the
contigmalloc(9) manpage.
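For reference, a typical call through the documented interface looks
roughly like this (the malloc type, size and address range are illustrative):

	void *p;

	/* 64KB of physically contiguous, page-aligned memory below 4GB. */
	p = contigmalloc(65536, M_DEVBUF, M_NOWAIT, 0, 0xffffffff,
	    PAGE_SIZE, 0);
	if (p != NULL) {
		/* ... use the buffer ... */
		contigfree(p, 65536, M_DEVBUF);
	}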
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Call kdb_backtrace() instead of db_print_backtrace() or backtrace().
kern_mutex.c:
o Replace checks for db_active with checks for kdb_active and make
them unconditional.
kern_shutdown.c:
o s/DDB_UNATTENDED/KDB_UNATTENDED/g
o s/DDB_TRACE/KDB_TRACE/g
o Save the TID of the thread doing the kernel dump so the debugger
knows which thread to select as the current when debugging the
kernel core file.
o Clear kdb_active instead of db_active and do so unconditionally.
o Remove backtrace() implementation.
kern_synch.c:
o Call kdb_reenter() instead of db_error().
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
a Keg structure which splits the slab cache off from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (a single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters are automagically allocated for them at the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and looks up the corresponding refcnt.
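A rough sketch of how a consumer might use this (the zone name is made up;
uma_find_refcnt() returns a pointer to the slab-embedded counter):

	uint32_t *refcnt;
	void *item;

	item = uma_zalloc(my_refcnt_zone, M_NOWAIT);
	if (item != NULL) {
		refcnt = uma_find_refcnt(my_refcnt_zone, item);
		*refcnt = 1;		/* caller holds the first reference */
	}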
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change certain code paths that used to do
m_get() + m_clget() to instead just use m_getcl() and
take advantage of the newly defined secondary
Packet zone (see the example following this list).
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting, but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer how stats will work
within the modified framework.
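For example, a path that used to do the two-step allocation can now be
written as (error handling illustrative):

	struct mbuf *m;

	/* Old: m_get() followed by m_clget() -- two allocator trips. */
	/* New: one allocation from the secondary Packet zone. */
	m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);
	if (m == NULL)
		return (ENOBUFS);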
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
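Such a handler could look roughly like this (illustrative only, not
committed code):

	static int
	sysctl_nmbclusters(SYSCTL_HANDLER_ARGS)
	{
		int error, newval;

		newval = nmbclusters;
		error = sysctl_handle_int(oidp, &newval, 0, req);
		if (error != 0 || req->newptr == NULL)
			return (error);
		/* Growing the limit is easy; shrinking live caches is not. */
		if (newval < nmbclusters)
			return (EINVAL);
		nmbclusters = newval;
		return (0);
	}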
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips performs equally poorly with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: within UMA, force all
allocations to be M_NOWAIT for now to avoid deadlocks,
unless WITNESS is defined and we can determine with
certainty that we're not holding any locks when the
allocation is M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper detailing the change and considering various performance
issues is available; it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
assure backward compatibility (conditional on !BURN_BRIDGES), look it up
by its old name first, and log a warning (but accept the setting) if it
was found. If both the old and new name are defined, the new name takes
precedence.
Also export vm.kmem_size as a read-only sysctl variable; I find it hard to
tune a parameter when I don't know its default value, especially when that
default value is computed at boot time.
immediately after the kernel map has been sized, and is
the optimal place to autosize memory allocations that
live within the kernel map.
Suggested by: bde
too-small panics on PAE machines which have odd > 4GB sizes (4.5GB
would result in 20MB of KVA for kmem_map instead of 200MB).
Submitted by: John Cagle <john.cagle@hp.com>, jeff
Reviewed by: jeff, peter, scottl, lots of USENIX folks
that the feature can be enabled during the boot process. Note the
continued limitation that FreeBSD fails so rapidly with this setting
enabled that it's hard to narrow down particular failures for
correction; we really need per-malloc type failure rates.
in a debugging feature causing M_NOWAIT allocations to fail at
a specified rate. This can be useful for detecting poor
handling of M_NOWAIT: the most frequent problems I've bumped
into are unconditional dereference of the pointer even though
it's NULL, and hangs as a result of a lost event where memory
for the event couldn't be allocated. Two sysctls are added:
debug.malloc.failure_rate
How often to generate a failure: if set to 0 (default), this
feature is disabled. Otherwise, one in every failure_rate
allocations is forced to fail -- I've been using 10 (one in
ten mallocs fails), but other popular settings might be much
lower or much higher.
debug.malloc.failure_count
Number of times a coerced malloc failure has occurred as a
result of this feature. Useful for tracking what might have
happened and whether failures are being generated.
Useful possible additions: tying failure rate to malloc type,
printfs indicating the thread that experienced the coerced
failure.
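The kind of bug this is meant to flush out is the classic unchecked
M_NOWAIT allocation; the fix is simply to handle the NULL return (the
softc and size here are illustrative):

	sc->buf = malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO);
	if (sc->buf == NULL)
		return (ENOMEM);	/* handle failure instead of dereferencing NULL */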
Reviewed by: jeffr, jhb
I had commented the #ifdef INVARIANTS checks out to make sure I ran this
code in all kernels and forgot to comment the #ifdefs back in before I
committed.
Spotted by: bmilekic
[1] PHCC = Pointy Hat Correction Commit
compile-time constants). That is, a "bucket" now is not necessarily
a page worth of mbufs or clusters, but rather MBUF_BUCK_SZ (or
CLUS_BUCK_SZ) worth of mbufs (or clusters).
o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce
{mbuf,clust}_lowm, which currently have no effect but will be used
to set the low watermarks.
o Fix netstat so that it can deal with the differently-sized buckets
and teach it about the low watermarks too.
o Make sure the per-cpu stats for an absent CPU explicitly have
mb_active set to 0.
o Get rid of the allocate-refcounts-from-mbuf-map mess. Instead,
just malloc() the refcounts in one shot from mbuf_init().
o Clean up / update comments in subr_mbuf.c.
malloc(9) failed last time. This is intended to help code adjust
memory usage to the current circumstances.
A typical use could be:

	if (malloc_last_fail() < 60)	/* malloc(9) failed within the last minute */
		reduce_cache_by_one();
- Remove all instances of the mallochash.
- Stash the slab pointer in the vm page's object pointer when allocating from
the kmem_obj.
- Use the overloaded object pointer to find slabs for malloced memory.
mallochash. Mallochash is going to go away as soon as I introduce the
kfree/kmalloc api and partially overhaul the malloc wrapper. This can't happen
until all users of the malloc api that expect memory to be aligned on the size
of the allocation are fixed.
Implement the following checks on freed memory in the bucket path:
- Slab membership
- Alignment
- Duplicate free
This previously was only done if we skipped the buckets. This code will slow
down INVARIANTS a bit, but it is SMP-safe. The checks were moved out of the
normal path and into hooks supplied in uma_dbg.
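Conceptually the three checks reduce to something like the following
(the helper names are made up; the real versions live in the uma_dbg hooks):

	/* Slab membership: the item must map back to a slab owned by the zone. */
	KASSERT(uma_dbg_getslab(zone, item) != NULL,
	    ("free of item not belonging to zone"));
	/* Alignment: the address must fall on an item boundary in the slab. */
	KASSERT(((uintptr_t)item - (uintptr_t)slab_data) % item_size == 0,
	    ("misaligned free"));
	/* Duplicate free: the item must not already be marked free. */
	KASSERT(!uma_dbg_isfree(slab, item), ("duplicate free"));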
0xdeadc0de and then check for it just before memory is handed off as part
of a new request. This will catch any post free/pre alloc modification of
memory, as well as introduce errors for anything that tries to dereference
it as a pointer.
This code takes the form of special init, fini, ctor and dtor routines that
are specifically used by malloc. It is in a separate file because additional
debugging aids will want to live here as well.
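The idea reduces to a fill-on-free/verify-on-alloc pair along these lines
(a sketch, not the exact committed routines):

	#define	TRASH_PATTERN	0xdeadc0de

	/* dtor/fini: stomp the freed item so stale users trip over it. */
	static void
	trash_fill(void *mem, int size)
	{
		uint32_t *p;

		for (p = mem; (char *)p < (char *)mem + size; p++)
			*p = TRASH_PATTERN;
	}

	/* ctor/init: verify nothing touched the item while it sat free. */
	static void
	trash_check(void *mem, int size)
	{
		uint32_t *p;

		for (p = mem; (char *)p < (char *)mem + size; p++)
			if (*p != TRASH_PATTERN)
				panic("memory modified after free at %p", p);
	}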
malloc profiling) also modified the set of pre-defined buckets for the
memory allocator. For reasons unknown to me, this resulted in extensive
memory corruption in the kernel, in particular on SMP boxes, so I'm
committing this work-around until Jeff gets a chance to debug it
properly. David Wolfskill pointed me at this commit as the one that
might be a problem; I've been running this code on two dual-processor
burn-in boxes for about 12 hours now, and the rate of panics due to
memory corruption has dropped to zero (from one every five minutes).
Hopefully not treading on the toes of: jeff
information related to bucket size efficiency. Three things are printed on
each row:
Size is the size the user actually asked for rounded to 16 bytes.
Requests is the number of times this size was asked for.
Real Size is the size we actually handed out.
At the end, the total memory used and total waste are displayed. Currently my
system shows about 33% wasted memory.
The intent of this code is to gather statistics for tuning the malloc bucket
sizes. It is not intended to be run with INVARIANTS and it is not entirely
MP-safe. It can be enabled via 'options MALLOC_PROFILE', which was committed
earlier.
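The waste figure at the bottom is essentially (an illustrative computation
mirroring the row description above):

	/*
	 * Per row: waste contributed = Requests * (Real Size - Size);
	 * the totals at the bottom are the sums over all rows.
	 */
	total_used  += requests * real_size;
	total_waste += requests * (real_size - size);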
Updated the kmemzones logic such that the ks_size bitmap can be used as an
index into it to report the size of the zone used.
Create the kern.malloc sysctl which replaces the kvm mechanism to report
similar data. This will provide an easy place for statistics aggregation if
malloc_type statistics become per cpu data.
Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing
the malloc buckets.
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
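For example (the softc, device and lock names are illustrative):

	/* Most locks simply pass NULL for the new type argument: */
	mtx_init(&sc->sc_mtx, "mydev softc", NULL, MTX_DEF);

	/* Network driver locks share a common type via MTX_NETWORK_LOCK: */
	mtx_init(&sc->sc_ifmtx, device_get_nameunit(dev), MTX_NETWORK_LOCK,
	    MTX_DEF);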
Tested on: i386, alpha, sparc64
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doozy!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
- Callers of asleep() and await() have been converted to calling tsleep().
The only caller outside of M_ASLEEP was the ata driver, which called both
asleep() and await() with spl-raised, so there was no need for the
asleep() and await() pair. M_ASLEEP was unused.
Reviewed by: jasone, peter
order to avoid namespace collision with subr_mchain.c's mb_init(). This
wasn't "fatal" as the mbuf initialization routine mb_init() was local to
subr_mbuf.c which in turn didn't pull in subr_mchain.c's mb_init()
declaration, but it should definitely be changed now before it creates
headaches.
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:
o Reduce contention for SMP by offering per-CPU pools and locks.
o Better use of data cache due to per-CPU pools.
o Much less code cache pollution, by doing away with the excessively large
allocation macros.
o Framework for `grouping' objects from same page together so as to be able
to possibly free wired-down pages back to the system if they are no longer
needed by the network stacks.
Additional things changed with this addition:
- Moved some mbuf specific declarations and initializations from
sys/conf/param.c into mbuf-specific code where they belong.
- m_getclr() has been renamed to m_get_clrd() because the old name is really
confusing. m_getclr() HAS been preserved, though, and is defined to the new
name. No tree sweep has been done "to change the interface," as the old
name will continue to be supported and is not deprecated. The change was
merely made because m_getclr() sounds too much like "m_get a cluster."
- TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
systat(1) (see TODO below).
- Fixed systat(1) to display number of "free mbufs" based on new per-CPU
stat structures.
- Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
per-CPU stat structures. All information is fetched via sysctl.
TODO (in order of priority):
- Re-enable mbtypes statistics in both netstat(1) and systat(1) after
introducing an SMP friendly way to collect the mbtypes stats under the
already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
seems too costly for a mere stat update, especially when other locks are
already present).
- Optionally have systat(1) display not only "total free mbufs" but also
"total free mbufs per CPU pool."
- Fix minor length-fetching issues in netstat(1) related to recently
re-enabled option to read mbuf stats from a core file.
- Move reference counters, at least for mbuf clusters, into an unused portion
of the cluster itself, to save space and avoid the need to allocate a
separate counter.
- Look into introducing resource freeing possibly from a kproc.
Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/