Commit Graph

103 Commits

Author SHA1 Message Date
Julian Elischer
4a3276d4a4 While well intentionned the check to see it there is a packet
header and return that length, was misguided.

The check itself didn't take into account the fact that the
mbuf pointer pased in may be null, and the function is
defined specifically for cases where the caller knows what it wants.
Rather than fix the check I'm removing it as phk suggested.

Submitted by:	 phk@freebsd.org
2002-09-19 08:28:41 +00:00
Julian Elischer
4a49235b89 fix style.. Return in the kernel always has () around the arguments. 2002-09-19 03:18:44 +00:00
Julian Elischer
1494277d50 Compiler was correct:
m WAS being used uninitialized..
2002-09-19 03:15:39 +00:00
Darren Reed
e62497713c If M_PKTHDR is set then we don't need to do a loop to find the total length. 2002-09-19 01:21:24 +00:00
Bosko Milekic
bd395ae8f6 style nit: unsigned -> u_int in the kernel, particularly to
stay consistent in this file, and keep m_length() and m_fixhdr()
consistent with their prototypes in mbuf.h

Inspired by: bde
2002-09-18 22:33:52 +00:00
Poul-Henning Kamp
4e4425d486 Make m_length() and m_fixhdr() return unsigned.
Suggested by:	arr
2002-09-18 19:42:06 +00:00
Poul-Henning Kamp
ac6e585d24 Introduce the m_length() function which will return the accumulated
length of an mbuf-chain and optionally a pointer to the last mbuf.
2002-09-18 14:57:35 +00:00
Poul-Henning Kamp
3f2e06c5e1 Move m_fixhdr() from "mbchain" to "mbuf" where it belongs. 2002-09-18 13:41:37 +00:00
Robert Watson
f9d0d52459 Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by:	bde
2002-08-01 17:47:56 +00:00
Robert Watson
e32a5b94d8 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke additional MAC entry points when an mbuf packet header is
copied to another mbuf: release the old label if any, reinitialize
the new header, and ask the MAC framework to copy the header label
data.  Note that this requires a potential allocation operation,
but m_copy_pkthdr() is not permitted to fail, so we must block.
Since we now use interrupt threads, this is possible, but not
desirable.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:51:34 +00:00
Robert Watson
e37b1fcdee Make M_COPY_PKTHDR() macro into a wrapper for a m_copy_pkthdr()
function.  This permits conditionally compiled extensions to the
packet header copying semantic, such as extensions to copy MAC
labels.

Reviewed by:	bmilekic
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 18:28:58 +00:00
Bosko Milekic
4151d2e620 Move m_freem() from uipc_mbuf.c to subr_mbuf.c so it can take advantage
of the inlines, like its cousin, m_free().  Also, make a small (first
step?) optimisation of m_free() to use the MBP_PERSIST{,ENT} interface
to hold the lock across frees when possible.  The thing is that right
now, we can only do this easily for at most across one mbuf + one
cluster free, as the comment mentions (it also explains why).  Anyway,
some basic tests revealed a 5-10% overall improvement.  Some of the
results can be found here:
http://people.freebsd.org/~bmilekic/code/measure.txt
2002-07-24 15:11:23 +00:00
Bosko Milekic
185c2244ce o Introduce new m_getcl() interface routine that allocates an mbuf
and a cluster in one shot.
o Introduce MBP_PERSIST and MBP_PERSISTENT control bits to mb_alloc();
  MBP_PERSIST means "if you can allocate, then keep the cache lock
  held on exit," and MBP_PERSISTENT means "a cache lock is alredy held
  on entry, so allocate from the specified (already locked) cache."
  They may be used in combination.
o m_getcl() uses the MBP_PERSIST/MBP_PERSISTENT interface so that it
  doesn't drop the cache lock in between the mbuf and cluster allocations.
o m_getm(), which takes a size and allocates an mbuf + cluster "best fit"
  chain, has been moved from uipc_mbuf.c to subr_mbuf.c and shown how to
  use MBP_PERSIST/MBP_PERSISTENT to attempt to do a grouped allocation
  without dropping the cache lock in between.

Why this is good: much less bus-locked lock acquires/drops when they're
not needed.  Also, prototype for m_getcl():
struct mbuf * m_getcl(int how, short type, int flags);
"how" and "type" are self-explanatory.  "flags" may be M_PKTHDR, in
which case m_getcl() will make the mbuf a pkthdr-mbuf.

While I'm in subr_mbuf.c:
o Every exported routine now has a nice comment with a description of
  the expected arguments.  Eventually, mbuf(9) needs to be re-vamped
  but there's still more code to write/finalize before I get to that.
o internal macros have been changed a bit.
o consistently use 'short' for "type."  This somehow slipped through
  before (that 'type' was sometimes declared as int).

Alfred has been pushing for the MBP_PERSIST{,ENT} thing for almost a
year now.  Luigi asked for m_getcl(), and will probably MFC that
part of this commit.

TODO [Related]: teach mb_free() about MBP_PERSIST{, ENT}.
2002-07-15 15:32:59 +00:00
Archie Cobbs
48d183faca Fix a bug in m_split(): the "m->m_ext.ext_size" field of an mbuf was being
set to zero. This field indicates the total space in the external buffer
and therefore should not be modified after the external buffer is added.

Add a comment warning that the mbufs returned by m_split() might be read-only.

Fix M_TRAILINGSPACE() to return zero if !M_WRITABLE(m).

Reviewed by:	freebsd-net
Obtained from:	Vernier Networks, Inc.
MFC after:	1 week
2002-05-31 22:09:57 +00:00
Jeffrey Hsu
4037698769 Fix corner case where m_len was not being initialized.
Submitted by:	Maksim Yevmenkin <myevmenk@digisle.net>
MFC after:	1 week
2002-04-12 00:01:50 +00:00
Matthew Dillon
ecde8f7c29 Get rid of the twisted MFREE() macro entirely.
Reviewed by:	dg, bmilekic
MFC after:	3 days
2002-02-05 02:00:56 +00:00
David E. O'Brien
a48740b6c5 Update to C99, s/__FUNCTION__/__func__/. 2001-12-10 05:51:45 +00:00
Julian Elischer
a8cfc0ee40 Forgot to remove this un-needed test. (M_WAITOK won't fail)
I vaguely remember someone once proving it COULD return NULL..
was that changed?

Reminded by: BDE

MFC after:	2 weeks
2001-08-19 04:30:13 +00:00
Bosko Milekic
08442f8a82 Introduce numerous SMP friendly changes to the mbuf allocator. Namely,
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:

 o Reduce contention for SMP by offering per-CPU pools and locks.
 o Better use of data cache due to per-CPU pools.
 o Much less code cache pollution due to excessively large allocation macros.
 o Framework for `grouping' objects from same page together so as to be able
   to possibly free wired-down pages back to the system if they are no longer
   needed by the network stacks.

 Additional things changed with this addition:

  - Moved some mbuf specific declarations and initializations from
    sys/conf/param.c into mbuf-specific code where they belong.
  - m_getclr() has been renamed to m_get_clrd() because the old name is really
    confusing. m_getclr() HAS been preserved though and is defined to the new
    name. No tree sweep has been done "to change the interface," as the old
    name will continue to be supported and is not depracated. The change was
    merely done because m_getclr() sounds too much like "m_get a cluster."
  - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
    systat(1) (see TODO below).
  - Fixed systat(1) to display number of "free mbufs" based on new per-CPU
    stat structures.
  - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
    per-CPU stat structures. All infos are fetched via sysctl.

 TODO (in order of priority):

  - Re-enable mbtypes statistics in both netstat(1) and systat(1) after
    introducing an SMP friendly way to collect the mbtypes stats under the
    already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
    seems too costly for a mere stat update, especially when other locks are
    already present).
  - Optionally have systat(1) display not only "total free mbufs" but also
    "total free mbufs per CPU pool."
  - Fix minor length-fetching issues in netstat(1) related to recently
    re-enabled option to read mbuf stats from a core file.
  - Move reference counters at least for mbuf clusters into an unused portion
    of the cluster itself, to save space and need to allocate a counter.
  - Look into introducing resource freeing possibly from a kproc.

Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/
2001-06-22 06:35:32 +00:00
Bosko Milekic
f5eece3fb9 Change m_devget()'s outdated and unused `offset' argument to actually mean
something: offset into the first mbuf of the target chain before copying
the source data over.

Make drivers using m_devget() with a first argument "data - ETHER_ALIGN"
to use the offset argument to pass ETHER_ALIGN in. The way it was previously
done is potentially dangerous if the source data was at the top of a page
and the offset caused the previous page to be copied (if the
previous page has not yet been appropriately mapped).

The old `offset' argument in m_devget() is not used anywhere (it's always
0) and dates back to ~1995 (and earlier?) when support for ethernet trailers
existed. With that support gone, it was merely collecting dust.

Tested on alpha by: jlemon
Partially submitted by: jlemon
Reviewed by: jlemon
MFC after: 3 weeks
2001-06-20 19:48:35 +00:00
Peter Wemm
db957588c9 Patch up a blunder I made a few days ago. nmbcnt was being initialized
too late.

Noted by:      bmilekic
Pointy-hat to: peter
2001-06-13 00:36:41 +00:00
Hajimu UMEMOTO
3384154590 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
Peter Wemm
0978669829 "Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment.  This saves duplicating the same function
over and over again.  This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
2001-06-08 05:24:21 +00:00
Peter Wemm
4422746fdf Back out part of my previous commit. This was a last minute change
and I botched testing.  This is a perfect example of how NOT to do
this sort of thing. :-(
2001-06-07 03:17:26 +00:00
Peter Wemm
81930014ef Make the TUNABLE_*() macros look and behave more consistantly like the
SYSCTL_*() macros.  TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.
2001-06-06 22:17:08 +00:00
David E. O'Brien
240ef84277 Back out jesper's 2001/05/31 14:58:11 PDT commit. It does not compile. 2001-06-01 09:51:14 +00:00
Jesper Skriver
e916d96e64 Move the definition of NMBCLUSTERS from src/sys/kern/uipc_mbuf.c
to <sys/param.h>, so it's available to src/sys/netinet/ip_input.c,
and remove the now unneeded includes of "opt_param.h".

MFC after:	1 week
2001-05-31 21:56:44 +00:00
Bosko Milekic
629db60492 Increment mbstat.m_mpfail, not mbstat.m_mcfail, when m_pullup() fails.
This slipped in accidently a few commits back.
2001-05-23 20:44:54 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Bosko Milekic
d04d50d1f7 Fix inconsistency in setup of kernel_map: we need to make sure that
we also reserve _adequate_ space for the mb_map submap; i.e. we need
space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to
rounddown, and not roundup, so that we are consistent.

Pointed out by: bde
2001-04-18 23:54:13 +00:00
Bosko Milekic
4b8ae40a7c - Change the msleep()s to condition variables.
The mbuf and mcluster free lists now each "own" a condition variable,
  m_starved.

- Clean up minor indentention issues in sys/mbuf.h caused by previous
  commit.
2001-04-03 04:50:13 +00:00
Alfred Perlstein
5ea487f34d Use only one mutex for the entire mbuf subsystem.
Don't use atomic operations for the stats updating, instead protect
the counts with the mbuf mutex.  Most twiddling of the stats was
done right before or after releasing a mutex.  By doing this we
reduce the number of locked ops needed as well as allow a sysctl
to gain a consitant view of the entire stats structure.

In the future...

  This will allow us to chain common mbuf operations that would
  normally need to aquire/release 2 or 3 of the locks to build an
  mbuf with a cluster or external data attached into a single op
  requiring only one lock.

  Simplify the per-cpu locks that are planned.

There's also some if (1) code that should check if the "how"
operation specifies blocking/non-blocking behavior, we _could_ make
it so that we hold onto the mutex through calls into kmem_alloc
when non-blocking requests are made, but for safety reasons we
currently drop and reaquire the mutex around the calls.

Also, note that calling kmem_alloc is rare and only happens during
a shortage so drop/re-getting the mutex will not be a common
occurance.

Remove some #define's that seemed to obfuscate the code to me.

Remove an extranious comment.

Remove an XXX, including mutex.h isn't a crime.

Reviewed by: bmilekic
2001-04-03 03:15:11 +00:00
Bosko Milekic
2ba1a89559 Move the atomic() mbstat.m_drops incrementing to the MGET(HDR) and
MCLGET macros in order to avoid incrementing the drop count twice.
Otherwise, in some cases, we may increment m_drops once in m_mballoc()
for example, and increment it again in m_mballoc_wait() if the
wait fails.
2001-03-24 23:47:52 +00:00
Bosko Milekic
9612101e4c Fix a couple of things in the internal mbuf allocation interface:
- Make sure that m_mballoc() really doesn't allow over nmbufs mbufs to
  be allocated from mb_map. In the case where nmbufs-reserved space is not
  an exact multiple of PAGE_SIZE (which it should be, but anyway...), we
  hold nmbufs as an absolute maximum which need not ever be reached.

- Clean up m_clalloc(); make it more consistent in the sense that the first
  argument `ncl' really means "the number of clusters ensured to be allocated"
  and not "the number of pages worth of clusters to be allocated," as was
  previously the case. This also makes it consistent with m_mballoc() as well
  as the comment that preceeds it.

Reviewed by: jlemon
2001-03-17 23:23:24 +00:00
Boris Popov
03137ec82e Fix parameter order in the calls to MGET(). 2001-02-21 09:24:13 +00:00
Luigi Rizzo
5fe86675f0 Preserve alignment of first mbuf in m_copypacket.
This is useful when doing copies of packet where some leading
space has been preallocated to insert protocol headers.
Note that there are in fact almost no users of m_copypacket.

MFC candidate.
2001-02-20 08:23:41 +00:00
Bosko Milekic
fffd12bd72 Implement m_getm() which will perform an "all or nothing" mbuf + cluster
allocation, as required.

If m_getm() receives NULL as a first argument, then it allocates `len'
(second argument) bytes worth of mbufs + clusters and returns the chain
only if it was able to allocate everything.
If the first argument is non-NULL, then it should be an existing mbuf
chain (e.g. pre-allocated mbuf sitting on a ring, on some list, etc.) and
so it will allocate `len' bytes worth of clusters and mbufs, as needed,
and append them to the tail of the passed in chain, only if it was able
to allocate everything requested.

If allocation fails, only what was allocated by the routine will be freed,
and NULL will be returned.

Also, get rid of existing m_getm() in netncp code and replace calls to it
to calls to this new generic code.

Heavily Reviewed by: bp
2001-02-14 05:13:04 +00:00
Bosko Milekic
122a814af5 Long awaited style fixup in mbuf code. Get rid of K&R style prototyping
and function argument declarations. Make sure that functions that are
supposed to return a pointer return NULL in case of failure. Don't cast
NULL. Finally, get rid of annoying `register' uses.
2001-02-11 05:02:06 +00:00
Bosko Milekic
9ed346bab0 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
John Baldwin
5dbc7fe2d7 Don't bother with acquiring/releasing Giant around kmem_malloc() and
kmem_free() for now.  Kmem_malloc() and kmem_free() now have appropriate
assertions in place, and these checks aren't feasible until more of the
networking code is locked down.  Also, the extra assertions here should
already be caught by the WITNESS code as lock order violations should
mutex operations on Giant be reintroduced here later.
2001-02-08 00:27:38 +00:00
Bosko Milekic
56acb799b2 When short of mbufs or mbuf clusters, we sleep on appropriate "counters."
The counters are incremented when a thread goes to sleep and decremented
either when a thread is woken up by another thread or when the sleep
times out. There existed a race where the sleep count could be decremented
twice resulting in an eventual underflow.
Move the decrementing of the "counters" to the thread initiating the sleep
and thus remedy the problem.
2001-01-20 21:29:10 +00:00
Bosko Milekic
35c05ac61b Add some KASSERTs valid if WITNESS is defined to verify that the mbuf
allocation routines are being called safely. Since we drop our relevant
mbuf mutex and acquire Giant before we call kmem_malloc(), we have
to make sure that this does not pave the way for a fatal lock order
reversal. Check that either Giant is already held (in which case it's safe
to grab it again and recurse on it) or, if Giant is not held, that no
other locks are held before we try to acquire Giant.

Similarily, add a KASSERT valid in the WITNESS case in m_reclaim() to
nail callers who end up in m_reclaim() and hold a lock.

Pointed out by: jhb
2001-01-16 01:53:13 +00:00
Bosko Milekic
d113d3857e In m_mballoc_wait(), drop the mmbfree mutex lock prior to calling
m_reclaim() and re-acquire it when m_reclaim() returns. This means that
we now call the drain routines without holding the mutex lock and
recursing into it. This was done for mainly two reasons:

(i) Avoid the long recursion; long recursions are typically bad and this
    is the case here because we block all other code from freeing mbufs
    if they need to. Doing that is kind of counter-productive, since we're
    really hoping that someone will free.

(ii) More importantly, avoid a potential lock order reversal. Right now,
     not all the locks have been added to our networking code; but
     without this change, we're introducing the possibility for deadlock.
     Consider for example ip_drain(). We will likely eventually introduce
     a lock for ipq there, and so ip_freef() will be called with ipq lock
     held. But, ip_freef() calls m_freem() which in turn acquires the
     mmbfree lock. Since we were previously calling ip_drain() with mmbfree
     held, our lock order would be: mmbfree->ipq->mmbfree. Some other code
     may very well lock ipq first and then call ip_freef(). This would
     result in the regular lock order, ipq->mmbfree. Clearly, we have
     deadlock if one thread acquires the ipq lock and sits waiting for
     mmbfree while another thread calling m_reclaim() acquires mmbfree
     and sits waiting for the ipq lock.

Also, make sure to add a comment above m_reclaim()'s definition briefly
explaining this. Also document this above the call to m_reclaim() in
m_mballoc_wait().

Suggested and reviewed by: alfred
2001-01-09 23:58:56 +00:00
Bosko Milekic
2a0c503e7a * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
John Baldwin
35e0e5b311 Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
2000-10-20 07:58:15 +00:00
Bosko Milekic
181d2a1564 Add nmbcnt sysctl and make it tunable at boottime; nmbcnt is the
number of ext_buf counters that are possibly allocatable.

Do this because:

  (i) It will make it easier to influence EXT_COUNTERS for if_sk,
      if_ti (or similar) users where the driver allocates its own
      ext_bufs and where it is important for the mbuf system to take
      it into account when reserving necessary space for counters.

  (ii) Facilitate some percentile calculation for netstat(1)
2000-10-15 06:24:07 +00:00
Bosko Milekic
7d03271452 Big mbuf subsystem diff #1: incorporate mutexes and fix things up somewhat
to accomodate the changes.

 Here's a list of things that have changed (I may have left out a few); for a
 relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal

   * Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which
     nobody uses anymore. It was great while it lasted, but now we're moving
     onto bigger and better things (Approved by: wollman).

   * Practically re-wrote the allocation macros in sys/sys/mbuf.h to accomodate
     new allocations which grab the necessary lock.

   * Make sure that necessary mbstat variables are manipulated with
     corresponding atomic() routines.

   * Changed the "wait" routines, cleaned it up, made one routine that does
     the job.

   * Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they
     are now included in the generalized "wait" routines.

   * Sleep routines now use msleep().

   * Free lists have locks.

   * etc... probably other stuff I'm missing...

  Things to look out for and work on later:

   * find a better way to (dynamically) adjust EXT_COUNTERS

   * move necessity to recurse on a lock from drain routines by providing
     lock-free lower-level version of MFREE() (and possibly m_free()?).

   * checkout include of mutex.h in sys/sys/mbuf.h - probably violating
     general philosophy here.

   The code has been reviewed quite a bit, but problems may arise... please,
   don't panic! Send me Emails: bmilekic@freebsd.org

Reviewed by: jlemon, cp, alfred, others?
2000-09-30 06:30:39 +00:00
Peter Wemm
9579e8c145 m_mballoc_wait() had a spl/tsleep race. mbufs can be freed in interrupt
context, which can cause a wakeup.. which can race with this.
2000-08-25 22:28:08 +00:00
David Malone
a5c4836d39 Replace the mbuf external reference counting code with something
that should be better.

The old code counted references to mbuf clusters by using the offset
of the cluster from the start of memory allocated for mbufs and
clusters as an index into an array of chars, which did the reference
counting. If the external storage was not a cluster then reference
counting had to be done by the code using that external storage.

NetBSD's system of linked lists of mbufs was cosidered, but Alfred
felt it would have locking issues when the kernel was made more
SMP friendly.

The system implimented uses a pool of unions to track external
storage. The union contains an int for counting the references and
a pointer for forming a free list. The reference counts are
incremented and decremented atomically and so should be SMP friendly.
This system can track reference counts for any sort of external
storage.

Access to the reference counting stuff is now through macros defined
in mbuf.h, so it should be easier to make changes to the system in
the future.

The possibility of storing the reference count in one of the
referencing mbufs was considered, but was rejected 'cos it would
often leave extra mbufs allocated. Storing the reference count in
the cluster was also considered, but because the external storage
may not be a cluster this isn't an option.

The size of the pool of reference counters is available in the
stats provided by "netstat -m".

PR:		19866
Submitted by:	Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:	alfred (glanced at by others on -net)
2000-08-19 08:32:59 +00:00
Alfred Perlstein
9ad48853de mbstat should be a read-only sysctl.
Submitted by: Bosko Milekic <bmilekic@dsuper.net>
2000-07-31 09:24:32 +00:00