Commit Graph

118 Commits

Author SHA1 Message Date
luigi
ce685a3e04 Preserve alignment of first mbuf in m_copypacket.
This is useful when doing copies of packet where some leading
space has been preallocated to insert protocol headers.
Note that there are in fact almost no users of m_copypacket.

MFC candidate.
2001-02-20 08:23:41 +00:00
bmilekic
cc2f31e1a4 Implement m_getm() which will perform an "all or nothing" mbuf + cluster
allocation, as required.

If m_getm() receives NULL as a first argument, then it allocates `len'
(second argument) bytes worth of mbufs + clusters and returns the chain
only if it was able to allocate everything.
If the first argument is non-NULL, then it should be an existing mbuf
chain (e.g. pre-allocated mbuf sitting on a ring, on some list, etc.) and
so it will allocate `len' bytes worth of clusters and mbufs, as needed,
and append them to the tail of the passed in chain, only if it was able
to allocate everything requested.

If allocation fails, only what was allocated by the routine will be freed,
and NULL will be returned.

Also, get rid of existing m_getm() in netncp code and replace calls to it
to calls to this new generic code.

Heavily Reviewed by: bp
2001-02-14 05:13:04 +00:00
bmilekic
cc52eb42bf Long awaited style fixup in mbuf code. Get rid of K&R style prototyping
and function argument declarations. Make sure that functions that are
supposed to return a pointer return NULL in case of failure. Don't cast
NULL. Finally, get rid of annoying `register' uses.
2001-02-11 05:02:06 +00:00
bmilekic
f364d4ac36 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
jhb
9b6b2fecee Don't bother with acquiring/releasing Giant around kmem_malloc() and
kmem_free() for now.  Kmem_malloc() and kmem_free() now have appropriate
assertions in place, and these checks aren't feasible until more of the
networking code is locked down.  Also, the extra assertions here should
already be caught by the WITNESS code as lock order violations should
mutex operations on Giant be reintroduced here later.
2001-02-08 00:27:38 +00:00
bmilekic
adf34d0448 When short of mbufs or mbuf clusters, we sleep on appropriate "counters."
The counters are incremented when a thread goes to sleep and decremented
either when a thread is woken up by another thread or when the sleep
times out. There existed a race where the sleep count could be decremented
twice resulting in an eventual underflow.
Move the decrementing of the "counters" to the thread initiating the sleep
and thus remedy the problem.
2001-01-20 21:29:10 +00:00
bmilekic
3650624f86 Add some KASSERTs valid if WITNESS is defined to verify that the mbuf
allocation routines are being called safely. Since we drop our relevant
mbuf mutex and acquire Giant before we call kmem_malloc(), we have
to make sure that this does not pave the way for a fatal lock order
reversal. Check that either Giant is already held (in which case it's safe
to grab it again and recurse on it) or, if Giant is not held, that no
other locks are held before we try to acquire Giant.

Similarily, add a KASSERT valid in the WITNESS case in m_reclaim() to
nail callers who end up in m_reclaim() and hold a lock.

Pointed out by: jhb
2001-01-16 01:53:13 +00:00
bmilekic
3726db774d In m_mballoc_wait(), drop the mmbfree mutex lock prior to calling
m_reclaim() and re-acquire it when m_reclaim() returns. This means that
we now call the drain routines without holding the mutex lock and
recursing into it. This was done for mainly two reasons:

(i) Avoid the long recursion; long recursions are typically bad and this
    is the case here because we block all other code from freeing mbufs
    if they need to. Doing that is kind of counter-productive, since we're
    really hoping that someone will free.

(ii) More importantly, avoid a potential lock order reversal. Right now,
     not all the locks have been added to our networking code; but
     without this change, we're introducing the possibility for deadlock.
     Consider for example ip_drain(). We will likely eventually introduce
     a lock for ipq there, and so ip_freef() will be called with ipq lock
     held. But, ip_freef() calls m_freem() which in turn acquires the
     mmbfree lock. Since we were previously calling ip_drain() with mmbfree
     held, our lock order would be: mmbfree->ipq->mmbfree. Some other code
     may very well lock ipq first and then call ip_freef(). This would
     result in the regular lock order, ipq->mmbfree. Clearly, we have
     deadlock if one thread acquires the ipq lock and sits waiting for
     mmbfree while another thread calling m_reclaim() acquires mmbfree
     and sits waiting for the ipq lock.

Also, make sure to add a comment above m_reclaim()'s definition briefly
explaining this. Also document this above the call to m_reclaim() in
m_mballoc_wait().

Suggested and reviewed by: alfred
2001-01-09 23:58:56 +00:00
bmilekic
4b6a7bddad * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
jhb
d944886e4d Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
2000-10-20 07:58:15 +00:00
bmilekic
1f6d21dd38 Add nmbcnt sysctl and make it tunable at boottime; nmbcnt is the
number of ext_buf counters that are possibly allocatable.

Do this because:

  (i) It will make it easier to influence EXT_COUNTERS for if_sk,
      if_ti (or similar) users where the driver allocates its own
      ext_bufs and where it is important for the mbuf system to take
      it into account when reserving necessary space for counters.

  (ii) Facilitate some percentile calculation for netstat(1)
2000-10-15 06:24:07 +00:00
bmilekic
73f1784807 Big mbuf subsystem diff #1: incorporate mutexes and fix things up somewhat
to accomodate the changes.

 Here's a list of things that have changed (I may have left out a few); for a
 relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal

   * Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which
     nobody uses anymore. It was great while it lasted, but now we're moving
     onto bigger and better things (Approved by: wollman).

   * Practically re-wrote the allocation macros in sys/sys/mbuf.h to accomodate
     new allocations which grab the necessary lock.

   * Make sure that necessary mbstat variables are manipulated with
     corresponding atomic() routines.

   * Changed the "wait" routines, cleaned it up, made one routine that does
     the job.

   * Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they
     are now included in the generalized "wait" routines.

   * Sleep routines now use msleep().

   * Free lists have locks.

   * etc... probably other stuff I'm missing...

  Things to look out for and work on later:

   * find a better way to (dynamically) adjust EXT_COUNTERS

   * move necessity to recurse on a lock from drain routines by providing
     lock-free lower-level version of MFREE() (and possibly m_free()?).

   * checkout include of mutex.h in sys/sys/mbuf.h - probably violating
     general philosophy here.

   The code has been reviewed quite a bit, but problems may arise... please,
   don't panic! Send me Emails: bmilekic@freebsd.org

Reviewed by: jlemon, cp, alfred, others?
2000-09-30 06:30:39 +00:00
peter
84c0ff464e m_mballoc_wait() had a spl/tsleep race. mbufs can be freed in interrupt
context, which can cause a wakeup.. which can race with this.
2000-08-25 22:28:08 +00:00
dwmalone
df0e25bf6c Replace the mbuf external reference counting code with something
that should be better.

The old code counted references to mbuf clusters by using the offset
of the cluster from the start of memory allocated for mbufs and
clusters as an index into an array of chars, which did the reference
counting. If the external storage was not a cluster then reference
counting had to be done by the code using that external storage.

NetBSD's system of linked lists of mbufs was cosidered, but Alfred
felt it would have locking issues when the kernel was made more
SMP friendly.

The system implimented uses a pool of unions to track external
storage. The union contains an int for counting the references and
a pointer for forming a free list. The reference counts are
incremented and decremented atomically and so should be SMP friendly.
This system can track reference counts for any sort of external
storage.

Access to the reference counting stuff is now through macros defined
in mbuf.h, so it should be easier to make changes to the system in
the future.

The possibility of storing the reference count in one of the
referencing mbufs was considered, but was rejected 'cos it would
often leave extra mbufs allocated. Storing the reference count in
the cluster was also considered, but because the external storage
may not be a cluster this isn't an option.

The size of the pool of reference counters is available in the
stats provided by "netstat -m".

PR:		19866
Submitted by:	Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:	alfred (glanced at by others on -net)
2000-08-19 08:32:59 +00:00
alfred
36e9193ef5 mbstat should be a read-only sysctl.
Submitted by: Bosko Milekic <bmilekic@dsuper.net>
2000-07-31 09:24:32 +00:00
alfred
19218742c7 Make mbstat.m_mtypes seperate and viewable via sysctl, also
expand the size from short to ulong

Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
PR: kern/19809
2000-07-15 06:02:48 +00:00
itojun
5f4e854de1 sync with kame tree as of july00. tons of bug fixes/improvements.
API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
  (also syntax change)
2000-07-04 16:35:15 +00:00
msmith
04d30dbb49 Actively limit the allocation of mbufs to NMBUFS/nmbufs and mbuf clusters
to NMBCLUSTERS/nmbclusters/kern.ipc.nmbclusters.

Add a read-only sysctl kern.ipc.nmbufs matching kern.ipc.nmbclusters.

Submitted by:	Bosko Milekic <bmilekic@dsuper.net>
1999-12-28 06:35:57 +00:00
eivind
d120f08b18 Make m_print const correct (avoids a warning) 1999-12-20 18:10:00 +00:00
green
794cbab436 Woops, I'm so sorry I forgot this! From the last mbuf.h change:
m_mballoc_wakeup() (inline) -> MMBWAKEUP() (macro)
	m_clalloc_wakeup() (inline) -> MCLWAKEUP() (macro)

Noticed by:	peter
1999-12-18 20:04:19 +00:00
green
92bdda92f4 Bug fix:
The variables "m_mclalloc_wid" and "m_mballoc_wid" were not in the
proper place.  They should have been in uipc_mbuf.c and have been global,
not in mbuf.h and local per each file that uses mbuf.h.

Sorta bug fix:
   In mbuf.h, the definitions of various things for KERNEL and not
KERNEL cases were very screwy.   This fixes all of that which I could
find.
1999-12-14 02:23:14 +00:00
green
56a46611e1 This is Bosko Milekic's mbuf allocation waiting code. Basically, this
means that running out of mbuf space isn't a panic anymore, and code
which runs out of network memory will sleep to wait for it.

Submitted by:	Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:	green, wollman
1999-12-12 05:52:51 +00:00
archie
8e6dc36b73 The functions m_copym() and m_copypacket() return read-only copies,
because in the case of mbuf clusters they only increment the reference
count rather than actually copying the data.

Add comments to this effect, and add a new routine called m_dup() that
returns a real, writable copy of an mbuf chain.

This is preliminary work required for implementing 'ipfw tee'.

Reviewed by:	julian
1999-12-01 22:31:32 +00:00
peter
0630e91e80 Fix a warning. 1999-11-18 06:29:57 +00:00
phk
d4a4beb863 New function:
m_print(struct mbuf *);
hexdumps a mbuf.
1999-11-01 15:03:20 +00:00
alfred
c92072a8ed change identical and "programming error" panic("mcopy*")'s into
more verbose messages using KASSERT.

Reviewed by: eivind, des
1999-10-13 09:55:42 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
msmith
af7615d39a Move the initialisation/tuning of nmbclusters from param.c/machdep.c
into uipc_mbuf.c.  This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.

Make NMBUFs tunable as well.

Move the nmbclusters sysctl here as well.

Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.

Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).
1999-07-05 08:52:54 +00:00
peter
6cb5fe6c6b Slight reorganization of kernel thread/process creation. Instead of using
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface.  kproc_start is still available
as a SYSINIT() hook.  This allowed simplification of chunks of the
sysinit code in the process.  This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.

One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process.  It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit.  This means that nfsiod
doesn't need to be in /sbin and is always "available".  This is a fair bit
easier to do outside of the SYSINIT_KT() framework.
1999-07-01 13:21:46 +00:00
des
9ecd5a48c9 Typo in comment. 1999-04-12 10:07:15 +00:00
dfr
22ceb237f0 * Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
  file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
	Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
1999-02-16 10:49:55 +00:00
dg
2b482f824a Only call m_reclaim() if M_WAIT since calling it from an interrupt can
cause problems.
PR:	7403
1998-07-27 03:59:48 +00:00
phk
2d9a9ab304 Update M_EXT support in m_copypacket().
PR:		7122
Reviewed by:	phk
Submitted by:	Castor Fu <castor@geocast.com>
Originally forgotten by:	julian
1998-07-03 08:36:48 +00:00
dg
6ffbdd6b2e If we are out of mb_map space and we failed to m_reclaim() anything and
the alloc is not M_DONTWAIT, then panic with "Out of mbuf clusters".
Callers that specify M_WAIT can't deal with getting a NULL buffer, so this
is a more graceful failure than randomly page faulting in the socket code
or elsewhere.
1998-06-05 21:41:48 +00:00
bde
999552194c Don't depend on "implicit int". 1998-02-20 13:37:40 +00:00
bde
485d4efb1c Restored used include of <sys/malloc.h>. malloc() is not used
here, but kmem_malloc() is used and it takes the same "flags" as
malloc().

Use the mbuf allocation "flags" M_WAIT and M_DONTWAIT consistently.
There is really only one boolean flag, M_DONTWAIT, but the "flags"
were always treated as enum-like values, except in some places here
where the values are tacitly converted to boolean flags.  Treat
them as enum-like values everywhere, except where we tacitly assume
that there are only two values in order to convert them to the
corresponding two kmem_malloc() "flags".
1997-12-28 01:01:13 +00:00
bde
fb826377ff Removed unused #includes. 1997-10-28 15:59:26 +00:00
phk
36e7a51ea1 Last major round (Unless Bruce thinks of somthing :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
bde
9195bd1ec7 Removed unused #includes. 1997-08-02 14:33:27 +00:00
wollman
a6e3c2a3ce Create a new branch of the kernel MIB, kern.ipc, to store
all of the configurables and instrumentation related to
inter-process communication mechanisms.  Some variables,
like mbuf statistics, are instrumented here for the first
time.

For mbuf statistics: also keep track of m_copym() and
m_pullup() failures, and provide for the user's inspection
the compiled-in values of MSIZE, MHLEN, MCLBYTES, and MINCLSIZE.
1997-02-24 20:30:58 +00:00
wollman
348a992441 uipc_mbuf.c: do a better job of counting how often we have to wait
for memory, or are denied a cluster.

uipc_socket2.c: define some generic ``operation-not-supported'' entry points
for pr_usrreqs.
1997-02-18 20:43:07 +00:00
wollman
1f8c9c194a Provide an alternative mbuf cluster allocator which permits use of
clusters greater than one page in length by calling contigmalloc1().
This uses a helper process `mclalloc' to do the allocation if
the system runs out at interrupt time to avoid calling contigmalloc
at high spl.  It is not yet clear to me whether this works.
1997-02-13 19:41:40 +00:00
dg
78afd808d5 Fix bug related to map entry allocations where a sleep might be attempted
when allocating memory for network buffers at interrupt time. This is due
to inadequate checking for the new mcl_map. Fixed by merging mb_map and
mcl_map into a single mb_map.

Reviewed by:	wollman
1997-01-15 20:46:02 +00:00
jkh
808a36ef65 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
julian
7017ff366e fix handling of external objects referenced by mbufs
somehow this got broken between 4.3 tahoe and here, though I've been using
these fixes for over a year here..
1996-08-19 03:32:10 +00:00
phk
e27ee568ac Ups, I removed NMB_INIT too.
Complained about by:	asami
1996-05-12 07:48:47 +00:00
phk
a27724a359 Nail down NCL_INIT = 1, and put a comment there telling what it is. 1996-05-11 20:43:23 +00:00
wollman
a28a8481af Allocate mbufs from a separate submap so that NMBCLUSTERS works as
expected.
1996-05-10 19:28:55 +00:00
wollman
577771db62 Our new-old mbugf allocator. This is actually something of a blast from
the past, since it returns to the old system of allocating mbufs out of
a private area rather than using the kernel malloc().  While this may seem
like a backwards step to some, the new allocator is some 20% faster than
the old one and has much better caching properties.

Written by: John Wroclawski <jtw@lcs.mit.edu>
1996-05-08 19:38:27 +00:00
phk
05b36227ad An old typo MCLBYTES/CLBYTES became more obvious bogus now.
Submitted by:		wollman
1996-05-06 17:18:12 +00:00
phk
5a6fb3a7da removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
        ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
        NPDEPG

Major macro cleanup.
1996-05-02 14:21:14 +00:00
phk
63ec2c0ae9 A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.
1995-12-14 08:32:45 +00:00
dg
c30f46c534 Untangled the vm.h include file spaghetti. 1995-12-07 12:48:31 +00:00
bde
3089ef0816 Completed function declarations and/or added prototypes. 1995-12-02 18:58:56 +00:00
bde
338d20055d Finished (?) cleaning up sysinit stuff. 1995-12-02 17:11:20 +00:00
phk
88d6fa4d4a Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.
1995-10-29 15:33:36 +00:00
dg
573c688a68 Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.
1995-09-09 18:10:37 +00:00
julian
ebb726ec45 Reviewed by: julian with quick glances by bruce and others
Submitted by:	terry (terry lambert)
This is  a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.
1995-08-28 09:19:25 +00:00
bde
38a74a7bc2 Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.
1995-07-29 11:44:31 +00:00
dg
c20bdce1e0 Special cased the handling of mb_map in the M_WAITOK case. kmem_malloc()
now returns NULL and sets a global 'mb_map_full' when the map is full.
m_clalloc() has further been taught to expect this and do the right thing.
This should fix the "mb_map full" panics that several people have reported.
1995-03-15 07:52:06 +00:00
dg
3f8844f942 Implemented mbstat.m_wait and mbstat.m_drops. 1995-02-23 19:10:21 +00:00
bde
5131cd5fd6 Update kmem_malloc() call to new waitflag(s) interface.
This might fix recent problems on thud and freefall.
1995-02-05 07:08:27 +00:00
dg
8ec51aef1b Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.
1994-10-09 07:35:18 +00:00
phk
094a6feedc Moved m_copyback into uipc_mbuf.c 1994-10-04 06:50:01 +00:00
phk
c3e4945541 All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.
1994-10-02 17:35:40 +00:00
dg
8d205697aa Added $Id$ 1994-08-02 07:55:43 +00:00
rgrimes
2469c867a1 The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by:	Rodney W. Grimes
Submitted by:	John Dyson and David Greenman
1994-05-25 09:21:21 +00:00
rgrimes
8fb65ce818 BSD 4.4 Lite Kernel Sources 1994-05-24 10:09:53 +00:00