/*-
 * Copyright (c) 1982, 1986, 1988, 1993
 *	The Regents of the University of California.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page and look up the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues; it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
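The UMA_ZONE_REFCNT note describes two things: reference counters stored at the end of each slab, and a lookup that finds an item's counter from the item's address. The following is a userland sketch of that idea under stated assumptions. The slab geometry (SLAB_SIZE, ITEM_SIZE) and the names struct slab, slab_alloc() and find_refcnt() are hypothetical. The sketch also swaps in a different lookup technique: it finds the slab by masking an address that is aligned on the slab size, whereas uma_find_refcnt() is described as going through the underlying page (kextract). Only line comments are used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical slab geometry: items first, then one reference counter per
// item at the tail of the slab, the whole slab aligned on SLAB_SIZE.
#define SLAB_SIZE 4096u
#define ITEM_SIZE 256u
#define NITEMS (SLAB_SIZE / (ITEM_SIZE + sizeof(uint32_t)))

struct slab {
	unsigned char items[NITEMS][ITEM_SIZE];
	uint32_t refcnt[NITEMS];	// counters live at the end of the slab
};

_Static_assert(sizeof(struct slab) <= SLAB_SIZE, "slab must fit in one block");

// Allocate one zeroed slab aligned on its own size, so that any pointer
// into it can be mapped back to the slab by masking off the low bits.
static struct slab *slab_alloc(void)
{
	struct slab *s = aligned_alloc(SLAB_SIZE, SLAB_SIZE);
	if (s != NULL)
		memset(s, 0, SLAB_SIZE);
	return s;
}

// Map any pointer inside an item to that item's reference counter.
static uint32_t *find_refcnt(const void *item)
{
	uintptr_t addr = (uintptr_t)item;
	struct slab *s = (struct slab *)(addr & ~(uintptr_t)(SLAB_SIZE - 1));
	size_t idx = (addr - (uintptr_t)s->items) / ITEM_SIZE;
	return &s->refcnt[idx];
}
```

The masking shortcut only works because every slab is aligned on its own size. An allocator that cannot promise that alignment needs a page-based lookup like the one the message describes.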
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)mbuf.h	8.5 (Berkeley) 2/19/95
 * $FreeBSD$
 */

#ifndef _SYS_MBUF_H_
#define _SYS_MBUF_H_

/* XXX: These includes suck. Sorry! */
#include <sys/queue.h>
#ifdef _KERNEL
#include <sys/systm.h>
#include <vm/uma.h>
#ifdef WITNESS
#include <sys/lock.h>
#endif
#endif

/*
 * Mbufs are of a single size, MSIZE (sys/param.h), which includes overhead.
 * An mbuf may add a single "mbuf cluster" of size MCLBYTES (also in
 * sys/param.h), which has no additional overhead and is used instead of the
 * internal data area; this is done when at least MINCLSIZE of data must be
 * stored. Additionally, it is possible to allocate a separate buffer
 * externally and attach it to the mbuf in a way similar to that of mbuf
 * clusters.
 */
#define MLEN (MSIZE - sizeof(struct m_hdr)) /* normal data len */
#define MHLEN (MLEN - sizeof(struct pkthdr)) /* data len w/pkthdr */
#define MINCLSIZE (MHLEN + 1) /* smallest amount to put in cluster */
#define M_MAXCOMPRESS (MHLEN / 2) /* max amount to copy for compression */
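The size macros above reduce to simple arithmetic on structure sizes. The following is a userland sketch of that arithmetic under stated assumptions:

- MSIZE is taken to be 256. This is an assumption made here; the kernel gets the value from sys/param.h.
- The structures are simplified mirrors of struct m_hdr and struct pkthdr. In the pkthdr mirror, `struct ifnet *`, the PH_vt union and the packet-tag list head are replaced by plain fields of the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSIZE 256 /* assumed; the kernel takes MSIZE from sys/param.h */

#if defined(__LP64__)
#define M_HDR_PAD 6
#else
#define M_HDR_PAD 2
#endif

struct mbuf;

/* Userland mirror of struct m_hdr (caddr_t spelled as char *). */
struct m_hdr {
	struct mbuf *mh_next;
	struct mbuf *mh_nextpkt;
	char *mh_data;
	int mh_len;
	int mh_flags;
	short mh_type;
	uint8_t pad[M_HDR_PAD]; /* pads the header to a word multiple */
};

/* Simplified mirror of struct pkthdr with pointer-sized stand-ins. */
struct pkthdr {
	void *rcvif;
	void *header;
	int len;
	uint32_t flowid;
	int csum_flags;
	int csum_data;
	uint16_t tso_segsz;
	uint16_t vt; /* stands in for the PH_vt union */
	void *tags; /* stands in for the SLIST_HEAD */
};

#define MLEN (MSIZE - sizeof(struct m_hdr))
#define MHLEN (MLEN - sizeof(struct pkthdr))
#define MINCLSIZE (MHLEN + 1)
#define M_MAXCOMPRESS (MHLEN / 2)
```

On a common LP64 ABI these mirrors come out at 40 and 48 bytes. That gives MLEN 216, MHLEN 168 and MINCLSIZE 169. The kernel's own values depend on the real structure definitions and on MSIZE.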

#ifdef _KERNEL
/*-
 * Macro for type conversion: convert mbuf pointer to data pointer of correct
 * type:
 *
 * mtod(m, t)	-- Convert mbuf pointer to data pointer of correct type.
 */
#define mtod(m, t) ((t)((m)->m_data))
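mtod() is only a cast of m_data, so it always follows wherever m_data currently points. The following is a userland sketch of that behaviour. It reuses the macro text from this header, but the names struct mini_mbuf, struct demo_hdr, demo_version() and demo_strip_hdr() are hypothetical stand-ins rather than kernel types.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as mtod() in this header. */
#define mtod(m, t) ((t)((m)->m_data))

/* Hypothetical minimal mbuf: only the fields this demo touches. */
struct mini_mbuf {
	char *m_data;	/* current start of valid data */
	int m_len;	/* bytes of valid data */
	char m_dat[64];	/* internal storage (size arbitrary here) */
};

/* Hypothetical 4-byte protocol header, used only for illustration. */
struct demo_hdr {
	uint8_t version;
	uint8_t flags;
	uint16_t length;
};

/* Read the version field of the header at the current data pointer. */
static unsigned demo_version(struct mini_mbuf *m)
{
	return mtod(m, struct demo_hdr *)->version;
}

/* Strip the leading header: m_data moves forward, so later mtod()
 * calls see the payload instead of the header. */
static void demo_strip_hdr(struct mini_mbuf *m)
{
	m->m_data += sizeof(struct demo_hdr);
	m->m_len -= (int)sizeof(struct demo_hdr);
}
```

Because mtod() performs no length check, the caller has to ensure that m_len covers the structure being read. In the kernel this is typically arranged with m_pullup().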

/*
 * Argument structure passed to UMA routines during mbuf and packet
 * allocations.
 */
struct mb_args {
	int flags; /* Flags for mbuf being allocated */
	short type; /* Type of mbuf being allocated */
};
#endif /* _KERNEL */

#if defined(__LP64__)
#define M_HDR_PAD 6
#else
#define M_HDR_PAD 2
#endif

/*
 * Header present at the beginning of every mbuf.
 */
struct m_hdr {
	struct mbuf *mh_next; /* next buffer in chain */
	struct mbuf *mh_nextpkt; /* next chain in queue/record */
	caddr_t mh_data; /* location of data */
	int mh_len; /* amount of data in this mbuf */
	int mh_flags; /* flags; see below */
	short mh_type; /* type of data in this mbuf */
	uint8_t pad[M_HDR_PAD]; /* word align */
};

/*
 * Packet tag structure (see below for details).
 */
struct m_tag {
	SLIST_ENTRY(m_tag) m_tag_link; /* List of packet tags */
	u_int16_t m_tag_id; /* Tag ID */
	u_int16_t m_tag_len; /* Length of data */
	u_int32_t m_tag_cookie; /* ABI/Module ID */
	void (*m_tag_free)(struct m_tag *);
};
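Packet tags form a singly linked list whose entries are identified by the pair (m_tag_cookie, m_tag_id) and released through their own m_tag_free callback. The following userland sketch uses the same SLIST macros from sys/queue.h and copies the structure layout. The helpers tag_alloc(), tag_find() and tags_free_all() are hypothetical illustrations, not the kernel's packet-tag API. The sketch also assumes that a tag's payload of m_tag_len bytes directly follows the header in the same allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Userland copy of struct m_tag from this header. */
struct m_tag {
	SLIST_ENTRY(m_tag) m_tag_link;
	uint16_t m_tag_id;
	uint16_t m_tag_len;
	uint32_t m_tag_cookie;
	void (*m_tag_free)(struct m_tag *);
};

SLIST_HEAD(packet_tags, m_tag);

static void tag_free_default(struct m_tag *t)
{
	free(t);
}

/* Hypothetical: allocate a tag with len payload bytes after the header. */
static struct m_tag *tag_alloc(uint32_t cookie, uint16_t id, uint16_t len)
{
	struct m_tag *t = malloc(sizeof(*t) + len);
	if (t == NULL)
		return NULL;
	t->m_tag_id = id;
	t->m_tag_len = len;
	t->m_tag_cookie = cookie;
	t->m_tag_free = tag_free_default;
	return t;
}

/* Hypothetical: first tag matching both cookie and id, or NULL. */
static struct m_tag *tag_find(struct packet_tags *head, uint32_t cookie,
    uint16_t id)
{
	struct m_tag *t;

	SLIST_FOREACH(t, head, m_tag_link)
		if (t->m_tag_cookie == cookie && t->m_tag_id == id)
			return t;
	return NULL;
}

/* Unlink every tag and release it through its own free callback. */
static void tags_free_all(struct packet_tags *head)
{
	while (!SLIST_EMPTY(head)) {
		struct m_tag *t = SLIST_FIRST(head);
		SLIST_REMOVE_HEAD(head, m_tag_link);
		t->m_tag_free(t);
	}
}
```

The cookie scopes the ID, so two modules can both use ID 1 without their tags colliding.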

/*
 * Record/packet header in first mbuf of chain; valid only if M_PKTHDR is set.
 */
struct pkthdr {
	struct ifnet *rcvif; /* rcv interface */
	/* variables for ip and tcp reassembly */
	void *header; /* pointer to packet header */
	int len; /* total packet length */
	uint32_t flowid; /* packet's 4-tuple system flow identifier */
	/* variables for hardware checksum */
	int csum_flags; /* flags regarding checksum */
	int csum_data; /* data field used by csum routines */
	u_int16_t tso_segsz; /* TSO segment size */
	union {
		u_int16_t vt_vtag; /* Ethernet 802.1p+q vlan tag */
		u_int16_t vt_nrecs; /* # of IGMPv3 records in this chain */
	} PH_vt;
	SLIST_HEAD(packet_tags, m_tag) tags; /* list of packet tags */
};
#define ether_vtag PH_vt.vt_vtag

/*
 * Description of external storage mapped into mbuf; valid only if M_EXT is
 * set.
 */
struct m_ext {
	caddr_t ext_buf; /* start of buffer */
	void (*ext_free) /* free routine if not the usual */
	    (void *, void *);
	void *ext_arg1; /* optional argument pointer */
	void *ext_arg2; /* optional argument pointer */
	u_int ext_size; /* size of buffer, for ext_free */
	volatile u_int *ref_cnt; /* pointer to ref count info */
	int ext_type; /* type of external storage */
};
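Several mbufs can share one external buffer. Each holds a copy of the struct m_ext descriptor, and all copies point at a single counter through ref_cnt. The following single-threaded userland sketch shows that sharing pattern. The helpers ext_share() and ext_release() are hypothetical. The sketch also assumes that the free routine is called as ext_free(ext_arg1, ext_arg2) when the last reference goes away. A concurrent implementation would need an atomic decrement.

```c
#include <assert.h>
#include <stdlib.h>

/* Userland copy of struct m_ext (caddr_t and u_int spelled out). */
struct m_ext {
	char *ext_buf;			/* start of buffer */
	void (*ext_free)(void *, void *); /* free routine */
	void *ext_arg1;			/* optional argument pointer */
	void *ext_arg2;			/* optional argument pointer */
	unsigned ext_size;		/* size of buffer */
	volatile unsigned *ref_cnt;	/* shared reference count */
	int ext_type;			/* type of external storage */
};

/* Hypothetical: make dst another reference to src's buffer. */
static void ext_share(struct m_ext *dst, const struct m_ext *src)
{
	*dst = *src;
	(*src->ref_cnt)++;
}

/* Hypothetical: drop one reference; the last one calls
 * ext_free(ext_arg1, ext_arg2) and releases the counter itself.
 * Not thread-safe: the decrement is a plain read-modify-write. */
static void ext_release(struct m_ext *e)
{
	if (--(*e->ref_cnt) == 0) {
		if (e->ext_free != NULL)
			e->ext_free(e->ext_arg1, e->ext_arg2);
		free((void *)e->ref_cnt);
	}
	e->ref_cnt = NULL;
}
```

Keeping the counter outside the descriptor is what allows the descriptor to be copied by value between mbufs while every copy still sees the same count.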

/*
 * The core of the mbuf object along with some shortcut defines for practical
 * purposes.
 */
struct mbuf {
	struct m_hdr m_hdr;
	union {
		struct {
			struct pkthdr MH_pkthdr; /* M_PKTHDR set */
			union {
				struct m_ext MH_ext; /* M_EXT set */
				char MH_databuf[MHLEN];
			} MH_dat;
		} MH;
		char M_databuf[MLEN]; /* !M_PKTHDR, !M_EXT */
	} M_dat;
};
#define m_next m_hdr.mh_next
#define m_len m_hdr.mh_len
#define m_data m_hdr.mh_data
#define m_type m_hdr.mh_type
#define m_flags m_hdr.mh_flags
#define m_nextpkt m_hdr.mh_nextpkt
#define m_act m_nextpkt
#define m_pkthdr M_dat.MH.MH_pkthdr
#define m_ext M_dat.MH.MH_dat.MH_ext
#define m_pktdat M_dat.MH.MH_dat.MH_databuf
#define m_dat M_dat.M_databuf
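The nested unions mean that m_data can point into one of three storage areas, depending on the flags:

- m_dat, for a plain mbuf;
- m_pktdat, after the packet header, when M_PKTHDR is set;
- the external buffer, when M_EXT is set.

The following userland sketch computes the free space before and after the valid data. It is loosely modelled on the kernel's M_LEADINGSPACE()/M_TRAILINGSPACE() macros but is not a copy of them: it ignores any read-only or writability checks those macros may apply. The sizes MLEN 64 and MHLEN 48, the placeholder bytes standing in for struct pkthdr, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed compact sizes for this mirror; not the kernel's values. */
#define MLEN 64
#define MHLEN 48

#define M_EXT 0x00000001
#define M_PKTHDR 0x00000002

struct mini_ext {
	char *ext_buf;
	unsigned ext_size;
};

/* Same union shape as struct mbuf, with placeholder header bytes. */
struct mini_mbuf {
	char *m_data;
	int m_len;
	int m_flags;
	union {
		struct {
			char pkthdr_space[MLEN - MHLEN]; /* stands in for pkthdr */
			union {
				struct mini_ext MH_ext;
				char MH_databuf[MHLEN];
			} MH_dat;
		} MH;
		char M_databuf[MLEN];
	} M_dat;
};

/* First byte of whichever storage currently backs m_data. */
static char *buf_start(struct mini_mbuf *m)
{
	if (m->m_flags & M_EXT)
		return m->M_dat.MH.MH_dat.MH_ext.ext_buf;
	if (m->m_flags & M_PKTHDR)
		return m->M_dat.MH.MH_dat.MH_databuf;
	return m->M_dat.M_databuf;
}

/* One past the last usable byte; both internal buffers end together. */
static char *buf_end(struct mini_mbuf *m)
{
	if (m->m_flags & M_EXT)
		return m->M_dat.MH.MH_dat.MH_ext.ext_buf +
		    m->M_dat.MH.MH_dat.MH_ext.ext_size;
	return m->M_dat.M_databuf + MLEN;
}

static ptrdiff_t leading_space(struct mini_mbuf *m)
{
	return m->m_data - buf_start(m);
}

static ptrdiff_t trailing_space(struct mini_mbuf *m)
{
	return buf_end(m) - (m->m_data + m->m_len);
}
```

Both internal buffers end at the same address because MH_databuf is sized so that the packet header plus MHLEN equals MLEN. This is also why the kernel can compute the end of either internal buffer as &m_dat[MLEN].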

/*
 * mbuf flags.
 */
#define M_EXT 0x00000001 /* has associated external storage */
#define M_PKTHDR 0x00000002 /* start of record */
#define M_EOR 0x00000004 /* end of record */
#define M_RDONLY 0x00000008 /* associated data is marked read-only */
#define M_PROTO1 0x00000010 /* protocol-specific */
#define M_PROTO2 0x00000020 /* protocol-specific */
#define M_PROTO3 0x00000040 /* protocol-specific */
#define M_PROTO4 0x00000080 /* protocol-specific */
#define M_PROTO5 0x00000100 /* protocol-specific */
#define M_BCAST 0x00000200 /* sent/received as link-level broadcast */
#define M_MCAST 0x00000400 /* sent/received as link-level multicast */
#define M_FRAG 0x00000800 /* packet is a fragment of a larger packet */
#define M_FIRSTFRAG 0x00001000 /* packet is first fragment */
#define M_LASTFRAG 0x00002000 /* packet is last fragment */
#define M_SKIP_FIREWALL 0x00004000 /* skip firewall processing */
#define M_FREELIST 0x00008000 /* mbuf is on the free list */
#define M_VLANTAG 0x00010000 /* ether_vtag is valid */
#define M_PROMISC 0x00020000 /* packet was not for us */
#define M_NOFREE 0x00040000 /* do not free mbuf, embedded in cluster */
#define M_PROTO6 0x00080000 /* protocol-specific */
#define M_PROTO7 0x00100000 /* protocol-specific */
#define M_PROTO8 0x00200000 /* protocol-specific */
#define M_FLOWID 0x00400000 /* flowid is valid */
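Each flag is a separate bit in m_flags, so flags are set with `|=` and tested with `&`. Some flags gate other fields. For example, PH_vt holds either a VLAN tag or an IGMPv3 record count, and only M_VLANTAG says that ether_vtag is meaningful. The following userland sketch shows this. It copies the flag values and the ether_vtag shortcut from this header. The struct mini_pkt and the helpers set_vlan() and vlan_id() are hypothetical. The helpers assume the standard 802.1Q tag layout: a 3-bit priority in the top bits and a 12-bit VLAN ID in the low bits.

```c
#include <assert.h>
#include <stdint.h>

#define M_PKTHDR 0x00000002
#define M_BCAST 0x00000200
#define M_VLANTAG 0x00010000

/* Hypothetical minimal packet: flags plus the PH_vt union of pkthdr. */
struct mini_pkt {
	int m_flags;
	union {
		uint16_t vt_vtag;	/* Ethernet 802.1p+q vlan tag */
		uint16_t vt_nrecs;	/* # of IGMPv3 records */
	} PH_vt;
};
#define ether_vtag PH_vt.vt_vtag

/* Store an 802.1Q tag control word and mark ether_vtag as valid. */
static void set_vlan(struct mini_pkt *p, unsigned prio, unsigned vid)
{
	p->ether_vtag = (uint16_t)(((prio & 0x7) << 13) | (vid & 0x0fff));
	p->m_flags |= M_VLANTAG;
}

/* Return the VLAN ID, or -1 if the union holds something else. */
static int vlan_id(const struct mini_pkt *p)
{
	if ((p->m_flags & M_VLANTAG) == 0)
		return -1;
	return p->ether_vtag & 0x0fff;
}
```

Setting M_VLANTAG leaves the other bits, such as M_PKTHDR and M_BCAST, untouched.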
Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x).
Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on, is "policy based routing", which allows
different packet streams to be routed by more than just the destination
address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x), but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled within the timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array into a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where Y is always 0
for all protocol families except ipv4.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
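The following sketch illustrates the 2 dimensional table: a row per FIB, a column per address family, and row 0 as the only row visible to code unaware of FIBs. It makes several hypothetical choices, none of which are the kernel's definitions:

- DEMO_MAXFIBS and DEMO_AF_SLOTS are made-up sizes.
- struct radix_node_head is used only as an opaque stand-in for the per-family FIB head structure.
- table_for() borrows the described behaviour of rtalloc_fib(), but it is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

#define DEMO_MAXFIBS 16			/* hypothetical limit */
#define DEMO_AF_SLOTS (AF_INET6 + 1)	/* hypothetical: columns up to AF_INET6 */

struct radix_node_head;			/* opaque per-family FIB head */

/* rt_tables[fib][af]: row 0 is what FIB-unaware code has always seen. */
static struct radix_node_head *rt_tables[DEMO_MAXFIBS][DEMO_AF_SLOTS];

/* Legacy lookup: callers that know nothing about FIBs only see row 0. */
static struct radix_node_head **legacy_table(int af)
{
	return &rt_tables[0][af];
}

/* Protocol-neutral lookup: only IPv4 honours the requested FIB; every
 * other family is forced to row 0. Returns NULL for an unknown family
 * or an out-of-range IPv4 FIB. */
static struct radix_node_head **table_for(int af, unsigned fib)
{
	if (af < 0 || af >= DEMO_AF_SLOTS)
		return NULL;
	if (af != AF_INET)
		return legacy_table(af);
	if (fib >= DEMO_MAXFIBS)
		return NULL;
	return &rt_tables[fib][af];
}
```

Because legacy_table() always resolves to row 0, an old caller and a new IPv4 caller asking for FIB 0 end up at the same table slot.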
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice:
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0
(or possibly a number settable in a sysctl (not yet)),
but prior to routing the firewall can inspect them (see below).
(Possibly in the future you may be able to associate a FIB
with packets received on an interface, via an ifconfig argument, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with them on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would override a fib associated by
a more default source (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
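The selection rules above can be summarised as a small decision function. This is a hypothetical sketch: `enum pkt_origin`, `struct fib_inputs` and `select_fib()` do not exist in the kernel. Case 4 (accept sockets inheriting the listen socket's FIB) is left out because it is case 1 applied to the new socket, and the sketch assumes a classifier decision overrides every other source, not only cases 1 and 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet origins, mirroring cases 1, 2, 5 and 6. */
enum pkt_origin {
	ORIGIN_SOCKET,   /* 1/ locally generated from a socket/PCB */
	ORIGIN_FORWARD,  /* 2/ received on an interface for forwarding */
	ORIGIN_RESPONSE, /* 5/ reset or ICMP reply to another packet */
	ORIGIN_TUNNEL    /* 6/ produced by gif/tun encapsulation */
};

struct fib_inputs {
	enum pkt_origin origin;
	int socket_fib;     /* inherited from the process or set by socket option */
	int trigger_fib;    /* FIB of the packet being responded to */
	int tunnel_fib;     /* FIB of the process that configured the tunnel */
	int classifier_fib; /* FIB chosen by a classifier such as ipfw, or -1 */
};

static int
select_fib(const struct fib_inputs *in)
{
	int fib = 0; /* every packet starts out in FIB 0 */

	assert(in != NULL);
	switch (in->origin) {
	case ORIGIN_SOCKET:   fib = in->socket_fib; break;
	case ORIGIN_FORWARD:  fib = 0; break; /* no per-interface default yet */
	case ORIGIN_RESPONSE: fib = in->trigger_fib; break;
	case ORIGIN_TUNNEL:   fib = in->tunnel_fib; break;
	}
	/* 3/ assumption: a classifier decision overrides all of the above */
	if (in->classifier_fib >= 0)
		fib = in->classifier_fib;
	return (fib);
}
```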
Routing messages would be associated with their
process, and thus select one FIB or another.
Messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (Not yet implemented.)
In addition, netstat has been edited to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance already provides this functionality
using ipfw fwd, but that method has some drawbacks.
For example, it can't fully simulate a routing table because it can't
influence the socket's choice of local address when a connect() is done.
Testing during the generation of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
Interestingly enough, SCTP has built-in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods reached this way would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
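A rough sketch of the proposed per-domain routing methods. Everything here is hypothetical (`struct route_methods`, `struct domain_sketch`, `domain_rtlookup()`), since the text only proposes the change: generic code calls through a method pointer with a FIB number, a FIB-aware family indexes its own private tables, and a family with no use for FIBs simply ignores the argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-protocol routing methods; each receives a FIB number. */
struct route_methods {
	const void *(*rm_lookup)(void *rib, const void *dst, int fib);
};

/* Hypothetical slimmed-down 'domain': methods plus the data they act on. */
struct domain_sketch {
	int dom_family;
	const struct route_methods *dom_rtmethods;
	void *dom_rtdata; /* opaque to callers; format private to the protocol */
};

/* A family with no notion of FIBs simply ignores the argument. */
static const void *
flat_lookup(void *rib, const void *dst, int fib)
{
	(void)dst;
	(void)fib;
	return (rib);
}

/* A FIB-aware family keeps one private table per FIB (here: 4 of them). */
static const void *
per_fib_lookup(void *rib, const void *dst, int fib)
{
	const char **tables = rib;

	(void)dst;
	assert(fib >= 0 && fib < 4);
	return (tables[fib]);
}

static const struct route_methods flat_methods = { flat_lookup };
static const struct route_methods per_fib_methods = { per_fib_lookup };

/* Generic code only knows the method pointer, not the table layout. */
static const void *
domain_rtlookup(const struct domain_sketch *dom, const void *dst, int fib)
{
	return (dom->dom_rtmethods->rm_lookup(dom->dom_rtdata, dst, fib));
}
```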
When the ABI can be changed, it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
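To illustrate the proposed `struct route` change, here is a hypothetical cache in which the cached entry is keyed by both destination and FIB number. The names (`struct rtentry_sketch`, `struct route_sketch`, `route_cached()`) and the use of a bare IPv4 address instead of a sockaddr are simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry; the real rtentry carries much more. */
struct rtentry_sketch {
	uint32_t dst;
	int fib;
};

/* Hypothetical route cache with the proposed FIB number added. */
struct route_sketch {
	uint32_t ro_dst;              /* destination that was looked up */
	int ro_fib;                   /* FIB it was looked up in */
	struct rtentry_sketch *ro_rt; /* cached result, or NULL */
};

/* The cached entry is only reusable if BOTH the address and FIB match. */
static struct rtentry_sketch *
route_cached(const struct route_sketch *ro, uint32_t dst, int fib)
{
	assert(ro != NULL);
	if (ro->ro_rt != NULL && ro->ro_dst == dst && ro->ro_fib == fib)
		return (ro->ro_rt);
	return (NULL);
}
```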
Interaction with the ARP/LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
/*
* For RELENG_{6,7} steal these flags for limited multiple routing table
* support. In RELENG_8 and beyond, use just one flag and a tag.
*/
#define M_FIB 0xF0000000 /* steal some bits to store fib number. */
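A minimal sketch of how a FIB number can live in the stolen `M_FIB` bits. The shift of 28 follows from the mask 0xF0000000; `FIB_SHIFT`, `fib_get()` and `fib_set()` are hypothetical names, and `m_flags` is modelled as a plain `unsigned int`. Four bits hold FIB numbers 0 to 15, which is where a 16-table limit comes from.

```c
#include <assert.h>

#define M_FIB     0xF0000000U /* mask from the header: top 4 bits of m_flags */
#define FIB_SHIFT 28          /* hypothetical: bit position of the mask */

/* Read the FIB number stored in the stolen bits. */
static unsigned int
fib_get(unsigned int m_flags)
{
	return ((m_flags & M_FIB) >> FIB_SHIFT);
}

/* Replace the FIB number, leaving every other flag bit untouched. */
static unsigned int
fib_set(unsigned int m_flags, unsigned int fib)
{
	assert(fib < 16); /* 4 bits hold FIBs 0..15 */
	return ((m_flags & ~M_FIB) | ((fib << FIB_SHIFT) & M_FIB));
}
```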
2008-03-24 19:01:29 +00:00
#define M_NOTIFICATION M_PROTO5 /* SCTP notification */
1994-05-24 10:09:53 +00:00

2001-06-22 06:35:32 +00:00
/*
2008-03-24 19:01:29 +00:00
* Flags to purge when crossing layers.
*/
#define M_PROTOFLAGS \
(M_PROTO1|M_PROTO2|M_PROTO3|M_PROTO4|M_PROTO5|M_PROTO6|M_PROTO7|M_PROTO8)

/*
* Flags preserved when copying m_pkthdr.
2001-06-22 06:35:32 +00:00
*/
2008-03-24 19:01:29 +00:00
#define M_COPYFLAGS \
(M_PKTHDR|M_EOR|M_RDONLY|M_PROTOFLAGS|M_SKIP_FIREWALL|M_BCAST|M_MCAST|\
2008-05-09 23:03:00 +00:00
M_FRAG|M_FIRSTFRAG|M_LASTFRAG|M_VLANTAG|M_PROMISC|M_FIB)
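The purge and copy masks defined above are used roughly as follows. This sketch uses hypothetical `F_*` flag values rather than the header's real `M_*` numbers, and the rule that the destination keeps its own `F_EXT` storage bit is an assumption of the sketch. Because the FIB bits are in the copy mask, a copied packet header keeps its FIB.

```c
#include <assert.h>

/* Hypothetical flag values for illustration only. */
#define F_EXT        0x00000001U
#define F_PKTHDR     0x00000002U
#define F_PROTO1     0x00000010U
#define F_PROTO2     0x00000020U
#define F_BCAST      0x00000200U
#define F_NOFREE     0x00040000U /* a flag deliberately NOT in the copy mask */
#define F_FIB        0xF0000000U

#define F_PROTOFLAGS (F_PROTO1 | F_PROTO2)
#define F_COPYFLAGS  (F_PKTHDR | F_PROTOFLAGS | F_BCAST | F_FIB)

/* Crossing a layer: protocol-private bits must not leak to the next layer. */
static unsigned int
purge_proto_flags(unsigned int flags)
{
	return (flags & ~F_PROTOFLAGS);
}

/*
 * Moving a packet header: take the copyable bits from the source, but keep
 * the destination's own storage bit (F_EXT), which describes its buffer
 * rather than the packet.
 */
static unsigned int
copy_pkthdr_flags(unsigned int to, unsigned int from)
{
	return ((from & F_COPYFLAGS) | (to & F_EXT));
}
```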
2000-11-11 23:12:27 +00:00

2001-06-22 06:35:32 +00:00
/*
2001-12-23 22:04:08 +00:00
* External buffer types: identify ext_buf type.
2001-06-22 06:35:32 +00:00
*/
2000-11-11 23:12:27 +00:00
#define EXT_CLUSTER 1 /* mbuf cluster */
#define EXT_SFBUF 2 /* sendfile(2)'s sf_bufs */
2006-02-17 14:14:15 +00:00
#define EXT_JUMBOP 3 /* jumbo cluster 4096 bytes */
2005-12-08 13:13:06 +00:00
#define EXT_JUMBO9 4 /* jumbo cluster 9216 bytes */
#define EXT_JUMBO16 5 /* jumbo cluster 16384 bytes */
#define EXT_PACKET 6 /* mbuf+cluster from packet zone */
2007-04-04 00:31:49 +00:00
#define EXT_MBUF 7 /* external mbuf reference (M_IOVEC) */
2000-11-11 23:12:27 +00:00
#define EXT_NET_DRV 100 /* custom ext_buf provided by net driver(s) */
2000-11-13 02:59:57 +00:00
#define EXT_MOD_TYPE 200 /* custom module's ext_buf type */
At long last, commit the zero copy sockets code.
MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.
ti.4: Update the ti(4) man page to include information on the
TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
and also include information about the new character
device interface and the associated ioctls.
man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated
links.
jumbo.9: New man page describing the jumbo buffer allocator
interface and operation.
zero_copy.9: New man page describing the general characteristics of
the zero copy send and receive code, and what an
application author should do to take advantage of the
zero copy functionality.
NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
conf/files: Add uipc_jumbo.c and uipc_cow.c.
conf/options: Add the 5 options mentioned above.
kern_subr.c: Receive side zero copy implementation. This takes
"disposable" pages attached to an mbuf, gives them to
a user process, and then recycles the user's page.
This is only active when ZERO_COPY_SOCKETS is turned on
and the kern.ipc.zero_copy.receive sysctl variable is
set to 1.
uipc_cow.c: Send side zero copy functions. Takes a page written
by the user and maps it copy on write and assigns it
kernel virtual address space. Removes copy on write
mapping once the buffer has been freed by the network
stack.
uipc_jumbo.c: Jumbo disposable page allocator code. This allocates
(optionally) disposable pages for network drivers that
want to give the user the option of doing zero copy
receive.
uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are
enabled if ZERO_COPY_SOCKETS is turned on.
Add zero copy send support to sosend() -- pages get
mapped into the kernel instead of getting copied if
they meet size and alignment restrictions.
uipc_syscalls.c:Un-staticize some of the sf* functions so that they
can be used elsewhere. (uipc_cow.c)
if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
calling malloc() with M_WAITOK. Return an error if
the M_NOWAIT malloc fails.
The ti(4) driver and the wi(4) driver, at least, call
this with a mutex held. This causes witness warnings
for 'ifconfig -a' with a wi(4) or ti(4) board in the
system. (I've only verified for ti(4)).
ip_output.c: Fragment large datagrams so that each segment contains
a multiple of PAGE_SIZE amount of data plus headers.
This allows the receiver to potentially do page
flipping on receives.
if_ti.c: Add zero copy receive support to the ti(4) driver. If
TI_PRIVATE_JUMBOS is not defined, it now uses the
jumbo(9) buffer allocator for jumbo receive buffers.
Add a new character device interface for the ti(4)
driver for the new debugging interface. This allows
(a patched version of) gdb to talk to the Tigon board
and debug the firmware. There are also a few additional
debugging ioctls available through this interface.
Add header splitting support to the ti(4) driver.
Tweak some of the default interrupt coalescing
parameters to more useful defaults.
Add hooks for supporting transmit flow control, but
leave it turned off with a comment describing why it
is turned off.
if_tireg.h: Change the firmware rev to 12.4.11, since we're really
at 12.4.11 plus fixes from 12.4.13.
Add defines needed for debugging.
Remove the ti_stats structure, it is now defined in
sys/tiio.h.
ti_fw.h: 12.4.11 firmware.
ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13,
and my header splitting patches. Revision 12.4.13
doesn't handle 10/100 negotiation properly. (This
firmware is the same as what was in the tree previously,
with the addition of header splitting support.)
sys/jumbo.h: Jumbo buffer allocator interface.
sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to
indicate that the payload buffer can be thrown away /
flipped to a userland process.
socketvar.h: Add prototype for socow_setup.
tiio.h: ioctl interface to the character portion of the ti(4)
driver, plus associated structure/type definitions.
uio.h: Change prototype for uiomoveco() so that we'll know
whether the source page is disposable.
ufs_readwrite.c:Update for new prototype of uiomoveco().
vm_fault.c: In vm_fault(), check to see whether we need to do a page
based copy on write fault.
vm_object.c: Add a new function, vm_object_allocate_wait(). This
does the same thing that vm_object_allocate() does, except
that it gives the caller the opportunity to specify whether
it should wait on the uma_zalloc() of the object structure.
This allows vm objects to be allocated while holding a
mutex. (Without generating WITNESS warnings.)
vm_object_allocate() is implemented as a call to
vm_object_allocate_wait() with the malloc flag set to
M_WAITOK.
vm_object.h: Add prototype for vm_object_allocate_wait().
vm_page.c: Add page-based copy on write setup, clear and fault
routines.
vm_page.h: Add page based COW function prototypes and variable in
the vm_page structure.
Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
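The two page-size rules described above (map user pages instead of copying them, and size fragments in whole pages so the receiver can flip them) can be sketched as below. The 4 KB page size and the exact eligibility test (page-aligned start, at least one full page) are assumptions; the message only says the pages must meet size and alignment restrictions. A multiple of the page size is also a multiple of 8, so such fragments respect the 8-byte granularity of IP fragment offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZC_PAGE_SIZE 4096u /* hypothetical: assumes 4 KB pages */

/*
 * Hypothetical send-side test: only a page-aligned buffer holding at
 * least one whole page is mapped copy-on-write instead of being copied.
 */
static int
zc_send_eligible(uintptr_t uaddr, size_t len)
{
	return ((uaddr % ZC_PAGE_SIZE) == 0 && len >= ZC_PAGE_SIZE);
}

/*
 * Payload per IP fragment when each fragment should carry a whole number
 * of pages. Returns 0 when not even one page fits, meaning (by assumption)
 * fall back to ordinary fragmentation.
 */
static size_t
zc_frag_payload(size_t mtu, size_t hdrlen)
{
	assert(mtu > hdrlen);
	return (((mtu - hdrlen) / ZC_PAGE_SIZE) * ZC_PAGE_SIZE);
}
```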
2002-06-26 03:37:47 +00:00
#define EXT_DISPOSABLE 300 /* can throw this buffer away w/page flipping */
2004-05-28 14:20:06 +00:00
#define EXT_EXTREF 400 /* has externally maintained ref_cnt ptr */
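A hypothetical helper relating the cluster-backed EXT_* types above to their nominal buffer sizes. It assumes the default 2048-byte `MCLBYTES` and a 4096-byte page for `EXT_JUMBOP`; the 9 KB and 16 KB jumbo clusters are 9 * 1024 = 9216 and 16 * 1024 = 16384 bytes. `ext_cluster_size()` is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the EXT_* definitions in this header. */
#define EXT_CLUSTER 1
#define EXT_JUMBOP  3
#define EXT_JUMBO9  4
#define EXT_JUMBO16 5
#define EXT_PACKET  6

/*
 * Hypothetical: nominal buffer size for cluster-backed external types.
 * Other types (sf_bufs, driver or module buffers) carry their own size,
 * so 0 is returned for them.
 */
static size_t
ext_cluster_size(int ext_type)
{
	switch (ext_type) {
	case EXT_CLUSTER:
	case EXT_PACKET:  return (2048);      /* assumes default MCLBYTES */
	case EXT_JUMBOP:  return (4096);      /* assumes a 4 KB page */
	case EXT_JUMBO9:  return (9 * 1024);
	case EXT_JUMBO16: return (16 * 1024);
	default:          return (0);
	}
}
```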
1994-05-24 10:09:53 +00:00

2001-06-22 06:35:32 +00:00
/*
2007-03-24 20:19:44 +00:00
* Flags indicating hw checksum support and sw checksum requirements. This
* field can be directly tested against if_data.ifi_hwassist.
2001-06-22 06:35:32 +00:00
*/
2001-12-23 22:04:08 +00:00
#define CSUM_IP 0x0001 /* will csum IP */
#define CSUM_TCP 0x0002 /* will csum TCP */
#define CSUM_UDP 0x0004 /* will csum UDP */
#define CSUM_IP_FRAGS 0x0008 /* will csum IP fragments */
#define CSUM_FRAGMENT 0x0010 /* will do IP fragmentation */
2006-09-06 21:51:59 +00:00
#define CSUM_TSO 0x0020 /* will do TSO */
2009-01-06 12:23:19 +00:00
#define CSUM_SCTP 0x0040 /* will csum SCTP */
2000-03-27 19:14:27 +00:00

2001-12-23 22:04:08 +00:00
#define CSUM_IP_CHECKED 0x0100 /* did csum IP */
#define CSUM_IP_VALID 0x0200 /* ... the csum is valid */
#define CSUM_DATA_VALID 0x0400 /* csum_data field is valid */
#define CSUM_PSEUDO_HDR 0x0800 /* csum_data has pseudo hdr */
2009-01-06 12:23:19 +00:00
#define CSUM_SCTP_VALID 0x1000 /* SCTP checksum is valid */
2000-03-27 19:14:27 +00:00

2001-12-23 22:04:08 +00:00
#define CSUM_DELAY_DATA (CSUM_TCP | CSUM_UDP)
#define CSUM_DELAY_IP (CSUM_IP) /* XXX add ipv6 here too? */
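A sketch of how the two halves of the CSUM_* flags are meant to be used. On transmit, the bits the stack requested but the interface does not advertise in its hwassist word are left for software, roughly the mask the IPv4 output path computes. On receive, a driver that sets both CSUM_DATA_VALID and CSUM_PSEUDO_HDR reports the folded one's-complement sum in csum_data, which under the convention assumed here is 0xffff for a good segment. The function names and the `unsigned int` types are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the definitions in this header. */
#define CSUM_IP         0x0001
#define CSUM_TCP        0x0002
#define CSUM_UDP        0x0004
#define CSUM_DATA_VALID 0x0400
#define CSUM_PSEUDO_HDR 0x0800
#define CSUM_DELAY_DATA (CSUM_TCP | CSUM_UDP)

/* Transmit: requested checksums the hardware will not do are left for software. */
static unsigned int
csum_sw_needed(unsigned int csum_flags, unsigned int hwassist)
{
	return (csum_flags & ~hwassist);
}

/*
 * Receive: 1 if the hardware result shows a good segment, 0 if it shows
 * a bad one, -1 if the result is incomplete and software must verify.
 */
static int
csum_rx_ok(unsigned int csum_flags, uint16_t csum_data)
{
	const unsigned int full = CSUM_DATA_VALID | CSUM_PSEUDO_HDR;

	if ((csum_flags & full) != full)
		return (-1);
	return (csum_data == 0xffff);
}
```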
2000-03-27 19:14:27 +00:00

2001-06-22 06:35:32 +00:00
/*
2001-12-23 22:04:08 +00:00
* mbuf types.
2001-06-22 06:35:32 +00:00
*/
2001-09-30 01:58:39 +00:00
#define MT_NOTMBUF 0 /* USED INTERNALLY ONLY! Object is not mbuf */
1994-05-24 10:09:53 +00:00
#define MT_DATA 1 /* dynamic (data) allocation */
2005-11-02 13:46:32 +00:00
#define MT_HEADER MT_DATA /* packet header, use M_PKTHDR instead */
1994-05-24 10:09:53 +00:00
#define MT_SONAME 8 /* socket name */
1999-12-18 13:52:44 +00:00
#define MT_CONTROL 14 /* extra-data protocol message */
#define MT_OOBDATA 15 /* expedited data */
2000-07-15 06:02:48 +00:00
#define MT_NTYPES 16 /* number of mbuf types for mbtypes[] */

2005-11-02 16:20:36 +00:00
#define MT_NOINIT 255 /* Not a type but a flag to allocate
a non-initialized mbuf */

2007-10-06 21:13:55 +00:00
#define MB_NOTAGS 0x1UL /* no tags attached to mbuf */
2001-06-22 06:35:32 +00:00
/*
2001-09-30 01:58:39 +00:00
* General mbuf allocator statistics structure.
2006-07-24 01:49:57 +00:00
*
* Many of these statistics are no longer used; we instead track many
* allocator statistics through UMA's built in statistics mechanism.
1999-12-14 02:23:14 +00:00
*/
struct mbstat {
Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and look up the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
u_long m_mbufs; /* XXX */
u_long m_mclusts; /* XXX */

1999-12-14 02:23:14 +00:00
u_long m_drain; /* times drained protocols for space */
2001-09-30 01:58:39 +00:00
u_long m_mcfail; /* XXX: times m_copym failed */
u_long m_mpfail; /* XXX: times m_pullup failed */
1999-12-14 02:23:14 +00:00
u_long m_msize; /* length of an mbuf */
u_long m_mclbytes; /* length of an mbuf cluster */
u_long m_minclsize; /* min length of data to allocate a cluster */
u_long m_mlen; /* length of data in an mbuf */
u_long m_mhlen; /* length of data in a header mbuf */
2004-05-31 21:46:06 +00:00
Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x).
Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One area where FreeBSD has been falling behind, and which by chance I
have some time to work on, is "policy based routing", which allows
different packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x), but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled during the lifetime of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPv4 to use them. I have not
done the changes for IPv6 simply because I do not need it, and I do not
have enough knowledge of IPv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a one-dimensional array of
pointers to FIB head structures (one per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI-compatible version is to
extend that array to be a two-dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where Y is always 0
for all protocol families except IPv4.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
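A minimal C sketch of the two-dimensional table idea described above, assuming hypothetical names and sizes (SKETCH_NUMFIBS, SKETCH_AF_SLOTS, table_slot() and legacy_rt_tables are illustrative, not the kernel's). It shows that row 0 is exactly the old one-dimensional array and that only AF_INET may select another row.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Hypothetical sizes for illustration only. */
#define SKETCH_NUMFIBS  16   /* the first commit is limited to 16 tables */
#define SKETCH_AF_SLOTS 64   /* must exceed every AF_* value used here */

struct radix_node_head;      /* opaque: the per-family route trie */

/* rt_tables[fib][af]: one row per FIB, one column per address family. */
static struct radix_node_head *rt_tables[SKETCH_NUMFIBS][SKETCH_AF_SLOTS];

/* FIB-unaware code keeps indexing a 1-D array; here it is simply row 0. */
static struct radix_node_head **legacy_rt_tables = &rt_tables[0][0];

/* Pick the slot for (af, fib): only IPv4 honours a non-zero FIB. */
static struct radix_node_head **
table_slot(int af, unsigned int fib)
{
	assert(af >= 0 && af < SKETCH_AF_SLOTS);
	assert(fib < SKETCH_NUMFIBS);
	if (af != AF_INET)
		fib = 0;	/* every other family is pinned to row 0 */
	return (&rt_tables[fib][af]);
}
```

Because row 0 is laid out exactly like the old array, code compiled against the old layout keeps working unchanged, which is the point of the ABI-compatible approach.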
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added for the exclusive use of IPv4 code,
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which take an extra argument that directs the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the address family being
looked up and, if the protocol is not IPv4, call rtalloc() (and
friends), forcing the action to row 0, or, if it IS IPv4 (and that
info is available), use the appropriate row. These are intended to be
called from code that is not specific to any particular protocol. The way
these are implemented would change in the non-ABI-preserving code
to be added later.
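The dispatch performed by the protocol-neutral entry points can be sketched as follows. This is a simplified illustration under stated assumptions: struct sketch_dst and the sketch_* functions are hypothetical stand-ins (the real rtalloc() and in_rtalloc() operate on a struct route), and each function returns the FIB row it would consult so the behaviour is easy to check.

```c
#include <assert.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Hypothetical destination; only the address family matters here. */
struct sketch_dst {
	int family;	/* AF_INET, AF_INET6, ... */
};

/* Stand-in for the legacy entry point: always uses row 0. */
static unsigned int
sketch_rtalloc(const struct sketch_dst *dst)
{
	(void)dst;
	return (0);	/* FIB row consulted */
}

/* Stand-in for the IPv4-only entry point with the extra FIB argument. */
static unsigned int
sketch_in_rtalloc(const struct sketch_dst *dst, unsigned int fib)
{
	(void)dst;
	return (fib);
}

/* Protocol-neutral wrapper in the spirit of rtalloc_fib(). */
static unsigned int
sketch_rtalloc_fib(const struct sketch_dst *dst, unsigned int fib)
{
	if (dst->family == AF_INET)
		return (sketch_in_rtalloc(dst, fib));
	return (sketch_rtalloc(dst));	/* non-IPv4 is forced to row 0 */
}
```

For example, an AF_INET6 lookup requested with FIB 5 still lands on row 0, while an AF_INET lookup uses row 5.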
One feature of the first version of the code is that for IPv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to, but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us to how the correct FIB is selected for an outgoing
IPv4 packet.
Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice:
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0
(or possibly a number settable in a sysctl (not yet)),
but prior to routing the firewall can inspect them (see below).
(Possibly in the future you may be able to associate a FIB
with packets received on an interface via an ifconfig argument, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with them on a packet-by-packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would override a fib associated by
a more default source (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
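The precedence between cases 1/, 2/ and 3/ above can be summarised in a small C sketch. Everything here is hypothetical (select_fib(), enum pkt_origin and NO_CLASSIFIER_FIB are illustrative names, not kernel interfaces): a locally generated packet starts with its socket's FIB, a forwarded packet starts with FIB 0, and a FIB chosen by a classifier such as ipfw overrides either default.

```c
#include <assert.h>

/* Hypothetical marker meaning "the classifier did not assign a FIB". */
#define NO_CLASSIFIER_FIB (-1)

/* Hypothetical packet origins, following cases 1/ and 2/. */
enum pkt_origin { PKT_LOCAL, PKT_FORWARDED };

/*
 * Start from the default source (socket FIB for local packets,
 * FIB 0 for forwarded ones), then let a classifier override it.
 */
static int
select_fib(enum pkt_origin origin, int socket_fib, int classifier_fib)
{
	int fib = (origin == PKT_LOCAL) ? socket_fib : 0;

	if (classifier_fib != NO_CLASSIFIER_FIB)
		fib = classifier_fib;	/* case 3/ wins over 1/ and 2/ */
	return (fib);
}
```

Cases 4/ to 6/ are inheritance rules (accept sockets, responses, tunnels) and would feed the socket_fib argument rather than add new precedence levels.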
Routing messages would be associated with their
process, and thus select one FIB or another.
Messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition, netstat has been modified to cope with the
fact that the array is now two-dimensional (it looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition, two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Our (IronPort's) appliance already provides this functionality
using ipfw fwd, but that method has some drawbacks.
For example, it can't fully simulate a routing table because it can't
influence the socket's choice of local address when a connect() is done.
Testing during the development of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP, interestingly enough, has built-in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
one-dimensional array is a bit silly, especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure (for example, the whole idea of a netmask is foreign
to AppleTalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods reached through them would be called. The methods would have
an argument that gives the FIB number, but the protocol would be free
to ignore it.
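A sketch of the proposed per-domain routing methods, under stated assumptions: struct sketch_domain, dom_rtlookup, the domains[] array and route_lookup() are hypothetical names chosen for illustration, not the real 'struct domain' layout. Generic code only calls through the method pointer; one domain keeps a table per FIB, the other ignores the FIB argument entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical address; real code would use struct sockaddr. */
struct sketch_sockaddr { int family; unsigned int addr; };

/* Hypothetical routing method slot added to a domain structure. */
struct sketch_domain {
	const char *dom_name;
	/* Returns a protocol-private route handle, or NULL if none. */
	void *(*dom_rtlookup)(const struct sketch_sockaddr *dst,
	    unsigned int fib);
};

static int inet_tables[4];	/* stand-in: one opaque structure per FIB */
static int flat_table;		/* a protocol with a single flat table */

static void *
inet_rtlookup(const struct sketch_sockaddr *dst, unsigned int fib)
{
	(void)dst;
	return (fib < 4 ? (void *)&inet_tables[fib] : NULL);
}

static void *
flat_rtlookup(const struct sketch_sockaddr *dst, unsigned int fib)
{
	(void)dst;
	(void)fib;		/* this protocol chooses to ignore the FIB */
	return (&flat_table);
}

static struct sketch_domain inet_dom = { "inet", inet_rtlookup };
static struct sketch_domain flat_dom = { "flat", flat_rtlookup };

/* The array of domains replaces the array of uniform route trees. */
static struct sketch_domain *domains[] = { &inet_dom, &flat_dom };
#define SKETCH_NDOMAINS (sizeof(domains) / sizeof(domains[0]))

static void *
route_lookup(unsigned int dom, const struct sketch_sockaddr *dst,
    unsigned int fib)
{
	assert(dom < SKETCH_NDOMAINS);
	return (domains[dom]->dom_rtlookup(dst, fib));
}
```

The design choice this illustrates is that the table layout becomes private to each protocol, so code outside the protocol no longer needs to assume a radix trie or a netmask.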
When the ABI can be changed, it raises the possibility of
adding FIB information to "struct route". Currently,
the structure contains the sockaddr of the destination and the resulting
fib entry. To make this work fully, one could add a fib number
so that, given an address and a fib, one can find the third element, the
fib entry.
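The proposed "struct route" extension can be pictured as a cache keyed by both destination and FIB. This is a hypothetical layout (struct sketch_route, ro_fibnum and route_cache_valid() are illustrative; the destination is simplified to an integer), showing that a cached entry is only reusable when both halves of the key still match.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_rtentry;			/* opaque forwarding entry */

/* Hypothetical ABI-changing layout: the lookup key is (destination, FIB). */
struct sketch_route {
	struct sketch_rtentry *ro_rt;	/* cached result, NULL if none */
	unsigned int ro_fibnum;		/* hypothetical added field */
	unsigned int ro_dst;		/* stand-in for struct sockaddr */
};

/* The cached entry is valid only for the same destination AND fib. */
static int
route_cache_valid(const struct sketch_route *ro, unsigned int dst,
    unsigned int fib)
{
	return (ro->ro_rt != NULL && ro->ro_dst == dst &&
	    ro->ro_fibnum == fib);
}
```

Without the FIB field, a route cached for one table could silently be reused for a packet that should have been routed through another.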
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
|
|
|
/* Number of mbtypes (gives # elems in mbtypes[] array) */
|
2001-12-23 22:04:08 +00:00
|
|
|
short m_numtypes;
|
2004-05-31 21:46:06 +00:00
|
|
|
|
2003-12-28 08:57:09 +00:00
|
|
|
/* XXX: Sendfile stats should eventually move to their own struct */
|
|
|
|
u_long sf_iocnt; /* times sendfile had to do disk I/O */
|
|
|
|
u_long sf_allocfail; /* times sfbuf allocation failed */
|
|
|
|
u_long sf_allocwait; /* times sfbuf allocation had to wait */
|
1999-12-14 02:23:14 +00:00
|
|
|
};
|
1999-12-12 05:52:51 +00:00
|
|
|
|
2001-06-22 06:35:32 +00:00
|
|
|
/*
|
|
|
|
* Flags specifying how an allocation should be made.
|
2004-05-31 21:46:06 +00:00
|
|
|
*
|
|
|
|
* The flag to use is as follows:
|
|
|
|
* - M_DONTWAIT or M_NOWAIT from an interrupt handler to not block allocation.
|
2008-03-25 09:39:02 +00:00
|
|
|
* - M_WAIT or M_WAITOK from wherever it is safe to block.
|
2004-05-31 21:46:06 +00:00
|
|
|
*
|
2007-03-24 20:19:44 +00:00
|
|
|
* M_DONTWAIT/M_NOWAIT means that we will not block the thread explicitly and
|
|
|
|
* if we cannot allocate immediately we may return NULL, whereas
|
2008-03-25 09:39:02 +00:00
|
|
|
* M_WAIT/M_WAITOK means that if we cannot allocate resources we
|
2007-03-24 20:19:44 +00:00
|
|
|
* will block until they are available, and thus never return NULL.
|
2004-05-31 21:46:06 +00:00
|
|
|
*
|
|
|
|
* XXX Eventually just phase this out to use M_WAITOK/M_NOWAIT.
|
2001-06-22 06:35:32 +00:00
|
|
|
*/
|
2004-05-31 21:46:06 +00:00
|
|
|
#define MBTOM(how) (how)
|
|
|
|
#define M_DONTWAIT M_NOWAIT
|
|
|
|
#define M_TRYWAIT M_WAITOK
|
|
|
|
#define M_WAIT M_WAITOK
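The M_NOWAIT/M_WAITOK contract described in the comment above can be illustrated with a small userland sketch. The flag values, the fixed pool, sketch_alloc() and simulated_sleep() are all hypothetical; in particular the "sleep" here just pretends another thread returned an item, whereas the kernel really blocks the thread. The legacy spellings are kept as aliases in the same way as the #defines above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's come from <sys/malloc.h>. */
#define SK_NOWAIT   0x0001
#define SK_WAITOK   0x0002
/* Legacy spellings kept as aliases, mirroring M_DONTWAIT/M_WAIT. */
#define SK_DONTWAIT SK_NOWAIT
#define SK_WAIT     SK_WAITOK

#define POOL_SIZE 2
static int pool_free = POOL_SIZE;	/* items currently available */
static char pool_items[POOL_SIZE];

/* Stand-in for sleeping until another thread frees an item. */
static void
simulated_sleep(void)
{
	pool_free++;
}

static void *
sketch_alloc(int how)
{
	while (pool_free == 0) {
		if (how & SK_NOWAIT)
			return (NULL);	/* may fail: caller must check */
		simulated_sleep();	/* M_WAITOK: wait, then retry */
	}
	return (&pool_items[--pool_free]);
}
```

With the pool exhausted, an SK_DONTWAIT request returns NULL, while an SK_WAIT request waits and then succeeds, so only M_NOWAIT callers need a NULL check.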
|
1999-12-12 05:52:51 +00:00
|
|
|
|
2005-07-17 14:04:03 +00:00
|
|
|
/*
|
|
|
|
* String names of mbuf-related UMA(9) and malloc(9) types. Exposed to
|
|
|
|
* !_KERNEL so that monitoring tools can look up the zones with
|
|
|
|
* libmemstat(3).
|
|
|
|
*/
|
|
|
|
#define MBUF_MEM_NAME "mbuf"
|
|
|
|
#define MBUF_CLUSTER_MEM_NAME "mbuf_cluster"
|
|
|
|
#define MBUF_PACKET_MEM_NAME "mbuf_packet"
|
2007-12-25 14:17:16 +00:00
|
|
|
#define MBUF_JUMBOP_MEM_NAME "mbuf_jumbo_page"
|
2005-11-02 16:20:36 +00:00
|
|
|
#define MBUF_JUMBO9_MEM_NAME "mbuf_jumbo_9k"
|
|
|
|
#define MBUF_JUMBO16_MEM_NAME "mbuf_jumbo_16k"
|
2005-07-17 14:04:03 +00:00
|
|
|
#define MBUF_TAG_MEM_NAME "mbuf_tag"
|
2005-11-02 16:20:36 +00:00
|
|
|
#define MBUF_EXTREFCNT_MEM_NAME "mbuf_ext_refcnt"
|
2005-07-17 14:04:03 +00:00
|
|
|
|
2001-04-05 03:55:27 +00:00
|
|
|
#ifdef _KERNEL
|
2000-08-19 08:32:59 +00:00
|
|
|
|
2004-07-21 15:42:02 +00:00
|
|
|
#ifdef WITNESS
|
2007-03-24 20:19:44 +00:00
|
|
|
#define MBUF_CHECKSLEEP(how) do { \
|
2004-07-21 07:12:24 +00:00
|
|
|
if (how == M_WAITOK) \
|
|
|
|
WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, \
|
|
|
|
"Sleeping in \"%s\"", __func__); \
|
2005-11-02 16:20:36 +00:00
|
|
|
} while (0)
|
2004-07-21 15:42:02 +00:00
|
|
|
#else
|
2007-03-24 20:19:44 +00:00
|
|
|
#define MBUF_CHECKSLEEP(how)
|
2004-07-21 15:42:02 +00:00
|
|
|
#endif
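MBUF_CHECKSLEEP follows a common kernel macro pattern: the check is wrapped in do { } while (0) so it behaves as a single statement, and it compiles to nothing when WITNESS is not configured. In the kernel, WITNESS_WARN roughly reports when locks that forbid sleeping are held (WARN_GIANTOK and WARN_SLEEPOK relax that for Giant and sleepable locks). The userland sketch below is hypothetical: SKETCH_WITNESS, SKETCH_CHECKSLEEP and the sleep_warnings counter stand in for the real machinery and simply count M_WAITOK-style calls.

```c
#include <assert.h>
#include <stdio.h>

#define SK_NOWAIT 0x0001	/* hypothetical flag values */
#define SK_WAITOK 0x0002

#define SKETCH_WITNESS 1	/* remove to make the check vanish */

static int sleep_warnings;	/* how many times the check fired */

#ifdef SKETCH_WITNESS
/* do { } while (0) keeps the macro safe after an unbraced if/else. */
#define SKETCH_CHECKSLEEP(how) do {					\
	if ((how) == SK_WAITOK) {					\
		sleep_warnings++;					\
		fprintf(stderr, "may sleep in \"%s\"\n", __func__);	\
	}								\
} while (0)
#else
#define SKETCH_CHECKSLEEP(how)
#endif

/* An allocator-like function that announces when it might sleep. */
static int
sketch_get(int how)
{
	SKETCH_CHECKSLEEP(how);
	return (how);
}
```

Parenthesising the macro argument, as in (how), guards against surprises when a caller passes an expression rather than a plain variable.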
|
2004-07-21 07:12:24 +00:00
|
|
|
|
2004-05-31 21:46:06 +00:00
|
|
|
/*
|
|
|
|
* Network buffer allocation API
|
|
|
|
*
|
2004-10-12 20:18:27 +00:00
|
|
|
* The rest of it is defined in kern/kern_mbuf.c
|
Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
|
|
|
*/
|
|
|
|
|
|
|
|
extern uma_zone_t zone_mbuf;
extern uma_zone_t zone_clust;
extern uma_zone_t zone_pack;
extern uma_zone_t zone_jumbop;
extern uma_zone_t zone_jumbo9;
extern uma_zone_t zone_jumbo16;
extern uma_zone_t zone_ext_refcnt;

static __inline struct mbuf *m_getcl(int how, short type, int flags);
static __inline struct mbuf *m_get(int how, short type);
static __inline struct mbuf *m_gethdr(int how, short type);
static __inline struct mbuf *m_getjcl(int how, short type, int flags,
	    int size);
static __inline struct mbuf *m_getclr(int how, short type); /* XXX */
static __inline int m_init(struct mbuf *m, uma_zone_t zone,
	    int size, int how, short type, int flags);
static __inline struct mbuf *m_free(struct mbuf *m);
static __inline void m_clget(struct mbuf *m, int how);
static __inline void *m_cljget(struct mbuf *m, int how, int size);
static __inline void m_chtype(struct mbuf *m, short new_type);
void mb_free_ext(struct mbuf *);
static __inline struct mbuf *m_last(struct mbuf *m);
int m_pkthdr_init(struct mbuf *m, int how);

static __inline int
m_gettype(int size)
{
	int type;

	switch (size) {
	case MSIZE:
		type = EXT_MBUF;
		break;
	case MCLBYTES:
		type = EXT_CLUSTER;
		break;
#if MJUMPAGESIZE != MCLBYTES
	case MJUMPAGESIZE:
		type = EXT_JUMBOP;
		break;
#endif
	case MJUM9BYTES:
		type = EXT_JUMBO9;
		break;
	case MJUM16BYTES:
		type = EXT_JUMBO16;
		break;
	default:
		panic("%s: m_getjcl: invalid cluster size", __func__);
	}

	return (type);
}
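The size-to-storage-type dispatch in m_gettype() can be modeled outside the kernel. The sketch below is a hypothetical user-space stand-in, not the header's code: the constant values (a 256-byte MSIZE, 2048-byte MCLBYTES, a 4096-byte page for MJUMPAGESIZE, 9 KB and 16 KB jumbo sizes), the `sketch_` names and the enum values are all assumptions for a typical build, and it returns -1 where the kernel panics. It also illustrates why the MJUMPAGESIZE case is wrapped in `#if`: on a platform where a page is exactly MCLBYTES, both labels would have the same value, and C rejects duplicate case labels at compile time.

```c
#include <assert.h>

/* Assumed values for illustration only; the real ones come from the
 * system headers and vary by platform and release. */
#define SKETCH_MSIZE        256
#define SKETCH_MCLBYTES     2048
#define SKETCH_PAGE_SIZE    4096
#define SKETCH_MJUMPAGESIZE SKETCH_PAGE_SIZE
#define SKETCH_MJUM9BYTES   (9 * 1024)
#define SKETCH_MJUM16BYTES  (16 * 1024)

/* Hypothetical stand-ins for the kernel's EXT_* storage types. */
enum sketch_ext_type {
	SKETCH_EXT_MBUF = 1,
	SKETCH_EXT_CLUSTER,
	SKETCH_EXT_JUMBOP,
	SKETCH_EXT_JUMBO9,
	SKETCH_EXT_JUMBO16
};

/* Map a buffer size to its storage type; -1 replaces the kernel's panic(). */
static int
sketch_gettype(int size)
{
	switch (size) {
	case SKETCH_MSIZE:
		return (SKETCH_EXT_MBUF);
	case SKETCH_MCLBYTES:
		return (SKETCH_EXT_CLUSTER);
#if SKETCH_MJUMPAGESIZE != SKETCH_MCLBYTES
	/* Compiled out when a page is cluster-sized, because two case
	 * labels with the same value would not compile. */
	case SKETCH_MJUMPAGESIZE:
		return (SKETCH_EXT_JUMBOP);
#endif
	case SKETCH_MJUM9BYTES:
		return (SKETCH_EXT_JUMBO9);
	case SKETCH_MJUM16BYTES:
		return (SKETCH_EXT_JUMBO16);
	default:
		return (-1);
	}
}
```

Only the exact supported sizes match; any other value falls through to the error path, just as the kernel version falls through to panic().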
static __inline uma_zone_t
m_getzone(int size)
{
	uma_zone_t zone;

	switch (size) {
	case MSIZE:
		zone = zone_mbuf;
		break;
	case MCLBYTES:
		zone = zone_clust;
		break;
#if MJUMPAGESIZE != MCLBYTES
	case MJUMPAGESIZE:
		zone = zone_jumbop;
		break;
#endif
	case MJUM9BYTES:
		zone = zone_jumbo9;
		break;
	case MJUM16BYTES:
		zone = zone_jumbo16;
		break;
	default:
		panic("%s: m_getjcl: invalid cluster size", __func__);
	}

	return (zone);
}
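m_getzone() performs the same size dispatch but yields the UMA zone to allocate from. The following is a hypothetical user-space sketch of that lookup-then-allocate pattern, roughly the shape a jumbo-cluster allocator such as m_getjcl() would follow. `struct sketch_zone`, the zone names and sizes, and the use of malloc() in place of a UMA allocation are all assumptions made for illustration; NULL replaces panic() for an unsupported size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a UMA zone: a name plus a fixed item size. */
struct sketch_zone {
	const char *name;
	size_t itemsize;
};

/* Assumed names and sizes for a typical build; not taken from this header. */
static struct sketch_zone sketch_zone_mbuf = { "mbuf", 256 };
static struct sketch_zone sketch_zone_clust = { "mbuf_cluster", 2048 };
static struct sketch_zone sketch_zone_jumbop = { "mbuf_jumbo_page", 4096 };
static struct sketch_zone sketch_zone_jumbo9 = { "mbuf_jumbo_9k", 9 * 1024 };
static struct sketch_zone sketch_zone_jumbo16 = { "mbuf_jumbo_16k", 16 * 1024 };

/* Same dispatch shape as m_getzone(), returning NULL instead of panicking. */
static struct sketch_zone *
sketch_getzone(int size)
{
	switch (size) {
	case 256:
		return (&sketch_zone_mbuf);
	case 2048:
		return (&sketch_zone_clust);
	case 4096:
		return (&sketch_zone_jumbop);
	case 9 * 1024:
		return (&sketch_zone_jumbo9);
	case 16 * 1024:
		return (&sketch_zone_jumbo16);
	default:
		return (NULL);
	}
}

/* Look up the zone for a size, then allocate one item from it.
 * malloc() stands in for the UMA allocator; release with free(). */
static void *
sketch_zone_alloc(int size)
{
	struct sketch_zone *zone = sketch_getzone(size);

	if (zone == NULL)
		return (NULL);
	return (malloc(zone->itemsize));
}
```

Keeping the lookup separate from the allocation lets one routine serve every cluster size, with the switch acting as the single point that knows which zone backs which size.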
/*
 * Initialize an mbuf with linear storage.
 *
 * This is inline because, for the consumer, the text overhead of
 * initializing in place is roughly the same as that of calling a function
 * with this many parameters, and the M_PKTHDR branch should be optimized
 * away by constant propagation in the !MGETHDR case.
 */
static __inline int
m_init(struct mbuf *m, uma_zone_t zone, int size, int how, short type,
    int flags)
{
	int error;

	m->m_next = NULL;
	m->m_nextpkt = NULL;
	m->m_data = m->m_dat;
	m->m_len = 0;
	m->m_flags = flags;
	m->m_type = type;
	if (flags & M_PKTHDR) {
		if ((error = m_pkthdr_init(m, how)) != 0)
			return (error);
	}

	return (0);
}
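The order of operations in m_init() can be exercised in user space with a stripped-down stand-in. In the sketch below, `struct sketch_mbuf`, the `SKETCH_M_PKTHDR` flag value, the inline data size and `sketch_pkthdr_init()` are hypothetical; the real struct mbuf, M_PKTHDR and m_pkthdr_init() are far richer. A test hook makes the header initialization fail so the error propagation path can be seen: plain mbufs never reach it, while header mbufs return its error unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_M_PKTHDR 0x0002	/* assumed flag value, for illustration */
#define SKETCH_MLEN     224	/* assumed inline data area size */

struct sketch_pkthdr {
	int len;
	void *rcvif;
};

/* Hypothetical, heavily reduced stand-in for struct mbuf. */
struct sketch_mbuf {
	struct sketch_mbuf *m_next;
	struct sketch_mbuf *m_nextpkt;
	char *m_data;
	int m_len;
	int m_flags;
	short m_type;
	struct sketch_pkthdr m_pkthdr;
	char m_dat[SKETCH_MLEN];
};

/* Test hook: when nonzero, header initialization fails with ENOMEM. */
static int sketch_pkthdr_fail;

static int
sketch_pkthdr_init(struct sketch_mbuf *m, int how)
{
	(void)how;		/* a real version might sleep or not */
	if (sketch_pkthdr_fail)
		return (ENOMEM);
	m->m_pkthdr.len = 0;
	m->m_pkthdr.rcvif = NULL;
	return (0);
}

/* Follows the same sequence of steps as m_init() above. */
static int
sketch_init(struct sketch_mbuf *m, int how, short type, int flags)
{
	int error;

	assert(m != NULL);
	m->m_next = NULL;
	m->m_nextpkt = NULL;
	m->m_data = m->m_dat;	/* data starts at the inline storage */
	m->m_len = 0;
	m->m_flags = flags;
	m->m_type = type;
	if (flags & SKETCH_M_PKTHDR) {
		if ((error = sketch_pkthdr_init(m, how)) != 0)
			return (error);
	}
	return (0);
}
```

When `flags` is a compile-time constant without the header bit, the `if` is dead code, which is the constant-propagation argument the comment above makes for keeping m_init() inline.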
static __inline struct mbuf *
m_get(int how, short type)
{
	struct mb_args args;

	args.flags = 0;
	args.type = type;
	return ((struct mbuf *)(uma_zalloc_arg(zone_mbuf, &args, how)));
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
|
|
|
}
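
/*
 * Usage sketch (illustrative only; not part of this header's API).
 * Assuming a caller that may hold locks and reports failure as an errno,
 * allocation would use M_NOWAIT.  Sleeping with M_WAITOK while locks are
 * held is what makes WITNESS complain from within UMA.  MT_DATA is the
 * ordinary type for plain data mbufs.
 *
 *	struct mbuf *m;
 *
 *	m = m_get(M_NOWAIT, MT_DATA);
 *	if (m == NULL)
 *		return (ENOBUFS);
 */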

/*
 * XXX: This should be deprecated; it has very little use.
 */
static __inline struct mbuf *
m_getclr(int how, short type)
{
	struct mbuf *m;
	struct mb_args args;

	args.flags = 0;
	args.type = type;
	m = uma_zalloc_arg(zone_mbuf, &args, how);
	if (m != NULL)
		bzero(m->m_data, MLEN);
	return (m);
}

static __inline struct mbuf *
m_gethdr(int how, short type)
{
	struct mb_args args;

	args.flags = M_PKTHDR;
	args.type = type;
	return ((struct mbuf *)(uma_zalloc_arg(zone_mbuf, &args, how)));
}
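
/*
 * Usage sketch (illustrative only; not part of this header's API).
 * m_gethdr() differs from m_get() only in passing M_PKTHDR, so the result
 * is suitable as the first mbuf of a packet.  A caller copying a small
 * payload into it might look like this.  "buf" and "len" are hypothetical
 * caller variables, and len must not exceed MHLEN.
 *
 *	struct mbuf *m;
 *
 *	m = m_gethdr(M_NOWAIT, MT_DATA);
 *	if (m == NULL)
 *		return (ENOBUFS);
 *	bcopy(buf, mtod(m, caddr_t), len);
 *	m->m_len = m->m_pkthdr.len = len;
 */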

static __inline struct mbuf *
m_getcl(int how, short type, int flags)
{
	struct mb_args args;

	args.flags = flags;
	args.type = type;
	return ((struct mbuf *)(uma_zalloc_arg(zone_pack, &args, how)));
}
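
/*
 * Usage sketch (illustrative only; not part of this header's API).
 * m_getcl() returns an mbuf with an MCLBYTES cluster already attached,
 * taken from the secondary Packet zone (zone_pack).  Code that used to
 * call m_get() and then m_clget() can do both in one step.  Passing
 * M_PKTHDR in flags requests a packet header as well.
 *
 *	struct mbuf *m;
 *
 *	m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);
 *	if (m == NULL)
 *		return (ENOBUFS);
 *	m->m_len = m->m_pkthdr.len = MCLBYTES;
 */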

/*
 * m_getjcl() returns an mbuf with a cluster of the specified size attached.
 * Valid sizes are MCLBYTES, MJUMPAGESIZE, MJUM9BYTES and MJUM16BYTES.
 *
 * XXX: This is rather large; it should perhaps be a real function.
 */
static __inline struct mbuf *
m_getjcl(int how, short type, int flags, int size)
{
	struct mb_args args;
	struct mbuf *m, *n;
	uma_zone_t zone;

	if (size == MCLBYTES)
		return (m_getcl(how, type, flags));

	args.flags = flags;
	args.type = type;

	m = uma_zalloc_arg(zone_mbuf, &args, how);
	if (m == NULL)
		return (NULL);

	zone = m_getzone(size);
	n = uma_zalloc_arg(zone, m, how);
	if (n == NULL) {
		uma_zfree(zone_mbuf, m);
		return (NULL);
	}
	return (m);
}
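
/*
 * Usage sketch (illustrative only; not part of this header's API).
 * A driver filling a receive ring with jumbo buffers might allocate as
 * shown below.  The size must be one of the values listed above, and
 * MCLBYTES is simply forwarded to m_getcl().  If the cluster allocation
 * fails, the mbuf is released before NULL is returned, so the caller has
 * nothing to clean up.
 *
 *	struct mbuf *m;
 *
 *	m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES);
 *	if (m == NULL)
 *		return (ENOBUFS);
 *	m->m_len = m->m_pkthdr.len = MJUM9BYTES;
 */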

static __inline void
m_free_fast(struct mbuf *m)
{
#ifdef INVARIANTS
	if (m->m_flags & M_PKTHDR)
		KASSERT(SLIST_EMPTY(&m->m_pkthdr.tags),
		    ("doing fast free of mbuf with tags"));
#endif

	uma_zfree_arg(zone_mbuf, m, (void *)MB_NOTAGS);
}
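
/*
 * Usage sketch (illustrative only; not part of this header's API).
 * Judging from the code above, m_free_fast() skips the M_EXT and M_NOFREE
 * handling that m_free() performs.  It also tells the destructor that no
 * tags are attached (MB_NOTAGS).  It therefore appears safe only for a
 * plain mbuf without external storage or tags.  A caller might assert
 * this before using it:
 *
 *	KASSERT((m->m_flags & (M_EXT | M_NOFREE)) == 0,
 *	    ("unexpected external storage or M_NOFREE"));
 *	m_free_fast(m);
 */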
|
|
|
|
|
2007-03-24 20:19:44 +00:00
|
|
|
static __inline struct mbuf *
|
Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
|
|
|
m_free(struct mbuf *m)
|
|
|
|
{
|
|
|
|
struct mbuf *n = m->m_next;
|
|
|
|
|
|
|
|
if (m->m_flags & M_EXT)
|
|
|
|
mb_free_ext(m);
|
2007-10-06 21:42:39 +00:00
|
|
|
else if ((m->m_flags & M_NOFREE) == 0)
|
Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
|
|
|
uma_zfree(zone_mbuf, m);
|
2007-03-24 20:19:44 +00:00
|
|
|
return (n);
|
2004-05-31 21:46:06 +00:00
|
|
|
}
|
|
|
|
|
2007-03-24 20:19:44 +00:00
|
|
|
static __inline void
|
2004-05-31 21:46:06 +00:00
|
|
|
m_clget(struct mbuf *m, int how)
|
|
|
|
{
|
2007-03-24 20:19:44 +00:00
|
|
|
|
2005-11-02 16:20:36 +00:00
|
|
|
if (m->m_flags & M_EXT)
|
|
|
|
printf("%s: %p mbuf already has cluster\n", __func__, m);
|
2006-07-17 09:05:21 +00:00
|
|
|
m->m_ext.ext_buf = (char *)NULL;
|
2004-05-31 21:46:06 +00:00
|
|
|
uma_zalloc_arg(zone_clust, m, how);
|
2007-01-25 01:05:23 +00:00
|
|
|
/*
|
|
|
|
* On a cluster allocation failure, drain the packet zone and retry;
|
|
|
|
* the drain may free up a few clusters.
|
|
|
|
*/
|
|
|
|
if ((how & M_NOWAIT) && (m->m_ext.ext_buf == NULL)) {
|
|
|
|
zone_drain(zone_pack);
|
|
|
|
uma_zalloc_arg(zone_clust, m, how);
|
|
|
|
}
|
2004-05-31 21:46:06 +00:00
|
|
|
}
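The M_NOWAIT retry in m_clget() can be modelled in userspace. What follows is a toy sketch, not kernel code. `struct toy_pools`, `toy_alloc_once()`, `toy_drain_cache()` and `toy_alloc_nowait()` are hypothetical names. The "cache" stands in for clusters held by the packet zone, which zone_drain() can hand back to the cluster zone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a primary cluster pool plus a secondary cache. */
struct toy_pools {
	int free_clusters;   /* clusters available in the primary pool */
	int cached_clusters; /* clusters parked in the secondary cache */
};

/* Return every cached cluster to the primary pool (like zone_drain()). */
static void
toy_drain_cache(struct toy_pools *p)
{
	p->free_clusters += p->cached_clusters;
	p->cached_clusters = 0;
}

/* Single attempt: 1 on success, 0 if the primary pool is empty. */
static int
toy_alloc_once(struct toy_pools *p)
{
	if (p->free_clusters == 0)
		return (0);
	p->free_clusters--;
	return (1);
}

/*
 * Non-sleeping allocation: on failure, drain the cache and retry exactly
 * once, mirroring the M_NOWAIT path in m_clget().
 */
static int
toy_alloc_nowait(struct toy_pools *p)
{
	assert(p != NULL);
	if (toy_alloc_once(p))
		return (1);
	toy_drain_cache(p);
	return (toy_alloc_once(p));
}
```

The retry is attempted only once, so a non-sleeping caller still fails quickly when the drain recovers nothing.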
|
|
|
|
|
2005-12-08 13:13:06 +00:00
|
|
|
/*
|
2007-03-24 20:19:44 +00:00
|
|
|
* m_cljget() is different from m_clget() as it can allocate clusters without
|
|
|
|
* attaching them to an mbuf. In that case the return value is the pointer
|
|
|
|
* to the cluster of the requested size. If an mbuf was specified, it gets
|
|
|
|
* the cluster attached to it and the return value can be safely ignored.
|
2006-02-17 14:14:15 +00:00
|
|
|
* Valid sizes are MCLBYTES, MJUMPAGESIZE, MJUM9BYTES and MJUM16BYTES.
|
2005-12-08 13:13:06 +00:00
|
|
|
*/
|
2007-03-24 20:19:44 +00:00
|
|
|
static __inline void *
|
2005-12-08 13:13:06 +00:00
|
|
|
m_cljget(struct mbuf *m, int how, int size)
|
|
|
|
{
|
|
|
|
uma_zone_t zone;
|
|
|
|
|
|
|
|
if (m && m->m_flags & M_EXT)
|
|
|
|
printf("%s: %p mbuf already has cluster\n", __func__, m);
|
|
|
|
if (m != NULL)
|
|
|
|
m->m_ext.ext_buf = NULL;
|
|
|
|
|
2007-04-04 00:31:49 +00:00
|
|
|
zone = m_getzone(size);
|
2005-12-08 13:13:06 +00:00
|
|
|
return (uma_zalloc_arg(zone, m, how));
|
|
|
|
}
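Here is a rough userspace illustration of the attach-or-return contract described above, built on malloc(). `struct toy_mbuf` and `toy_cljget()` are hypothetical. The real function picks a UMA zone with m_getzone() and passes m to uma_zalloc_arg(), so the attachment presumably happens in the zone's constructor rather than by hand as it does here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct mbuf. */
struct toy_mbuf {
	char   *ext_buf;  /* attached external buffer, or NULL */
	size_t  ext_size; /* size of the attached buffer */
	int     has_ext;  /* stands in for the M_EXT flag */
};

/*
 * Allocate a buffer of `size` bytes. If m is non-NULL, attach the buffer
 * to it. Either way, return the buffer (NULL on failure), so a caller that
 * passed an mbuf may ignore the return value.
 */
static void *
toy_cljget(struct toy_mbuf *m, size_t size)
{
	char *buf = malloc(size);

	if (m != NULL) {
		assert(!m->has_ext);  /* the kernel version only warns */
		m->ext_buf = buf;
		if (buf != NULL) {
			m->ext_size = size;
			m->has_ext = 1;
		}
	}
	return (buf);
}
```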
|
|
|
|
|
2007-04-04 04:08:57 +00:00
|
|
|
static __inline void
|
|
|
|
m_cljset(struct mbuf *m, void *cl, int type)
|
|
|
|
{
|
|
|
|
uma_zone_t zone;
|
|
|
|
int size;
|
|
|
|
|
|
|
|
switch (type) {
|
|
|
|
case EXT_CLUSTER:
|
|
|
|
size = MCLBYTES;
|
|
|
|
zone = zone_clust;
|
|
|
|
break;
|
|
|
|
#if MJUMPAGESIZE != MCLBYTES
|
|
|
|
case EXT_JUMBOP:
|
|
|
|
size = MJUMPAGESIZE;
|
|
|
|
zone = zone_jumbop;
|
|
|
|
break;
|
|
|
|
#endif
|
|
|
|
case EXT_JUMBO9:
|
|
|
|
size = MJUM9BYTES;
|
|
|
|
zone = zone_jumbo9;
|
|
|
|
break;
|
|
|
|
case EXT_JUMBO16:
|
|
|
|
size = MJUM16BYTES;
|
|
|
|
zone = zone_jumbo16;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
panic("unknown cluster type");
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
m->m_data = m->m_ext.ext_buf = cl;
|
2008-02-01 19:36:27 +00:00
|
|
|
m->m_ext.ext_free = m->m_ext.ext_arg1 = m->m_ext.ext_arg2 = NULL;
|
2007-04-04 04:08:57 +00:00
|
|
|
m->m_ext.ext_size = size;
|
|
|
|
m->m_ext.ext_type = type;
|
|
|
|
m->m_ext.ref_cnt = uma_find_refcnt(zone, cl);
|
|
|
|
m->m_flags |= M_EXT;
|
|
|
|
|
|
|
|
}
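The switch in m_cljset() maps each cluster type to a size and a zone. The sketch below covers only the type-to-size half. The enum names are hypothetical, and the byte counts are assumptions for illustration: 2 KB for MCLBYTES and 4 KB for a page-sized jumbo are common, but both depend on the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cluster types mirroring EXT_CLUSTER, EXT_JUMBOP, etc. */
enum toy_ext_type {
	TOY_EXT_CLUSTER,
	TOY_EXT_JUMBOP,
	TOY_EXT_JUMBO9,
	TOY_EXT_JUMBO16
};

/* Assumed sizes only; the real values come from the kernel headers. */
static size_t
toy_ext_size(enum toy_ext_type type)
{
	switch (type) {
	case TOY_EXT_CLUSTER:
		return (2048);
	case TOY_EXT_JUMBOP:
		return (4096);
	case TOY_EXT_JUMBO9:
		return (9 * 1024);
	case TOY_EXT_JUMBO16:
		return (16 * 1024);
	}
	return (0);  /* m_cljset() panics on an unknown type instead */
}
```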
|
|
|
|
|
2007-03-24 20:19:44 +00:00
|
|
|
static __inline void
|
2004-05-31 21:46:06 +00:00
|
|
|
m_chtype(struct mbuf *m, short new_type)
|
|
|
|
{
|
2007-03-24 20:19:44 +00:00
|
|
|
|
2004-05-31 21:46:06 +00:00
|
|
|
m->m_type = new_type;
|
|
|
|
}
|
|
|
|
|
2007-04-11 23:13:12 +00:00
|
|
|
static __inline struct mbuf *
|
|
|
|
m_last(struct mbuf *m)
|
|
|
|
{
|
|
|
|
|
|
|
|
while (m->m_next)
|
|
|
|
m = m->m_next;
|
|
|
|
return (m);
|
|
|
|
}
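m_last() walks the m_next links to the end of a chain. The userspace sketch below shows the same loop using a hypothetical two-field `struct toy_mbuf`. Like m_last(), it expects a non-NULL chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal mbuf: just the chain link and a length. */
struct toy_mbuf {
	struct toy_mbuf *m_next;  /* next buffer in this chain */
	int m_len;                /* bytes of data in this buffer */
};

/* Return the final buffer of a non-empty chain. */
static struct toy_mbuf *
toy_last(struct toy_mbuf *m)
{
	assert(m != NULL);  /* m_last() also assumes a non-NULL chain */
	while (m->m_next != NULL)
		m = m->m_next;
	return (m);
}
```

A typical use is finding the tail so more data can be appended there.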
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2007-03-24 20:19:44 +00:00
|
|
|
* mbuf, cluster, and external object allocation macros (for compatibility
|
|
|
|
* purposes).
|
Big mbuf subsystem diff #1: incorporate mutexes and fix things up somewhat
to accommodate the changes.
Here's a list of things that have changed (I may have left out a few); for a
relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal
* Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which
nobody uses anymore. It was great while it lasted, but now we're moving
onto bigger and better things (Approved by: wollman).
* Practically re-wrote the allocation macros in sys/sys/mbuf.h to accommodate
new allocations which grab the necessary lock.
* Make sure that necessary mbstat variables are manipulated with
corresponding atomic() routines.
* Changed the "wait" routines, cleaned it up, made one routine that does
the job.
* Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they
are now included in the generalized "wait" routines.
* Sleep routines now use msleep().
* Free lists have locks.
* etc... probably other stuff I'm missing...
Things to look out for and work on later:
* find a better way to (dynamically) adjust EXT_COUNTERS
* move necessity to recurse on a lock from drain routines by providing
lock-free lower-level version of MFREE() (and possibly m_free()?).
* checkout include of mutex.h in sys/sys/mbuf.h - probably violating
general philosophy here.
The code has been reviewed quite a bit, but problems may arise... please,
don't panic! Send me Emails: bmilekic@freebsd.org
Reviewed by: jlemon, cp, alfred, others?
2000-09-30 06:30:39 +00:00
|
|
|
*/
|
2002-12-30 20:22:40 +00:00
|
|
|
#define M_MOVE_PKTHDR(to, from) m_move_pkthdr((to), (from))
|
2002-09-18 20:28:58 +00:00
|
|
|
#define MGET(m, how, type) ((m) = m_get((how), (type)))
|
|
|
|
#define MGETHDR(m, how, type) ((m) = m_gethdr((how), (type)))
|
2001-12-23 22:04:08 +00:00
|
|
|
#define MCLGET(m, how) m_clget((m), (how))
|
2008-02-01 19:36:27 +00:00
|
|
|
#define MEXTADD(m, buf, size, free, arg1, arg2, flags, type) \
|
|
|
|
m_extadd((m), (caddr_t)(buf), (size), (free), (arg1), (arg2), (flags), (type))
|
2006-11-02 17:37:22 +00:00
|
|
|
#define m_getm(m, len, how, type) \
|
|
|
|
m_getm2((m), (len), (how), (type), M_PKTHDR)
|
1999-12-18 13:52:44 +00:00
|
|
|
|
2000-11-11 23:12:27 +00:00
|
|
|
/*
|
2007-03-24 20:19:44 +00:00
|
|
|
* Evaluates to true if it is safe to write to mbuf m's data region (which
|
|
|
|
* may be either the local data payload or an external buffer area,
|
|
|
|
* depending on whether M_EXT is set).
|
2000-11-11 23:12:27 +00:00
|
|
|
*/
|
2005-11-02 16:20:36 +00:00
|
|
|
#define M_WRITABLE(m) (!((m)->m_flags & M_RDONLY) && \
|
|
|
|
(!(((m)->m_flags & M_EXT)) || \
|
|
|
|
(*((m)->m_ext.ref_cnt) == 1)) )
|
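The M_WRITABLE() test is easier to read when written as a function. In this sketch, the flag values and `struct toy_mbuf` are hypothetical. Only the logic follows the macro: the mbuf must not be read-only, and it must either have no external buffer or hold a reference count of exactly one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the kernel's M_EXT/M_RDONLY values differ. */
#define TOY_M_EXT    0x1
#define TOY_M_RDONLY 0x2

struct toy_mbuf {
	int  flags;
	int *ref_cnt;  /* shared reference count of the external buffer */
};

/* Writable unless read-only or sharing an external buffer. */
static int
toy_writable(const struct toy_mbuf *m)
{
	assert(m != NULL);
	if (m->flags & TOY_M_RDONLY)
		return (0);
	if (!(m->flags & TOY_M_EXT))
		return (1);
	return (*m->ref_cnt == 1);
}
```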
2000-11-11 23:12:27 +00:00
|
|
|
|
2004-05-28 14:20:06 +00:00
|
|
|
/* Check if the supplied mbuf has a packet header, or else panic. */
|
|
|
|
#define M_ASSERTPKTHDR(m) \
|
2008-01-15 04:00:12 +00:00
|
|
|
KASSERT((m) != NULL && (m)->m_flags & M_PKTHDR, \
|
2004-05-28 14:20:06 +00:00
|
|
|
("%s: no mbuf packet header!", __func__))
|
2003-04-08 14:25:47 +00:00
|
|
|
|
2007-03-24 20:19:44 +00:00
|
|
|
/*
|
|
|
|
* Ensure that the supplied mbuf is a valid, non-free mbuf.
|
|
|
|
*
|
|
|
|
* XXX: Broken at the moment. Need some UMA magic to make it work again.
|
|
|
|
*/
|
2004-05-28 14:20:06 +00:00
|
|
|
#define M_ASSERTVALID(m) \
|
2005-11-02 16:20:36 +00:00
|
|
|
KASSERT((((struct mbuf *)(m))->m_flags & 0) == 0, \
|
2004-05-28 14:20:06 +00:00
|
|
|
("%s: attempted use of a free mbuf!", __func__))
|
2003-10-19 22:33:41 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2007-03-24 20:19:44 +00:00
|
|
|
* Set the m_data pointer of a newly-allocated mbuf (m_get/MGET) to place an
|
|
|
|
* object of the specified size at the end of the mbuf, longword aligned.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1999-12-18 13:52:44 +00:00
|
|
|
#define M_ALIGN(m, len) do { \
|
2005-11-18 14:40:43 +00:00
|
|
|
KASSERT(!((m)->m_flags & (M_PKTHDR|M_EXT)), \
|
|
|
|
("%s: M_ALIGN not normal mbuf", __func__)); \
|
|
|
|
KASSERT((m)->m_data == (m)->m_dat, \
|
|
|
|
("%s: M_ALIGN not a virgin mbuf", __func__)); \
|
1999-12-18 13:52:44 +00:00
|
|
|
(m)->m_data += (MLEN - (len)) & ~(sizeof(long) - 1); \
|
|
|
|
} while (0)
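Here is a worked example of the M_ALIGN() arithmetic. For illustration, it assumes MLEN is 224 bytes; the real value depends on the platform. On an LP64 system where sizeof(long) is 8, a 100-byte object gives 224 - 100 = 124, which rounds down to an offset of 120. That leaves 4 bytes of slack at the end of the buffer. MH_ALIGN() does the same computation with MHLEN.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed internal data size, for illustration only. */
#define TOY_MLEN 224

/*
 * Offset at which an object of `len` bytes should start so that it ends
 * near the end of the buffer, rounded down to a multiple of sizeof(long).
 */
static size_t
toy_align_offset(size_t len)
{
	assert(len <= TOY_MLEN);
	return ((TOY_MLEN - len) & ~(sizeof(long) - 1));
}
```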
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2007-03-24 20:19:44 +00:00
|
|
|
* As above, for mbufs allocated with m_gethdr/MGETHDR or initialized by
|
|
|
|
* M_DUP/MOVE_PKTHDR.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1999-12-18 13:52:44 +00:00
|
|
|
#define MH_ALIGN(m, len) do { \
|
2005-11-18 14:40:43 +00:00
|
|
|
KASSERT((m)->m_flags & M_PKTHDR && !((m)->m_flags & M_EXT), \
|
|
|
|
("%s: MH_ALIGN not PKTHDR mbuf", __func__)); \
|
|
|
|
KASSERT((m)->m_data == (m)->m_pktdat, \
|
|
|
|
("%s: MH_ALIGN not a virgin mbuf", __func__)); \
|
1999-12-18 13:52:44 +00:00
|
|
|
(m)->m_data += (MHLEN - (len)) & ~(sizeof(long) - 1); \
|
|
|
|
} while (0)
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
/*
|
2007-03-24 20:19:44 +00:00
|
|
|
* Compute the amount of space available before the current start of data in
|
|
|
|
* an mbuf.
|
2002-05-31 22:09:57 +00:00
|
|
|
*
|
|
|
|
* The M_WRITABLE() check is a temporary, conservative safety measure: the burden
|
|
|
|
* of checking writability of the mbuf data area rests solely with the caller.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1999-12-18 13:52:44 +00:00
|
|
|
#define M_LEADINGSPACE(m) \
|
|
|
|
((m)->m_flags & M_EXT ? \
|
2001-12-14 17:31:58 +00:00
|
|
|
(M_WRITABLE(m) ? (m)->m_data - (m)->m_ext.ext_buf : 0): \
|
1999-12-18 13:52:44 +00:00
|
|
|
(m)->m_flags & M_PKTHDR ? (m)->m_data - (m)->m_pktdat : \
|
1994-05-24 10:09:53 +00:00
|
|
|
(m)->m_data - (m)->m_dat)
|
|
|
|
|
|
|
|
/*
|
2007-03-24 20:19:44 +00:00
|
|
|
* Compute the amount of space available after the end of data in an mbuf.
|
2002-05-31 22:09:57 +00:00
|
|
|
*
|
|
|
|
* The M_WRITABLE() check is a temporary, conservative safety measure: the burden
|
|
|
|
* of checking writability of the mbuf data area rests solely with the caller.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1999-12-18 13:52:44 +00:00
|
|
|
#define M_TRAILINGSPACE(m) \
|
2002-05-31 22:09:57 +00:00
|
|
|
((m)->m_flags & M_EXT ? \
|
|
|
|
(M_WRITABLE(m) ? (m)->m_ext.ext_buf + (m)->m_ext.ext_size \
|
|
|
|
- ((m)->m_data + (m)->m_len) : 0) : \
|
1994-05-24 10:09:53 +00:00
|
|
|
&(m)->m_dat[MLEN] - ((m)->m_data + (m)->m_len))
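For an mbuf with neither M_EXT nor M_PKTHDR set, M_LEADINGSPACE() and M_TRAILINGSPACE() reduce to pointer arithmetic on the internal data array. The sketch below uses a hypothetical `struct toy_mbuf` with an assumed 256-byte buffer. It shows that leading space, data length and trailing space always add up to the buffer size.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUFLEN 256  /* hypothetical buffer size */

/* Simplified mbuf with only an internal data area. */
struct toy_mbuf {
	char    dat[TOY_BUFLEN];
	char   *data;  /* start of valid data within dat */
	size_t  len;   /* bytes of valid data */
};

/* Free bytes in front of the data. */
static size_t
toy_leadingspace(const struct toy_mbuf *m)
{
	return ((size_t)(m->data - m->dat));
}

/* Free bytes after the data. */
static size_t
toy_trailingspace(const struct toy_mbuf *m)
{
	return ((size_t)(&m->dat[TOY_BUFLEN] - (m->data + m->len)));
}
```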
|
|
|
|
|
|
|
|
/*
|
2007-03-24 20:19:44 +00:00
|
|
|
* Arrange to prepend space of size plen to mbuf m. If a new mbuf must be
|
|
|
|
* allocated, how specifies whether to wait. If the allocation fails, the
|
|
|
|
* original mbuf chain is freed and m is set to NULL.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1999-12-18 13:52:44 +00:00
|
|
|
#define M_PREPEND(m, plen, how) do { \
|
2001-12-23 22:04:08 +00:00
|
|
|
struct mbuf **_mmp = &(m); \
|
|
|
|
struct mbuf *_mm = *_mmp; \
|
|
|
|
int _mplen = (plen); \
|
|
|
|
int __mhow = (how); \
|
1999-12-19 01:47:16 +00:00
|
|
|
\
|
2004-07-21 07:12:24 +00:00
|
|
|
MBUF_CHECKSLEEP(__mhow); \
|
1999-12-19 01:47:16 +00:00
|
|
|
if (M_LEADINGSPACE(_mm) >= _mplen) { \
|
|
|
|
_mm->m_data -= _mplen; \
|
|
|
|
_mm->m_len += _mplen; \
|
1999-12-18 13:52:44 +00:00
|
|
|
} else \
|
1999-12-19 01:47:16 +00:00
|
|
|
_mm = m_prepend(_mm, _mplen, __mhow); \
|
2000-04-19 01:24:26 +00:00
|
|
|
if (_mm != NULL && _mm->m_flags & M_PKTHDR) \
|
1999-12-19 01:47:16 +00:00
|
|
|
_mm->m_pkthdr.len += _mplen; \
|
|
|
|
*_mmp = _mm; \
|
1999-12-18 13:52:44 +00:00
|
|
|
} while (0)
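The fast path of M_PREPEND() moves m_data backwards only when there is enough leading space. The sketch below models that path in userspace with a hypothetical `struct toy_mbuf`. It does not model the slow path, in which the kernel calls m_prepend() to allocate a new mbuf and, on failure, frees the chain and sets m to NULL; here a return value of 0 simply signals that case. The m_pkthdr.len update is also omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUFLEN 128  /* hypothetical buffer size */

struct toy_mbuf {
	char    dat[TOY_BUFLEN];
	char   *data;  /* start of valid data */
	size_t  len;   /* bytes of valid data */
};

/*
 * If plen bytes fit in front of the data, move the data pointer back and
 * grow the length, returning 1. Otherwise leave m untouched and return 0.
 */
static int
toy_prepend_inplace(struct toy_mbuf *m, size_t plen)
{
	if ((size_t)(m->data - m->dat) < plen)
		return (0);
	m->data -= plen;
	m->len += plen;
	return (1);
}
```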
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2000-09-30 06:30:39 +00:00
/*
2007-03-24 20:19:44 +00:00
 * Change mbuf to new type.  This is a relatively expensive operation and
 * should be avoided.
2000-09-30 06:30:39 +00:00
 */
2001-09-30 01:58:39 +00:00
#define MCHTYPE(m, t) m_chtype((m), (t))
1994-05-24 10:09:53 +00:00

2001-12-23 22:04:08 +00:00
/* Length to m_copy to copy all. */
1994-05-24 10:09:53 +00:00
#define M_COPYALL 1000000000

2002-09-18 20:28:58 +00:00
/* Compatibility with 4.3. */
2003-02-19 05:47:46 +00:00
#define m_copy(m, o, l) m_copym((m), (o), (l), M_DONTWAIT)
1994-05-24 10:09:53 +00:00

2007-03-24 20:19:44 +00:00
extern int max_datalen; /* MHLEN - max_hdr */
extern int max_hdr; /* Largest link + protocol header */
extern int max_linkhdr; /* Largest link-level header */
extern int max_protohdr; /* Largest protocol header */
extern struct mbstat mbstat; /* General mbuf stats/infos */
extern int nmbclusters; /* Maximum number of clusters */
2001-06-22 06:35:32 +00:00

2004-05-28 14:20:06 +00:00
struct uio;
2004-02-08 03:19:08 +00:00

2001-06-22 06:35:32 +00:00
void m_adj(struct mbuf *, int);
2005-07-30 01:32:16 +00:00
void m_align(struct mbuf *, int);
2003-12-15 21:49:41 +00:00
int m_apply(struct mbuf *, int, int,
2004-05-28 14:20:06 +00:00
	    int (*)(void *, void *, u_int), void *);
2004-12-08 05:42:02 +00:00
int m_append(struct mbuf *, int, c_caddr_t);
2001-06-22 06:35:32 +00:00
void m_cat(struct mbuf *, struct mbuf *);
void m_extadd(struct mbuf *, caddr_t, u_int,
2008-02-01 19:36:27 +00:00
	    void (*)(void *, void *), void *, void *, int, int);
2008-01-17 21:25:09 +00:00
struct mbuf *m_collapse(struct mbuf *, int, int);
2004-04-18 13:01:28 +00:00
void m_copyback(struct mbuf *, int, int, c_caddr_t);
2001-08-19 04:35:28 +00:00
void m_copydata(const struct mbuf *, int, int, caddr_t);
2007-03-24 20:19:44 +00:00
struct mbuf *m_copym(struct mbuf *, int, int, int);
struct mbuf *m_copymdata(struct mbuf *, struct mbuf *,
2005-08-29 20:15:33 +00:00
	    int, int, int, int);
2007-03-24 20:19:44 +00:00
struct mbuf *m_copypacket(struct mbuf *, int);
2002-09-18 20:28:58 +00:00
void m_copy_pkthdr(struct mbuf *, struct mbuf *);
2007-03-24 20:19:44 +00:00
struct mbuf *m_copyup(struct mbuf *n, int len, int dstoff);
struct mbuf *m_defrag(struct mbuf *, int);
2005-08-30 21:14:30 +00:00
void m_demote(struct mbuf *, int);
2007-03-24 20:19:44 +00:00
struct mbuf *m_devget(char *, int, int, struct ifnet *,
2002-10-06 09:56:52 +00:00
	    void (*)(char *, caddr_t, u_int));
2007-03-24 20:19:44 +00:00
struct mbuf *m_dup(struct mbuf *, int);
2002-12-30 20:22:40 +00:00
int m_dup_pkthdr(struct mbuf *, struct mbuf *, int);
2002-09-18 22:29:33 +00:00
u_int m_fixhdr(struct mbuf *);
2007-03-24 20:19:44 +00:00
struct mbuf *m_fragment(struct mbuf *, int, int);
2001-06-22 06:35:32 +00:00
void m_freem(struct mbuf *);
2007-03-24 20:19:44 +00:00
struct mbuf *m_getm2(struct mbuf *, int, int, short, int);
struct mbuf *m_getptr(struct mbuf *, int, int *);
2002-09-18 22:29:33 +00:00
u_int m_length(struct mbuf *, struct mbuf **);
2009-06-22 22:20:38 +00:00
int m_mbuftouio(struct uio *, struct mbuf *, int);
2002-12-30 20:22:40 +00:00
void m_move_pkthdr(struct mbuf *, struct mbuf *);
2007-03-24 20:19:44 +00:00
struct mbuf *m_prepend(struct mbuf *, int, int);
2004-09-28 18:40:18 +00:00
void m_print(const struct mbuf *, int);
2007-03-24 20:19:44 +00:00
struct mbuf *m_pulldown(struct mbuf *, int, int, int *);
struct mbuf *m_pullup(struct mbuf *, int);
2005-08-29 19:58:56 +00:00
int m_sanity(struct mbuf *, int);
2007-03-24 20:19:44 +00:00
struct mbuf *m_split(struct mbuf *, int, int);
struct mbuf *m_uiotombuf(struct uio *, int, int, int, int);
struct mbuf *m_unshare(struct mbuf *, int how);
2002-10-16 01:54:46 +00:00

2010-07-18 20:57:53 +00:00
/*-
2007-03-24 20:19:44 +00:00
 * Network packets may have annotations attached by affixing a list of
 * "packet tags" to the pkthdr structure.  Packet tags are dynamically
 * allocated semi-opaque data structures that have a fixed header
 * (struct m_tag) that specifies the size of the memory block and a
 * <cookie,type> pair that identifies it.  The cookie is a 32-bit unique
 * unsigned value used to identify a module or ABI.  By convention this value
 * is chosen as the date+time that the module is created, expressed as the
 * number of seconds since the epoch (e.g., using date -u +'%s').  The type
 * value is an ABI/module-specific value that identifies a particular
 * annotation and is private to the module.  For compatibility with systems
 * like OpenBSD that define packet tags w/o an ABI/module cookie, the value
 * MTAG_ABI_COMPAT is used to implement the m_tag_get and m_tag_find
 * compatibility shim functions, and several tag types are defined below.
 * Users that do not require compatibility should use a private cookie value
 * so that packet tag-related definitions can be maintained privately.
2002-10-16 01:54:46 +00:00
 *
2007-03-24 20:19:44 +00:00
 * Note that the packet tag returned by m_tag_alloc has the default memory
 * alignment implemented by malloc.  To reference private data one can use a
 * construct like:
2002-10-16 01:54:46 +00:00
 *
2004-05-18 14:13:23 +00:00
 *	struct m_tag *mtag = m_tag_alloc(...);
2002-10-16 01:54:46 +00:00
 *	struct foo *p = (struct foo *)(mtag+1);
 *
2007-03-24 20:19:44 +00:00
 * if the alignment of struct m_tag is sufficient for referencing members of
 * struct foo.  Otherwise it is necessary to embed struct m_tag within the
 * private data structure to ensure proper alignment; e.g.,
2002-10-16 01:54:46 +00:00
 *
 *	struct foo {
 *		struct m_tag	tag;
 *		...
 *	};
2004-05-18 14:13:23 +00:00
 *	struct foo *p = (struct foo *) m_tag_alloc(...);
2002-10-16 01:54:46 +00:00
 *	struct m_tag *mtag = &p->tag;
 */
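The embedding construct from the comment above, with the tag header as the first member of the private structure, can be sketched in userspace C. struct mock_tag, mock_tag_alloc() and foo_alloc() are hypothetical stand-ins: the real struct m_tag has more fields, and the real m_tag_alloc() is only declared further down in this header. The point illustrated is that a pointer to the whole structure and a pointer to its first member share an address, so the compiler lays out the private fields with correct alignment instead of relying on (mtag+1) arithmetic.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mock of the fixed tag header; not the kernel's struct m_tag. */
struct mock_tag {
	uint16_t id;		/* tag type */
	uint16_t len;		/* bytes of private data after the header */
	uint32_t cookie;	/* ABI/module cookie */
};

/* Private annotation embedding the header first, so the whole object is
 * aligned for its most demanding member (the double, on most ABIs). */
struct foo {
	struct mock_tag	tag;
	double		timestamp;
	uint32_t	flags;
};

/* Hypothetical allocator following the (cookie, type, len) convention:
 * len counts only the bytes that follow the header. */
static struct mock_tag *
mock_tag_alloc(uint32_t cookie, int type, int len)
{
	struct mock_tag *t = malloc(sizeof(*t) + (size_t)len);

	if (t == NULL)
		return NULL;
	t->id = (uint16_t)type;
	t->len = (uint16_t)len;
	t->cookie = cookie;
	return t;
}

/* Allocate a whole struct foo through the tag allocator, as in the comment. */
static struct foo *
foo_alloc(uint32_t cookie, int type)
{
	return (struct foo *)mock_tag_alloc(cookie, type,
	    (int)(sizeof(struct foo) - sizeof(struct mock_tag)));
}
```

Since malloc() returns memory suitably aligned for any object type, the cast back to struct foo is safe here, and &p->tag recovers the header view without any pointer arithmetic.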

2003-10-29 05:40:07 +00:00
/*
2007-03-24 20:19:44 +00:00
 * Persistent tags stay with an mbuf until the mbuf is reclaimed.  Otherwise
 * tags are expected to ``vanish'' when they pass through a network
 * interface.  For most interfaces this happens normally as the tags are
 * reclaimed when the mbuf is freed.  However, in some special cases
 * reclaiming must be done manually.  An example is packets that pass through
 * the loopback interface.  Also, one must be careful to do this when
 * ``turning around'' packets (e.g., icmp_reflect).
2003-10-29 05:40:07 +00:00
 *
2007-03-24 20:19:44 +00:00
 * To mark a tag persistent, bit-or this flag into the tag id when defining
 * the tag.  The tag will then be treated as described above.
2003-10-29 05:40:07 +00:00
 */
#define MTAG_PERSISTENT 0x800
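Because persistence is a single bit or-ed into the tag id, testing for it and recovering the base type are plain mask operations. The sketch copies MTAG_PERSISTENT and three tag ids from this header; tag_is_persistent(), tag_base_type() and keep_persistent() are hypothetical helpers, and keep_persistent() only mimics, on an array, the filtering that m_tag_delete_nonpersistent() (declared below) is assumed to perform on an mbuf's tag list.

```c
#include <stdbool.h>

/* Values copied from this header. */
#define MTAG_PERSISTENT		0x800
#define PACKET_TAG_DIVERT	17
#define PACKET_TAG_IPFORWARD	18
#define PACKET_TAG_MACLABEL	(19 | MTAG_PERSISTENT)

/* True if the persistent bit is set in a tag id. */
static bool
tag_is_persistent(int tag_id)
{
	return (tag_id & MTAG_PERSISTENT) != 0;
}

/* Strip the persistent bit to recover the base tag type. */
static int
tag_base_type(int tag_id)
{
	return tag_id & ~MTAG_PERSISTENT;
}

/*
 * Hypothetical: keep only persistent ids, compacting the array in place,
 * and return the new count.
 */
static int
keep_persistent(int *ids, int n)
{
	int kept = 0;

	for (int i = 0; i < n; i++)
		if (tag_is_persistent(ids[i]))
			ids[kept++] = ids[i];
	return kept;
}
```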

2002-10-16 01:54:46 +00:00
#define PACKET_TAG_NONE 0 /* None */

2004-05-28 14:20:06 +00:00
/* Packet tags for use with MTAG_ABI_COMPAT. */
2002-10-16 01:54:46 +00:00
#define PACKET_TAG_IPSEC_IN_DONE 1 /* IPsec applied, in */
#define PACKET_TAG_IPSEC_OUT_DONE 2 /* IPsec applied, out */
#define PACKET_TAG_IPSEC_IN_CRYPTO_DONE 3 /* NIC IPsec crypto done */
#define PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED 4 /* NIC IPsec crypto req'ed */
#define PACKET_TAG_IPSEC_IN_COULD_DO_CRYPTO 5 /* NIC notifies IPsec */
#define PACKET_TAG_IPSEC_PENDING_TDB 6 /* Reminder to do IPsec */
#define PACKET_TAG_BRIDGE 7 /* Bridge processing done */
#define PACKET_TAG_GIF 8 /* GIF processing done */
#define PACKET_TAG_GRE 9 /* GRE processing done */
#define PACKET_TAG_IN_PACKET_CHECKSUM 10 /* NIC checksumming done */
#define PACKET_TAG_ENCAP 11 /* Encap. processing */
#define PACKET_TAG_IPSEC_SOCKET 12 /* IPSEC socket ref */
#define PACKET_TAG_IPSEC_HISTORY 13 /* IPSEC history */
#define PACKET_TAG_IPV6_INPUT 14 /* IPV6 input processing */
#define PACKET_TAG_DUMMYNET 15 /* dummynet info */
#define PACKET_TAG_DIVERT 17 /* divert info */
#define PACKET_TAG_IPFORWARD 18 /* ipforward info */
2003-10-29 05:40:07 +00:00
#define PACKET_TAG_MACLABEL (19 | MTAG_PERSISTENT) /* MAC label */
2007-07-03 12:46:08 +00:00
#define PACKET_TAG_PF 21 /* PF + ALTQ information */
Introduce a netisr to deliver kernel-generated routing, avoiding
recursive entering of the socket code from the routing code:
- Modify rt_dispatch() to bundle up the sockaddr family, if any,
associated with a pending mbuf to dispatch to routing sockets, in
an m_tag on the mbuf.
- Allocate NETISR_ROUTE for use by routing sockets.
- Introduce rtsintrq, an ifqueue to be used by the netisr, and
introduce rts_input(), a function to unbundle the tagged sockaddr
and inject the mbuf and address into raw_input(), which previously
occurred in rt_dispatch().
- Introduce rts_init() to initialize rtsintrq, its mutex, and
register the netisr. Perform this at the same point in system
initialization as setup of the domains.
This change introduces asynchrony between the generation of a
pending routing socket message and delivery to sockets for use
by userspace. It avoids socket->routing->rtsock->socket use and
helps to avoid lock order reversals between the routing code and
socket code (in particular, raw socket control blocks), as route
locks are held over calls to rt_dispatch().
Reviewed by: "George V.Neville-Neil" <gnn@neville-neil.com>
Conceptual head nod by: sam
2004-06-09 02:48:23 +00:00
#define PACKET_TAG_RTSOCKFAM 25 /* rtsock sa family */
2004-09-15 20:13:26 +00:00
#define PACKET_TAG_IPOPTIONS 27 /* Saved IP options */
2009-04-26 19:15:19 +00:00
#define PACKET_TAG_CARP 28 /* CARP info */
Added support for NAT-Traversal (RFC 3948) in IPsec stack.
Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry
Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele
(julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense
team, and all people who used / tried the NAT-T patch for years and
reported bugs, patches, etc...
X-MFC: never
Reviewed by: bz
Approved by: gnn(mentor)
Obtained from: NETASQ
2009-06-12 15:44:35 +00:00
#define PACKET_TAG_IPSEC_NAT_T_PORTS 29 /* two uint16_t */
2010-08-19 11:31:03 +00:00
#define PACKET_TAG_ND_OUTGOING 30 /* ND outgoing */
2002-10-16 01:54:46 +00:00

Merge the //depot/user/yar/vlan branch into CVS. It contains some collective
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.
The most important changes:
o Instead of global linked list of all vlan softc use a per-trunk
hash. The size of hash is dynamically adjusted, depending on
number of entries. This changes struct ifnet, replacing counter
of vlans with a pointer to trunk structure. This change is an
improvement for setups with big number of VLANs, several interfaces
and several CPUs. It is a small regression for a setup with a single
VLAN interface.
An alternative to dynamic hash is a per-trunk static array with
4096 entries, which is a compile time option - VLAN_ARRAY. In my
experiments the array is not an improvement, probably because such
a big trunk structure doesn't fit into CPU cache.
o Introduce an UMA zone for VLAN tags. Since drivers depend on it,
the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
This change is a big improvement for any setup utilizing vlan(4).
o Use rwlock(9) instead of mutex(9) for locking. We are the first
ones to do this! :)
o Some drivers can do hardware VLAN tagging + hardware checksum
offloading. Add an infrastructure for this. Whenever vlan(4) is
attached to a parent or parent configuration is changed, the flags
on vlan(4) interface are updated.
In collaboration with: yar, thompsa
In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>
2006-01-30 13:45:15 +00:00
/* Specific cookies and tags. */

2003-11-14 23:58:01 +00:00
/* Packet tag routines. */
2007-03-24 20:19:44 +00:00
struct m_tag *m_tag_alloc(u_int32_t, int, int, int);
2002-10-16 01:54:46 +00:00
void m_tag_delete(struct mbuf *, struct m_tag *);
void m_tag_delete_chain(struct mbuf *, struct m_tag *);
2004-10-11 18:40:19 +00:00
void m_tag_free_default(struct m_tag *);
2007-03-24 20:19:44 +00:00
struct m_tag *m_tag_locate(struct mbuf *, u_int32_t, int, struct m_tag *);
struct m_tag *m_tag_copy(struct m_tag *, int);
2002-12-30 20:22:40 +00:00
int m_tag_copy_chain(struct mbuf *, struct mbuf *, int);
2003-10-29 05:40:07 +00:00
void m_tag_delete_nonpersistent(struct mbuf *);
2002-10-16 01:54:46 +00:00

2004-01-02 17:27:39 +00:00
/*
 * Initialize the list of tags associated with an mbuf.
 */
static __inline void
m_tag_init(struct mbuf *m)
{
2007-03-24 20:19:44 +00:00

2004-01-02 17:27:39 +00:00
	SLIST_INIT(&m->m_pkthdr.tags);
}

/*
2007-03-24 20:19:44 +00:00
 * Set up the contents of a tag.  Note that this does not fill in the free
 * method; the caller is expected to do that.
2004-01-02 17:27:39 +00:00
 *
2007-03-24 20:19:44 +00:00
 * XXX probably should be called m_tag_init, but that was already taken.
2004-01-02 17:27:39 +00:00
 */
static __inline void
m_tag_setup(struct m_tag *t, u_int32_t cookie, int type, int len)
{
2007-03-24 20:19:44 +00:00

2004-01-02 17:27:39 +00:00
	t->m_tag_id = type;
	t->m_tag_len = len;
	t->m_tag_cookie = cookie;
}

2004-10-10 09:16:48 +00:00
/*
 * Reclaim resources associated with a tag.
 */
static __inline void
m_tag_free(struct m_tag *t)
{
2007-03-24 20:19:44 +00:00

2004-10-10 09:16:48 +00:00
	(*t->m_tag_free)(t);
}
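m_tag_setup() fills in the id, length and cookie but leaves the free method to its caller, and m_tag_free() simply calls through that pointer. The userspace sketch below shows that division of work. struct mock_tag, mock_tag_setup(), mock_tag_new(), mock_tag_free(), mock_tag_free_default() and the frees_seen counter are all hypothetical; mock_tag_free_default() only plays the role that m_tag_free_default() (declared above) presumably plays for tags obtained from m_tag_alloc().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mock tag with a free method, mirroring the m_tag_free field. */
struct mock_tag {
	uint16_t id;
	uint16_t len;
	uint32_t cookie;
	void	(*free_fn)(struct mock_tag *);
};

static int frees_seen;	/* counts calls to the default method, for illustration */

static void
mock_tag_free_default(struct mock_tag *t)
{
	frees_seen++;
	free(t);
}

/* Same three assignments as m_tag_setup(); free_fn is left untouched. */
static void
mock_tag_setup(struct mock_tag *t, uint32_t cookie, int type, int len)
{
	t->id = (uint16_t)type;
	t->len = (uint16_t)len;
	t->cookie = cookie;
}

/* Dispatch through the stored method, as m_tag_free() does. */
static void
mock_tag_free(struct mock_tag *t)
{
	(*t->free_fn)(t);
}

/* Allocate header plus len private bytes and install the free method. */
static struct mock_tag *
mock_tag_new(uint32_t cookie, int type, int len)
{
	struct mock_tag *t = malloc(sizeof(*t) + (size_t)len);

	if (t == NULL)
		return NULL;
	mock_tag_setup(t, cookie, type, len);
	t->free_fn = mock_tag_free_default;	/* the caller's job, per the comment */
	return t;
}
```

Keeping the free method per tag lets a module that allocates tags some other way supply its own release routine without changing m_tag_free().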

2004-01-02 17:27:39 +00:00
/*
 * Return the first tag associated with an mbuf.
 */
static __inline struct m_tag *
m_tag_first(struct mbuf *m)
{
2007-03-24 20:19:44 +00:00

2004-05-28 14:20:06 +00:00
	return (SLIST_FIRST(&m->m_pkthdr.tags));
2004-01-02 17:27:39 +00:00
}

/*
 * Return the next tag in the list of tags associated with an mbuf.
 */
static __inline struct m_tag *
m_tag_next(struct mbuf *m, struct m_tag *t)
{
2007-03-24 20:19:44 +00:00

2004-05-28 14:20:06 +00:00
	return (SLIST_NEXT(t, m_tag_link));
2004-01-02 17:27:39 +00:00
}

/*
 * Prepend a tag to the list of tags associated with an mbuf.
 */
static __inline void
m_tag_prepend(struct mbuf *m, struct m_tag *t)
{
2007-03-24 20:19:44 +00:00

2004-01-02 17:27:39 +00:00
	SLIST_INSERT_HEAD(&m->m_pkthdr.tags, t, m_tag_link);
}

/*
 * Unlink a tag from the list of tags associated with an mbuf.
 */
static __inline void
m_tag_unlink(struct mbuf *m, struct m_tag *t)
{
2007-03-24 20:19:44 +00:00

2004-01-02 17:27:39 +00:00
	SLIST_REMOVE(&m->m_pkthdr.tags, t, m_tag, m_tag_link);
}
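m_tag_prepend() and m_tag_unlink() are thin wrappers around the SLIST_INSERT_HEAD and SLIST_REMOVE macros from <sys/queue.h>. The sketch below spells out what those two operations amount to on a bare singly linked list: prepending is constant time and makes the newest tag the one m_tag_first() would return, while removal must walk from the head to find the predecessor. struct node, list_prepend() and list_unlink() are hypothetical names, not kernel interfaces.

```c
#include <stddef.h>

/* Hypothetical node standing in for struct m_tag and its m_tag_link. */
struct node {
	struct node	*next;
	int		 id;
};

/* Like SLIST_INSERT_HEAD: O(1); the newest element becomes the first. */
static void
list_prepend(struct node **head, struct node *t)
{
	t->next = *head;
	*head = t;
}

/* Like SLIST_REMOVE: find the link that points at t, then bypass t.
 * O(n) in the length of the list; does nothing if t is absent. */
static void
list_unlink(struct node **head, struct node *t)
{
	struct node **pp = head;

	while (*pp != NULL && *pp != t)
		pp = &(*pp)->next;
	if (*pp == t)
		*pp = t->next;
}
```

The O(n) removal is usually harmless here, since an mbuf typically carries only a handful of tags.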

2003-11-14 23:58:01 +00:00
/* These are for OpenBSD compatibility. */
2002-10-16 01:54:46 +00:00
#define MTAG_ABI_COMPAT 0 /* compatibility ABI */

static __inline struct m_tag *
m_tag_get(int type, int length, int wait)
{
2004-05-28 14:20:06 +00:00
	return (m_tag_alloc(MTAG_ABI_COMPAT, type, length, wait));
2002-10-16 01:54:46 +00:00
}

static __inline struct m_tag *
m_tag_find(struct mbuf *m, int type, struct m_tag *start)
{
2006-07-17 09:05:21 +00:00
	return (SLIST_EMPTY(&m->m_pkthdr.tags) ? (struct m_tag *)NULL :
	    m_tag_locate(m, MTAG_ABI_COMPAT, type, start));
2002-10-16 01:54:46 +00:00
}
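m_tag_find() defers to m_tag_locate(), whose body is not part of this header. The sketch below shows the search it is expected to perform: begin at the head of the list when start is NULL, otherwise just after start, and return the first tag whose <cookie,type> pair matches. struct tag_node and tag_locate() are hypothetical, and the "resume after start" behavior is an assumption inferred from how start is passed in, not a quotation of the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal tag node on a singly linked list. */
struct tag_node {
	struct tag_node	*next;
	uint32_t	 cookie;
	int		 type;
};

/*
 * Return the first node after `start` (or from `head` when start is NULL)
 * whose cookie and type both match, or NULL if there is none.
 */
static struct tag_node *
tag_locate(struct tag_node *head, uint32_t cookie, int type,
    struct tag_node *start)
{
	struct tag_node *p = (start == NULL) ? head : start->next;

	for (; p != NULL; p = p->next)
		if (p->cookie == cookie && p->type == type)
			return p;
	return NULL;
}
```

Passing each result back in as start walks every matching tag in turn, which is why the compatibility shim exposes the start argument at all.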

2004-05-02 06:36:30 +00:00

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice..
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
/* XXX: Temporary FIB methods; these should probably use tags eventually. */
#define M_FIBSHIFT 28
#define M_FIBMASK 0x0F

/* Get the FIB from an mbuf; if none has been set, this yields 0, the default FIB. */
#define M_GETFIB(_m) \
	((((_m)->m_flags & M_FIB) >> M_FIBSHIFT) & M_FIBMASK)

#define M_SETFIB(_m, _fib) do { \
	(_m)->m_flags &= ~M_FIB; \
	(_m)->m_flags |= (((_fib) << M_FIBSHIFT) & M_FIB); \
} while (0)
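M_SETFIB and M_GETFIB pack a FIB number into the top bits of m_flags. M_FIB itself is not defined in this excerpt; the sketch assumes it is 0xF0000000, the four bits that agree with M_FIBSHIFT (28) and M_FIBMASK (0x0F). struct flagged is a hypothetical stand-in holding only a flags word, the flags are kept unsigned so the shifts are well defined, and the macro arguments are parenthesized.

```c
#include <stdint.h>

/* M_FIBSHIFT and M_FIBMASK are copied from above; M_FIB is an assumption. */
#define M_FIB		0xF0000000u
#define M_FIBSHIFT	28
#define M_FIBMASK	0x0F

/* Hypothetical stand-in carrying only the flags word of an mbuf. */
struct flagged {
	uint32_t m_flags;
};

/* Extract the 4-bit FIB field; 0 when it was never set. */
#define M_GETFIB(_m) \
	((((_m)->m_flags & M_FIB) >> M_FIBSHIFT) & M_FIBMASK)

/* Clear the field, then store the new FIB, leaving other flag bits alone. */
#define M_SETFIB(_m, _fib) do {						\
	(_m)->m_flags &= ~M_FIB;					\
	(_m)->m_flags |= (((uint32_t)(_fib) << M_FIBSHIFT) & M_FIB);	\
} while (0)
```

With four bits the field can name FIBs 0 through 15; a larger value simply loses its high bits rather than corrupting neighbouring flags.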

1999-12-29 04:46:21 +00:00
#endif /* _KERNEL */
1994-08-21 04:42:17 +00:00

2008-04-29 21:23:21 +00:00
#ifdef MBUF_PROFILING
void m_profile(struct mbuf *m);
#define M_PROFILE(m) m_profile(m)
#else
#define M_PROFILE(m)
#endif

1994-11-14 13:54:20 +00:00
#endif /* !_SYS_MBUF_H_ */