Commit Graph

89 Commits

Author SHA1 Message Date
Jonathan T. Looney
2529f56ed3 Add the "TCP Blackbox Recorder" which we discussed at the developer
summits at BSDCan and BSDCam in 2017.

The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.

It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.

You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.

This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.

There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.

Reviewed by:	gnn (previous version)
Obtained from:	Netflix, Inc.
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D11085
2018-03-22 09:40:08 +00:00
Pedro F. Giffuni
8a16b7a18f General further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:49:47 +00:00
Bjoern A. Zeeb
ae69ad884d After inpcb route caching was put back in place there is no need for
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).

Reviewed by:		np
Differential Revision:	https://reviews.freebsd.org/D11448
2017-07-27 13:03:36 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Xin LI
f193c8ce0d Use strlcpy and snprintf in netstat(1).
Expand inet6name() line buffer to NI_MAXHOST and use strlcpy/snprintf
in various places.

Reported by:	Anton Yuzhaninov <citrin citrin ru>
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D8916
2017-01-05 09:23:54 +00:00
Gleb Smirnoff
dbfd87087b Print running TCP connection counts with TCP statistics. 2016-03-15 00:19:30 +00:00
Mark Johnston
9eddb899d9 Use a common subroutine to fetch and zero protocol stats instead of
duplicating roughly similar code for each protocol.

MFC after:	2 weeks
2015-09-11 04:37:01 +00:00
Mark Johnston
2bdd6ea2ad Remove prototypes for undefined functions from netstat.h. 2015-09-11 04:02:05 +00:00
Hiroki Sato
81dacd8beb Simplify kvm symbol resolution and error handling. The symbol table
nl_symbols will eventually be organized into several modules depending
on MK_* variables.
2015-09-02 18:51:36 +00:00
Hiroki Sato
10d5269ff9 - Add -W flag support for network column in intpr() (-i flag) and
routepr() (-r flag).  It is too narrow to show an IPv6 prefix
  in most cases.

- Accept "local" as a synonym of "unix" in protocol family name.

- Show a prefix length in CIDR notation when name resolution failed in
  netname().

- Make routename() and netname() AF-independent and remove
  unnecessary typecasting from struct sockaddr.

- Use getnameinfo(3) to format L2 addr in intpr().

- Fix a bug which showed "Address" when -A flag is specfied in pr_rthdr().

- Replace cryptic GETSA() macro with SA_SIZE().

- Fix declarations shadowing local variables with the same names.

- Add more static, remove unused header files and variables.

MFC after:	1 week
2015-09-01 08:42:04 +00:00
Marcel Moolenaar
ade9ccfe21 Convert netstat to use libxo.
Obtained from:  Phil Shafer <phil@juniper.net>
Ported to -current by: alfred@ (mostly), Kim Shrier
Formatting: marcel@
Sponsored by:   Juniper Networks, Inc.
2015-02-21 23:47:20 +00:00
Adrian Chadd
85b0f0f325 Add -R to netstat to dump RSS/flow information.
This is intended to help in diagnostics and debugging of NIC and stack
flowid support.

Eventually this will grow another column (RSS CPU ID) but
that currently isn't cached in the inpcb.

There's also no clean flowtype -> flowtype identifier string.  This is
the mbuf M_HASHTYPE_* values for RSS.

Here's some example output:

adrian@adrian-hackbox:~/work/freebsd/head/src % netstat -Rn | more
Active Internet connections
Proto Recv-Q Send-Q Local Address          Foreign Address           flowid ftype
tcp4       0      0 10.11.1.65.22          10.11.1.64.12409        29041942     2
udp4       0      0 127.0.0.1.123          *.*                     00000000     0
udp6       0      0 fe80::1%lo0.123        *.*                     00000000     0
udp6       0      0 ::1.123                *.*                     00000000     0
udp4       0      0 10.11.1.65.123         *.*                     00000000     0

Tested:

* amd64 system w/ igb NIC; local driver changes to expose RSS flowid in if_igb.
2014-05-19 17:11:43 +00:00
Gleb Smirnoff
45c203fce2 Remove AppleTalk support.
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.

Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 06:29:43 +00:00
Gleb Smirnoff
2c284d9395 Remove IPX support.
IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.

Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 02:58:48 +00:00
Gleb Smirnoff
e3a7aa6f56 - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
Gleb Smirnoff
5d6d7e756b o Revamp API between flowtable and netinet, netinet6.
- ip_output() and ip_output6() simply call flowtable_lookup(),
    passing mbuf and address family. That's the only code under
    #ifdef FLOWTABLE in the protocols code now.
o Revamp statistics gathering and export.
  - Remove hand made pcpu stats, and utilize counter(9).
  - Snapshot of statistics is available via 'netstat -rs'.
  - All sysctls are moved into net.flowtable namespace, since
    spreading them over net.inet isn't correct.
o Properly separate at compile time INET and INET6 parts.
o General cleanup.
  - Remove chain of multiple flowtables. We simply have one for
    IPv4 and one for IPv6.
  - Flowtables are allocated in flowtable.c, symbols are static.
  - With proper argument to SYSINIT() we no longer need flowtable_ready.
  - Hash salt doesn't need to be per-VNET.
  - Removed rudimentary debugging, which use quite useless in dtrace era.

The runtime behavior of flowtable shouldn't be changed by this commit.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-02-07 15:18:23 +00:00
Alexander V. Chernikov
fc47e028bb Use more fine-grained kvm(3) symbol lookup: routing code retrieves only
necessary symbols needed per subsystem. Main kvm(3) init is now delayed
as much as possbile. This finally fixes performance issues reported in
kern/167204.
Some non-working code (ng_socket.ko symbol addresses calculation) removed.
Some global variables eliminated.

PR:		kern/167204
MFC after:	4 weeks
2013-12-20 00:17:26 +00:00
Alexander V. Chernikov
11188df260 Restore corefiles handling via kvm(3).
Found by:	John-Mark Gurney <jmg at funkthat.com>
MFC after:	4 weeks
2013-12-18 20:04:04 +00:00
Gleb Smirnoff
84c1edcbad Rewrite netstat/if.c to use getifaddrs(3) and getifmaddrs(3) instead of
libkvm digging in kernel memory. This is possible since r231506 made
getifaddrs(3) to supply if_data for each ifaddr.

  The pros of this change is that now netstat(1) doesn't know about kernel
struct ifnet and struct ifaddr. And these structs are about to change
significantly in head soon. New netstat binary will work well with 10.0
and any future kernel.

  The cons is that now it isn't possible to obtain interface statistics
from a vmcore.

  Functions intpr() and sidewaysintpr() were rewritten from scratch.

  The output of netstat(1) has underwent the following changes:

1) The MTU is not printed for protocol addresses, since it has no notion.
   Dash is printed instead. If there would be a strong desire to return
   previous output, it is doable.
2) Output interface queue drops are not printed. Currently this data isn't
   available to userland via any API. We plan to drop 'struct ifqueue' from
   'struct ifnet' very soon, so old kvm(3) access to queue drops is soon
   to be broken, too. The plan is that drivers would handle their queues
   theirselves and a new field in if_data would be updated in case of drops.
3) In-kernel reference count for multicast addresses isn't printed. I doubt
   that anyone used it. Anyway, netstat(1) is sysadmin tool, not kernel
   debugger.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 09:55:07 +00:00
Hiroki Sato
3fddef95af Add -F fibnum option to specify an FIB number for -r flag. 2013-07-12 17:11:30 +00:00
Andrey V. Elsukov
5da0521fce Use new macros to implement ipstat and tcpstat using PCPU counters.
Change interface of kread_counters() similar ot kread() in the netstat(1).
2013-07-09 09:43:03 +00:00
Gleb Smirnoff
29dde48df4 Use kvm_counter_u64_fetch() to fix obtaining ipstat and tcpstat from
kernel core files.

Sponsored by:	Nginx, Inc.
2013-04-10 20:29:23 +00:00
Hajimu UMEMOTO
cd05232a21 - Hide the internal scope address representation of the KAME IPv6
stack from the output of `netstat -ani'.
- The node-local multicast address in the output of `netstat -rn'
  should be handled as well.

Spotted by:	Bernd Walter <ticso__at__cicely7.cicely.de>
2011-01-20 15:22:01 +00:00
Joel Dahl
da52b4caaf Remove the advertising clause from UCB copyrighted files in usr.bin. This
is in accordance with the information provided at
ftp://ftp.cs.berkeley.edu/pub/4bsd/README.Impt.License.Change

Also add $FreeBSD$ to a few files to keep svn happy.

Discussed with:	imp, rwatson
2010-12-11 08:32:16 +00:00
George V. Neville-Neil
f5d34df525 Add new, per connection, statistics for TCP, including:
Retransmitted Packets
Zero Window Advertisements
Out of Order Receives

These statistics are available via the -T argument to
netstat(1).
MFC after:	2 weeks
2010-11-17 18:55:12 +00:00
Robert Watson
88737be2dd Teach netstat -Q to work with -N and -M by adding libkvm versions of data
query routines.  This code is necessarily more fragile in the presence of
kernel changes than querying the kernel via sysctl (the default), but
useful when investigating crashes or live kernel state via firewire.

MFC after:	1 week
Sponsored by:	Juniper Networks
2010-03-01 00:46:45 +00:00
Robert Watson
0153eb6688 Teach netstat(1) to print out netisr statistics when given the -Q argument.
Currently supports only reporting on live systems via sysctl, kmem support
needs to be edded.

MFC after:	1 week
Sponsored by:	Juniper Networks
2010-02-22 15:57:36 +00:00
Xin LI
bf10ffe1d3 Add a new option, -q howmany, which when used in conjuction with -w,
exits netstat after _howmany_ outputs.

Requested by:	thomasa
Reviewed by:	freebsd-net (bms, old version in early 2007)
MFC after:	1 month
2010-01-11 03:00:17 +00:00
Bjoern A. Zeeb
aaae58c491 Unbreak user space after if_timer/if_watchdog removal in r199975.
Tested by:	glebius
2009-12-01 14:56:00 +00:00
Robert Watson
963b7ccd3b netstat(1) support for UNIX SOCK_SEQPACKET sockets -- changes were required
only for the kvm case, as we supported SOCK_SEQPACKET via sysctl already.

Sponsored by:	Google
MFC after:	3 months
2009-10-05 15:06:14 +00:00
George V. Neville-Neil
54fc657d59 Add ARP statistics to the kernel and netstat.
New counters now exist for:
requests sent
replies sent
requests received
replies received
packets received
total packets dropped due to no ARP entry
entrys timed out
Duplicate IPs seen

The new statistics are seen in the netstat command
when it is given the -s command line switch.

MFC after:	2 weeks
In collaboration with: bz
2009-09-03 21:10:57 +00:00
Bruce M Simpson
443fc3176d Introduce a number of changes to the MROUTING code.
This is purely a forwarding plane cleanup; no control plane
code is involved.

Summary:
 * Split IPv4 and IPv6 MROUTING support. The static compile-time
   kernel option remains the same, however, the modules may now
   be built for IPv4 and IPv6 separately as ip_mroute_mod and
   ip6_mroute_mod.
 * Clean up the IPv4 multicast forwarding code to use BSD queue
   and hash table constructs. Don't build our own timer abstractions
   when ratecheck() and timevalclear() etc will do.
 * Expose the multicast forwarding cache (MFC) and virtual interface
   table (VIF) as sysctls, to reduce netstat's dependence on libkvm
   for this information for running kernels.
   * bandwidth meters however still require libkvm.
 * Make the MFC hash table size a boot/load-time tunable ULONG,
   net.inet.ip.mfchashsize (defaults to 256).
 * Remove unused members from struct vif and struct mfc.
 * Kill RSVP support, as no current RSVP implementation uses it.
   These stubs could be moved to raw_ip.c.
 * Don't share locks or initialization between IPv4 and IPv6.
 * Don't use a static struct route_in6 in ip6_mroute.c.
   The v6 code is still using a cached struct route_in6, this is
   moved to mif6 for the time being.
 * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c.

v4 path tested using ports/net/mcast-tools.
v6 changes are mostly mechanical locking and *have not* been tested.
As these changes partially break some kernel ABIs, they will not
be MFCed. There is a lot more work to be done here.

Reviewed by:	Pavlin Radoslavov
2009-03-19 01:43:03 +00:00
Bruce M Simpson
57e9cb8cda Now that ifmcstat(8) does not suck, retire host-mode netstat -g.
This change will not be back-ported.
2009-02-15 16:16:38 +00:00
Sam Leffler
690f477d75 add new build knobs and jigger some existing controls to improve
control over the result of buildworld and installworld; this especially
helps packaging systems such as nanobsd

Reviewed by:	various (posted to arch)
MFC after:	1 month
2008-09-21 22:02:26 +00:00
David E. O'Brien
dd335a1577 Minimize changes CURRENT<->releng7. 2008-09-01 15:04:38 +00:00
George V. Neville-Neil
49f287f8c5 Update the kernel to count the number of mbufs and clusters
(all types) used per socket buffer.

Add support to netstat to print out all of the socket buffer
statistics.

Update the netstat manual page to describe the new -x flag
which gives the extended output.

Reviewed by:	rwatson, julian
2008-05-15 20:18:44 +00:00
Marius Strobl
bc784cfe1b Fix netname() [1] and routename() on big-endian LP64 archs.
Submitted by:	Yuri Pankov [1]
MFC after:	3 days
2008-02-07 23:00:40 +00:00
David E. O'Brien
65475bc8e6 style(9)
+ kread is not a boolean, so check it as such
+ fix $FreeBSD$ Ids
+ denote copyrights with /*-
+ misc whitespace changes.
2008-01-02 23:26:11 +00:00
John Baldwin
feda1a4372 Restore netstat -M functionality for most statistics on core dumps. In
general, when support was added to netstat for fetching data using sysctl,
no provision was left for fetching equivalent data from a core dump, and
in fact, netstat would _always_ fetch data from the live kernel using
sysctl even when -M was specified resulting in the user believing they
were getting data from coredumps when they actually weren't.  Some specific
changes:
- Add a global 'live' variable that is true if netstat is running against
  the live kernel and false if -M has been specified.
- Stop abusing the sysctl flag in the protocol tables to hold the protocol
  number.  Instead, the protocol is now its own field in the tables, and
  it is passed as a separate parameter to the PCB and stat routines rather
  than overloading the KVM offset parameter.
- Don't run PCB or stats functions who don't have a namelist offset if we
  are being run against a crash dump (!live).
- For the inet and unix PCB routines, we generate the same buffer from KVM
  that the sysctl usually generates complete with the header and trailer.
- Don't run bpf stats for !live (before it would just silently always run
  live).
- kread() no longer trashes memory when opening the buffer if there is an
  error on open and the passed in buffer is smaller than _POSIX2_LINE_MAX.
- The multicast routing code doesn't fallback to kvm on live kernels if
  the sysctl fails.  Keeping this made the code rather hairy, and netstat
  is already tied to the kernel ABI anyway (even when using sysctl's since
  things like xinpcb contain an inpcb) so any kernels this is run against
  that have the multicast routing stuff should have the sysctls.
- Don't try to dig around in the kernel linker in the netgraph PCB routine
  for core dumps.

Other notes:
- sctp's PCB routine only works on live kernels, it looked rather
  complicated to generate all the same stuff via KVM.  Someone can always
  add it later if desired though.
- Fix the ipsec removal bug where N_xxx for IPSEC stats weren't renumbered.
- Use sysctlbyname() everywhere rather than hardcoded mib values.

MFC after:	1 week
Approved by:	re (rwatson)
2007-07-16 17:15:55 +00:00
George V. Neville-Neil
8409aedfa6 Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes all remaining changes for the time being including
user space updates.

Submitted by:    bz
Approved by:    re
2007-07-01 12:08:08 +00:00
Randall Stewart
74fd40c90c Adds support for SCTP. 2007-06-09 13:44:09 +00:00
Yaroslav Tykhiy
7b95a1ebbd Achieve WARNS=2 by using uintmax_t to pass around 64-bit quantities,
including to printf().  Using uintmax_t is also robust to further
extensions in both the C language and the bitwidth of kernel counters.

Tested on:	i386 amd64 ia64
2006-07-28 16:09:19 +00:00
Kelly Yancey
100b98db75 Add support for printing IPSEC protocol stats if the kernel was compiled
with FAST_IPSEC rather than the KAME IPSEC stack.

Note that the output of "netstat -s -p ipsec" differs depending on which
stack is compiled into the kernel since they each keep different stats.
This delta also adds the "esp", "ah", and "ipcomp" protocol stats, which
are also available when the kernel is compiled with the FAST_IPSEC stack
(e.g. "netstat -s -p esp").

Submitted by:	Matt Titus <titus at nttmcl dot com>
MFC after:	3 days
2005-12-28 20:36:55 +00:00
Robert Watson
d4426f281d Modify netstat -mb to use libmemstat when accessing a core dump or live
kernel memory and not using sysctl.  Previously, libmemstat was used
only for the live kernel via sysctl paths.

This results in netstat output becoming both more consistent between
core dumps and the live kernel, and also more information in the core
dump case than previously (i.e., mbuf cache information).

Statistics relating to sfbufs still rely on a kvm descriptor as they
are not currently exposed via libmemstat.  netstat -m operating on a
core is still unable to print certain sfbuf stats available on the live
kernel.

MFC after:	1 week
2005-11-13 14:06:01 +00:00
Max Laier
b6de9e91bd Remove bridge(4) from the tree. if_bridge(4) is a full functional
replacement and has additional features which make it superior.

Discussed on:	-arch
Reviewed by:	thompsa
X-MFC-after:	never (RELENG_6 as transition period)
2005-09-27 18:10:43 +00:00
Christian S.J. Peron
6b463eed3a Merge bpfstat's functionality into the netstat(1) utility. This adds
a -B option which causes bpf peers to be printed. This option can be
used in conjunction with -I if information about specific interfaces
is desired. This is similar to what NetBSD added to their version of
netstat.

$ netstat -B
  Pid  Netif  Flags      Recv      Drop     Match Sblen Hblen Command
 1137    lo0 p--s--         0         0         0     0     0 tcpdump
  205   sis0 -ifs-l     37331         0         1     0     0 dhclient
$

$ netstat -I lo0 -B
  Pid  Netif  Flags      Recv      Drop     Match Sblen Hblen Command
 1174    lo0 p--s--         0         0         0     0     0 tcpdump
$

-Add bpf.c which stores all the code for retrieving and parsing bpf
 related statistics.
-Modify main.c to add support for the -B option and hook it into the
 program logic.
-Add bpf.c to the build.
-Document this new functionality in the man page and bump the revision
 date.
-Add prototype for bpf_stats function.
2005-09-07 17:35:16 +00:00
Gleb Smirnoff
c2dfd19ff0 Add a new switch -h for interface stats mode, which prints all interface
statistics in human readable form.

In collaboration with:	vsevolod
Reviewed by:		cperciva
2005-08-18 21:04:12 +00:00
Max Laier
2e37c5a333 Print newly exported pfsync statistics with netstat(8).
Requested by:	glebius
MFC after:	1 week
2005-07-14 22:42:35 +00:00
Gleb Smirnoff
a97719482d Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by:	mlaier
Obtained from:	OpenBSD (mickey, mcbride)
2005-02-22 13:04:05 +00:00
Bosko Milekic
099a0e588c Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
    care of and it becomes clearer to how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00