freebsd-nq/sys
Christian S.J. Peron 4d621040ff Introduce support for zero-copy BPF buffering, which reduces the
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.

The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.

The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patches to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.
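
The ownership handshake described above can be sketched in portable C11.
This is a user-space simulation only, not the kernel code and not the
bpf(4) API: struct zbuf_header is a hypothetical stand-in whose fields are
modelled on the generation counters that bpf(4) documents for the shared
buffer header, and zbuf_produce(), zbuf_consume(), zbuf_ack() and
zbuf_user_owns() are illustrative names. The assumption is that the
producer (the kernel's role) bumps its generation with release semantics
once a buffer is full, and the consumer hands the buffer back by copying
that generation into its own counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the shared buffer header; not the real struct. */
struct zbuf_header {
    _Atomic uint32_t kernel_gen; /* bumped by the producer on hand-off */
    uint32_t kernel_len;         /* bytes of valid data in the buffer */
    _Atomic uint32_t user_gen;   /* set by the consumer when finished */
};

struct zbuf {
    struct zbuf_header hdr;
    unsigned char data[256];
};

/* Consumer-side check: differing generations mean the consumer owns it. */
static int zbuf_user_owns(struct zbuf *zb)
{
    uint32_t kgen = atomic_load_explicit(&zb->hdr.kernel_gen,
                                         memory_order_acquire);
    uint32_t ugen = atomic_load_explicit(&zb->hdr.user_gen,
                                         memory_order_relaxed);
    return kgen != ugen;
}

/* Producer side: fill the buffer, then publish it with a release bump.
 * Returns 0 on success, -1 if the data does not fit or the consumer
 * has not yet returned the buffer. */
static int zbuf_produce(struct zbuf *zb, const void *src, size_t len)
{
    uint32_t ugen = atomic_load_explicit(&zb->hdr.user_gen,
                                         memory_order_acquire);
    uint32_t kgen = atomic_load_explicit(&zb->hdr.kernel_gen,
                                         memory_order_relaxed);
    if (len > sizeof(zb->data) || kgen != ugen)
        return -1;
    memcpy(zb->data, src, len);
    zb->hdr.kernel_len = (uint32_t)len;
    atomic_fetch_add_explicit(&zb->hdr.kernel_gen, 1, memory_order_release);
    return 0;
}

/* Consumer side: return the buffer by matching the producer's generation. */
static void zbuf_ack(struct zbuf *zb)
{
    assert(zbuf_user_owns(zb));
    uint32_t kgen = atomic_load_explicit(&zb->hdr.kernel_gen,
                                         memory_order_relaxed);
    atomic_store_explicit(&zb->hdr.user_gen, kgen, memory_order_release);
}

/* Consumer side: copy out up to cap bytes, then acknowledge.
 * Returns the number of bytes copied, or 0 if nothing is ready. */
static size_t zbuf_consume(struct zbuf *zb, void *dst, size_t cap)
{
    if (!zbuf_user_owns(zb))
        return 0;
    size_t n = zb->hdr.kernel_len < cap ? zb->hdr.kernel_len : cap;
    memcpy(dst, zb->data, n);
    zbuf_ack(zb);
    return n;
}
```

No system call appears on either path: each side learns who owns the
buffer from the two counters alone, which is why a busy consumer would
only need select()/poll()/kqueue() when nothing is ready. The real field
names and the required barriers are specified in bpf(4).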

These changes modify the base BPF implementation to (roughly) abstract
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring changes that break
the netstat monitoring ABI for BPF, will be MFC'd.

Zero-copy BPF buffers are still considered experimental and are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.

Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.

Sponsored by:		Seccuris Inc.
In collaboration with:	rwatson
Tested by:		pwood, gallatin
MFC after:		4 months [1]

[1] Certain portions will probably not be MFCed, specifically things
    that can break the monitoring ABI.
2008-03-24 13:49:17 +00:00
..
amd64 First pass at (possibly futile) microoptimizing of cpu_switch. Results 2008-03-23 23:09:06 +00:00
arm We need to prototype _start() as well, as we use it to test if we're running 2008-03-22 20:34:07 +00:00
boot
bsm
cam
cddl
compat o Add stub support for some new futex operations, 2008-03-20 17:03:55 +00:00
conf Introduce support for zero-copy BPF buffering, which reduces the 2008-03-24 13:49:17 +00:00
contrib
crypto
ddb
dev remove unnecessary tcbinfo lock acquisitions - set tp to null after calling enter_timewait as we no longer own the inpcb 2008-03-24 05:21:10 +00:00
fs - Complete part of the unfinished bufobj work by consistently using 2008-03-22 09:15:16 +00:00
gdb
geom Redefine G_PART_SCHEME_DECLARE() from populating a private linker set 2008-03-23 01:31:59 +00:00
gnu
i4b
i386 Prevent the overflow in the calculation of the next page directory. 2008-03-23 07:07:27 +00:00
ia64
isa
kern - Greatly simplify vget() by removing the guarantee that any new 2008-03-24 04:22:58 +00:00
libkern
modules Instead of making a single geom_part.ko module, make a module 2008-03-23 01:42:47 +00:00
net Introduce support for zero-copy BPF buffering, which reduces the 2008-03-24 13:49:17 +00:00
net80211
netatalk
netatm
netgraph
netinet Label inp as unused in the non-INVARIANTS case 2008-03-24 00:29:01 +00:00
netinet6
netipsec Add ';' missed with the SYSINIT changes. 2008-03-21 18:31:42 +00:00
netipx
netnatm
netncp
netsmb
nfs
nfs4client - Complete part of the unfinished bufobj work by consistently using 2008-03-22 09:15:16 +00:00
nfsclient - Complete part of the unfinished bufobj work by consistently using 2008-03-22 09:15:16 +00:00
nfsserver - Complete part of the unfinished bufobj work by consistently using 2008-03-22 09:15:16 +00:00
opencrypto
pc98
pccard
pci For MSI capable hardwares, enable MSI enable bit in RL_CFG2 2008-03-23 05:31:35 +00:00
powerpc
rpc
security
sparc64 Oops. Use atomic_add_long() for atomic_fetchadd_long() (not atomic_add_int()) 2008-03-19 07:27:24 +00:00
sun4v Oops. Use atomic_add_long() for atomic_fetchadd_long() (not atomic_add_int()) 2008-03-19 07:27:24 +00:00
sys - Remove an old comment; vnodes have been working without Giant for 2008-03-24 04:11:40 +00:00
tools
ufs Yield the cpu in the kernel while iterating the list of the 2008-03-23 13:45:24 +00:00
vm Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly 2008-03-20 16:08:42 +00:00
Makefile