Commit Graph

7 Commits

Author SHA1 Message Date
jkim
cda13d6d86 Always embed pointer to BPF JIT function in BPF descriptor
to avoid inconsistency when opt_bpf.h is not included.

Reviewed by:	rwatson
Approved by:	re (rwatson)
2009-08-12 17:28:53 +00:00
rwatson
7b4344f11f Clarify some comments, fix some types, and rename ZBUF_FLAG_IMMUTABLE to
ZBUF_FLAG_ASSIGNED to make it clear why the buffer can't be written to:
it is assigned to userspace.
2009-03-07 10:21:37 +00:00
csjp
4f71d026f8 Make sure we are clearing the ZBUF_FLAG_IMMUTABLE any time a free buffer
is reclaimed by the kernel.  This fixes a bug resulted in the kernel
over writing packet data while user-space was still processing it when
zerocopy is enabled.  (Or a panic if invariants was enabled).

Discussed with:	rwatson
2008-07-05 20:11:28 +00:00
rwatson
6d3db5778b Maintain and observe a ZBUF_FLAG_IMMUTABLE flag on zero-copy BPF
buffer kernel descriptors, which is used to allow the buffer
currently in the BPF "store" position to be assigned to userspace
when it fills, even if userspace hasn't acknowledged the buffer
in the "hold" position yet.  To implement this, notify the buffer
model when a buffer becomes full, and check that the store buffer
is writable, not just for it being full, before trying to append
new packet data.  Shared memory buffers will be assigned to
userspace at most once per fill, be it in the store or in the
hold position.

This removes the restriction that at most one shared memory can
by owned by userspace, reducing the chances that userspace will
need to call select() after acknowledging one buffer in order to
wait for the next buffer when under high load.  This more fully
realizes the goal of zero system calls in order to process a
high-speed packet stream from BPF.

Update bpf.4 to reflect that both buffers may be owned by userspace
at once; caution against assuming this.
2008-04-07 02:51:00 +00:00
rwatson
61a4ef5ea0 Add a comment explaining that we initialize the 'a' buffer for
zero-copy to the store buffer position on the BPF descriptor,
and the 'b' buffer as the free buffer in order to fill them in
the order documented in bpf(4).

MFC after:	4 months
Suggested by:	csjp
2008-03-26 21:29:13 +00:00
jkim
e4afcc95a9 Fix build with option BPF_JITTER. 2008-03-24 22:21:32 +00:00
csjp
310e3f93dd Introduce support for zero-copy BPF buffering, which reduces the
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.

The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.

The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patchs to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.

These changes modify the base BPF implementation to (roughly) abstrac
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring hanges that break
the netstat monitoring ABI for BPF, will be MFC'd.

Zerocopy bpf buffers are still considered experimental are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.

Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.

Sponsored by:		Seccuris Inc.
In collaboration with:	rwatson
Tested by:		pwood, gallatin
MFC after:		4 months [1]

[1] Certain portions will probably not be MFCed, specifically things
    that can break the monitoring ABI.
2008-03-24 13:49:17 +00:00