buffer kernel descriptors, which is used to allow the buffer
currently in the BPF "store" position to be assigned to userspace
when it fills, even if userspace hasn't acknowledged the buffer
in the "hold" position yet. To implement this, notify the buffer
model when a buffer becomes full, and check that the store buffer
is writable, not just for it being full, before trying to append
new packet data. Shared memory buffers will be assigned to
userspace at most once per fill, be it in the store or in the
hold position.
This removes the restriction that at most one shared memory buffer can
be owned by userspace, reducing the chances that userspace will
need to call select() after acknowledging one buffer in order to
wait for the next buffer when under high load. This more fully
realizes the goal of zero system calls in order to process a
high-speed packet stream from BPF.
Update bpf.4 to reflect that both buffers may be owned by userspace
at once; caution against assuming this.
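A minimal sketch of the writability check described above, using illustrative
names rather than the actual bpf(4) internals; the point is that "is the store
buffer full?" is no longer a sufficient test once a filled store buffer may
already have been assigned to userspace.

/*
 * Illustrative sketch only; struct cap_desc and the helpers below are
 * hypothetical stand-ins, not the real BPF descriptor or buffer model.
 */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct cap_desc {
        unsigned char   *store_buf;     /* buffer currently being filled */
        size_t           store_len;     /* bytes already in the store buffer */
        size_t           buf_size;      /* size of each shared buffer */
        bool             store_owned_by_user; /* handed over when it filled */
};

/* Buffer-model hook: may the kernel still write into the store buffer? */
static bool
canwrite_store(const struct cap_desc *d)
{
        return (!d->store_owned_by_user);
}

/*
 * Notify the buffer model that the store buffer is full; in zero-copy
 * mode this may assign the buffer to userspace immediately.
 */
static void
notify_buffer_full(struct cap_desc *d)
{
        d->store_owned_by_user = true;
}

static bool
append_packet(struct cap_desc *d, const void *pkt, size_t len)
{
        /* Check writability, not just fullness, before appending. */
        if (!canwrite_store(d) || d->store_len + len > d->buf_size)
                return (false);         /* caller must rotate buffers or drop */
        memcpy(d->store_buf + d->store_len, pkt, len);
        d->store_len += len;
        if (d->store_len == d->buf_size)
                notify_buffer_full(d);
        return (true);
}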
from clearing the IFF_NEEDSGIANT flag on Giant-locked interfaces.
In particular, wpa_supplicant was doing this on USB interfaces,
causing panics when Giant-locked code was then called without Giant.
Submitted by: Alexey Popov
Reviewed by: rwatson
MFC after: 3 days
zero-copy to the store buffer position on the BPF descriptor,
and the 'b' buffer as the free buffer in order to fill them in
the order documented in bpf(4).
MFC after: 4 months
Suggested by: csjp
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.
Reviewed by: arch
There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.
MFC after: 4 months
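A hedged sketch of the pattern being described, with made-up names (the real
code lives in the BPF buffer-model glue): a cheap inline test of the
descriptor's buffer mode avoids an out-of-line call per received packet when
the answer would always be "no".

/* Illustrative only; constants and fields do not match the bpf sources. */
#define BUFMODE_BUFFER  1               /* classic read(2) buffers */
#define BUFMODE_ZBUF    2               /* zero-copy shared buffers */

struct cap_desc2 {
        int     bufmode;
        int     (*canfreebuf_impl)(struct cap_desc2 *d);
};

static inline int
canfreebuf(struct cap_desc2 *d)
{
        /*
         * Only zero-copy buffers can ever be freed here; skip the
         * indirect, non-inlinable call in the common case.
         */
        if (d->bufmode != BUFMODE_ZBUF)
                return (0);
        return (d->canfreebuf_impl(d));
}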
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.
The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.
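A minimal userspace sketch of that setup, assuming the ioctls and struct
bpf_zbuf fields documented in bpf(4) (BIOCSETBUFMODE, BIOCSETZBUF); the device
path and buffer size are caller-supplied and error handling is abbreviated.

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <net/bpf.h>
#include <fcntl.h>
#include <unistd.h>

int
setup_zbuf(const char *bpfdev, size_t buflen, struct bpf_zbuf *zb)
{
        u_int mode = BPF_BUFMODE_ZBUF;
        int fd;

        if ((fd = open(bpfdev, O_RDWR)) < 0)
                return (-1);
        /* Switch the descriptor from the default buffered mode. */
        if (ioctl(fd, BIOCSETBUFMODE, &mode) < 0)
                goto fail;
        /* Page-aligned anonymous memory loaned to the kernel. */
        zb->bz_bufa = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_SHARED, -1, 0);
        zb->bz_bufb = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_SHARED, -1, 0);
        zb->bz_buflen = buflen;
        if (zb->bz_bufa == MAP_FAILED || zb->bz_bufb == MAP_FAILED)
                goto fail;
        /* Hand both buffers to the kernel; it wires and maps them. */
        if (ioctl(fd, BIOCSETZBUF, zb) < 0)
                goto fail;
        return (fd);
fail:
        close(fd);
        return (-1);
}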
The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patches to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.
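A sketch of the userspace side of that synchronization, assuming the
struct bpf_zbuf_header generation counters and the acquire/release semantics
described in bpf(4); a header of this type sits at the start of each shared
buffer.

#include <sys/types.h>
#include <machine/atomic.h>
#include <net/bpf.h>

/* Has the kernel assigned this buffer to userspace since our last ack? */
static int
zbuf_ready(struct bpf_zbuf_header *bzh)
{
        return (atomic_load_acq_int(&bzh->bzh_kernel_gen) !=
            bzh->bzh_user_gen);
}

/* Acknowledge the buffer so the kernel may reclaim and refill it. */
static void
zbuf_ack(struct bpf_zbuf_header *bzh)
{
        atomic_store_rel_int(&bzh->bzh_user_gen, bzh->bzh_kernel_gen);
}

When zbuf_ready() returns false, the process can fall back to
select()/poll()/kqueue() as described above rather than spinning.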
These changes modify the base BPF implementation to (roughly) abstract
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring changes that break
the netstat monitoring ABI for BPF, will be MFC'd.
Zero-copy BPF buffers are still considered experimental and are disabled
by default. To experiment with this new facility, set the
net.bpf.zerocopy_enable sysctl variable to 1.
Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.
Sponsored by: Seccuris Inc.
In collaboration with: rwatson
Tested by: pwood, gallatin
MFC after: 4 months [1]
[1] Certain portions will probably not be MFCed, specifically things
that can break the monitoring ABI.
This one-line change makes the following code, found in many Ethernet device
drivers (at least em, igb, ixgbe, and cxgb), gratuitous:
case SIOCSIFADDR:
        if (ifa->ifa_addr->sa_family == AF_INET) {
                /*
                 * XXX
                 * Since resetting hardware takes a very long time
                 * and results in link renegotiation we only
                 * initialize the hardware when it is absolutely
                 * required.
                 */
                ifp->if_flags |= IFF_UP;
                if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
                        EM_CORE_LOCK(adapter);
                        em_init_locked(adapter);
                        EM_CORE_UNLOCK(adapter);
                }
                arp_ifinit(ifp, ifa);
        } else
                error = ether_ioctl(ifp, command, data);
        break;
this means that it no longer grabs the lagg rwlock. Use two port table arrays
which list the active ports for Tx and switch between them with an atomic op.
Now the lagg rwlock is only exclusively locked for management (ioctls), and
queuing of lacp control frames isn't needed.
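An illustrative sketch of that scheme (the names are not the actual if_lagg
internals): the transmit path picks a port through a single pointer loaded
with acquire semantics, and management code publishes a rebuilt table with one
atomic release store.

#include <sys/types.h>
#include <stddef.h>
#include <machine/atomic.h>

struct port;                            /* stand-in for a lagg member port */

struct port_table {
        int             pt_count;       /* number of active ports */
        struct port     *pt_ports[8];   /* illustrative fixed bound */
};

/* Two tables; 'active_tbl' always points at the one Tx should use. */
static struct port_table        tables[2];
static struct port_table        *active_tbl = &tables[0];

/* Management path (runs under the exclusive lagg lock). */
static void
publish_ports(struct port_table *next)
{
        atomic_store_rel_ptr((volatile uintptr_t *)&active_tbl,
            (uintptr_t)next);
}

/* Transmit path: no lagg rwlock needed just to pick an active port. */
static struct port *
tx_select_port(uint32_t hash)
{
        struct port_table *pt;

        pt = (struct port_table *)atomic_load_acq_ptr(
            (volatile uintptr_t *)&active_tbl);
        if (pt->pt_count == 0)
                return (NULL);
        return (pt->pt_ports[hash % pt->pt_count]);
}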
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.
MFC after: 1 month
Discussed with: imp, rink
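For illustration, the convention in question is simply a terminating semicolon
after the macro, so that tools treating the invocation as a declaration can
parse it (example names are hypothetical):

#include <sys/param.h>
#include <sys/kernel.h>

static void
example_init(void *arg __unused)
{
        /* one-time initialization */
}
/* Note the trailing ';' after the SYSINIT() invocation. */
SYSINIT(example_init, SI_SUB_PSEUDO, SI_ORDER_ANY, example_init, NULL);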
for all network interfaces, not just ethernet-like ones.
Upgrade it to a louder WARNING and be explicit that the flag is obsolete.
Support for IFF_NEEDSGIANT will be removed in a few months (see arch@ for
details) and will not appear in 8.0.
Upgrade if_watchdog to a WARNING.
- Set M_BCAST|M_MCAST for incoming frames
- Send the frame to a local interface if the bridge returns the mbuf
Submitted by: Eugene Grosbein
Tested by: Boris Kochergin
specified in Table 7-10 in their destination address field shall not be relayed
by the Bridge. Add a check in bridge_forward() to adhere to this.
PR: kern/119744
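A compact sketch of such a check (illustrative, not the actual if_bridge
code): IEEE 802.1D-2004 Table 7-10 reserves the group addresses
01:80:C2:00:00:00 through 01:80:C2:00:00:0F, and frames destined to them must
not be relayed.

#include <stdint.h>
#include <string.h>

/* True if 'dst' is one of the 802.1D-2004 Table 7-10 reserved addresses. */
static int
is_8021d_reserved(const uint8_t dst[6])
{
        static const uint8_t prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

        return (memcmp(dst, prefix, sizeof(prefix)) == 0 &&
            (dst[5] & 0xf0) == 0x00);
}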
functions. It is easily triggered by running routed, and, I expect, by
running any other daemon that uses routing sockets.
Reviewed by: net@
MFC after: 1 week
(dummynet), ipsec_filter() would return an empty error code and the ipsec code
would continue to forward/dereference the null mbuf.
Found by: m0n0wall
Reviewed by: bz
MFC after: 3 days
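A generic sketch of the calling pattern this fixes (hypothetical names, not
the actual netipsec code): a filter hook that consumes the packet can
legitimately return success, so the caller must test the mbuf pointer as well
as the error code.

#include <stddef.h>

struct mbuf;

/*
 * Stand-in for a packet filter hook that consumed (and freed or queued)
 * the packet: it clears the caller's pointer but reports no error.
 */
static int
filter_hook(struct mbuf **mp)
{
        *mp = NULL;
        return (0);
}

static int
process_packet(struct mbuf **mp)
{
        int error;

        error = filter_hook(mp);
        /*
         * Checking only 'error' is the bug: a consumed packet comes back
         * as error == 0 with *mp == NULL, which must not be dereferenced.
         */
        if (error != 0 || *mp == NULL)
                return (error);
        /* ... continue forwarding *mp ... */
        return (0);
}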
link has been marked discarding by Spanning Tree. This would cause the bridge
to see duplicate packets to itself even if STP has correctly calculated the
topology and blocked redundant links.
Reported by: trasz
Tested by: trasz
MFC after: 3 days
administratively down (!IFF_UP)
- Use the same parameters to lagg_link_active() to get the backup port as in
the output path; this didn't actually matter in practice, as sc_primary is
always the first on the port list.
MFC after: 3 days
bpf will see inner and outer headers or just inner or outer
headers for incoming and outgoing IPsec packets.
This is useful in bpf to avoid overly long lines when debugging, or when
selecting packets based on the inner headers.
It also properly defines the behavior of what the firewalls see.
Last but not least it gives you if_enc(4) for IPv6 as well.
[ As some auxiliary state was not available in the later
input path we save it in the tdbi. That way tcpdump can give a
consistent view of either of (authentic,confidential) for both
before and after states. ]
Discussed with: thompsa (2007-04-25, basic idea of unifying paths)
Reviewed by: thompsa, gnn
This has the benefit that rmlocks have proper support for reader recursion
(in contrast to rwlock(9), which could potentially lead to writer starvation).
It also means a significant performance gain, even though it is only visible
in microbenchmarks at the moment.
Discussed on: -arch, -net
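A minimal usage sketch of rmlock(9) as applied here (illustrative names):
read-side acquisitions on the hot path stay cheap and may recurse, while
reconfiguration takes the write lock.

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

static struct rmlock cfg_lock;

static void
cfg_lock_init(void *arg __unused)
{
        rm_init(&cfg_lock, "cfg_lock");
}
SYSINIT(cfg_lock_init, SI_SUB_LOCK, SI_ORDER_ANY, cfg_lock_init, NULL);

static void
fast_path_lookup(void)
{
        struct rm_priotracker tracker;

        rm_rlock(&cfg_lock, &tracker);  /* cheap, recursion-safe */
        /* ... read shared state ... */
        rm_runlock(&cfg_lock, &tracker);
}

static void
rare_update(void)
{
        rm_wlock(&cfg_lock);            /* exclusive, for reconfiguration */
        /* ... modify shared state ... */
        rm_wunlock(&cfg_lock);
}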
as up if at least one of its ports also has a link up. This fixes using
carp+lagg together and any other system that relies on linkstate events.
PR: kern/113956
MFC after: 3 days
2) Alter packet flow inside dummynet: allow certain packets to bypass the
dummynet scheduler. Benefits are:
- lower latency: if the packet flow does not exceed the pipe bandwidth,
packets will not be delayed (by up to a tick, due to dummynet's scheduler
granularity).
- lower overhead: if a packet avoids the dummynet scheduler, it does not have
to re-enter the IP stack later. Such packets can be fast-forwarded (a
simplified sketch of the bypass decision follows this entry).
- recursion (which can lead to kernel stack exhaustion) is eliminated. This
fixes a long-standing panic, which can be triggered this way:
kldload dummynet
sysctl net.inet.ip.fw.one_pass=0
ipfw pipe 1 config bw 0
for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done
ping -c 1 localhost
3) Three new sysctl nodes are added:
net.inet.ip.dummynet.io_pkt - packets passed to dummynet
net.inet.ip.dummynet.io_pkt_fast - packets that bypassed the dummynet scheduler
net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet
P.S. The above comments are true only for layer 3 packets; layer 2 packet
flow is not changed yet.
MFC after: 3 months
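The simplified sketch referenced in (2), with hypothetical names (the real
scheduler tracks far more state): a packet may bypass the scheduler only when
nothing is already queued for the pipe and the pipe's rate allows it to go out
immediately.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative pipe state; not the actual dummynet structures. */
struct pipe_state {
        uint64_t        bandwidth;      /* bits/s; 0 means unlimited */
        uint64_t        credit_bits;    /* bits the pipe may send right now */
        int             queued_pkts;    /* packets already waiting */
};

/*
 * Fast path: if nothing is queued and the pipe has enough credit for this
 * packet, let it bypass the scheduler entirely.  It is neither delayed by
 * scheduler granularity nor re-injected into the IP stack.
 */
static bool
can_bypass(const struct pipe_state *p, uint32_t pkt_len_bytes)
{
        uint64_t need = (uint64_t)pkt_len_bytes * 8;

        if (p->queued_pkts != 0)
                return (false);         /* must preserve queue ordering */
        if (p->bandwidth != 0 && p->credit_bits < need)
                return (false);         /* would exceed the pipe rate */
        return (true);
}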