Commit Graph

2175 Commits

Author SHA1 Message Date
Robert Watson
a7a91e6592 Maintain and observe a ZBUF_FLAG_IMMUTABLE flag on zero-copy BPF
buffer kernel descriptors, which is used to allow the buffer
currently in the BPF "store" position to be assigned to userspace
when it fills, even if userspace hasn't acknowledged the buffer
in the "hold" position yet.  To implement this, notify the buffer
model when a buffer becomes full, and check that the store buffer
is writable, not just for it being full, before trying to append
new packet data.  Shared memory buffers will be assigned to
userspace at most once per fill, be it in the store or in the
hold position.

This removes the restriction that at most one shared memory buffer can
be owned by userspace, reducing the chances that userspace will
need to call select() after acknowledging one buffer in order to
wait for the next buffer when under high load.  This more fully
realizes the goal of zero system calls in order to process a
high-speed packet stream from BPF.

Update bpf.4 to reflect that both buffers may be owned by userspace
at once; caution against assuming this.
2008-04-07 02:51:00 +00:00
Robert Watson
08304c1617 Coerce if_loop.c in the general direction of style(9):
- Use ANSI function declarations
- Remove use of 'register' keyword
- Prefer style(9) return parens, white space

MFC after:	1 month
2008-04-07 01:43:30 +00:00
Ian Dowse
f5f1525321 Add IFF_NEEDSGIANT to IFF_CANTCHANGE, to prevent user-level code
from clearing the IFF_NEEDSGIANT flag on Giant-locked interfaces.
In particular, wpa_supplicant was doing this on USB interfaces,
causing panics when Giant-locked code was then called without Giant.

Submitted by:	Alexey Popov
Reviewed by:	rwatson
MFC after:	3 days
2008-03-27 18:02:30 +00:00
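
The effect of adding a flag to a "can't change" mask can be sketched as follows. This is only an illustration: the X_-prefixed flag values and the apply_user_flags() helper are hypothetical, not the real definitions from if.h or the actual ioctl path.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration; not the real if.h constants. */
#define X_IFF_UP		0x0001u
#define X_IFF_PROMISC		0x0002u
#define X_IFF_NEEDSGIANT	0x0004u

/* Flags that user-level code must not be able to set or clear. */
#define X_IFF_CANTCHANGE	(X_IFF_NEEDSGIANT)

/*
 * Merge a user-requested flag word into the current one: protected bits
 * keep their current value, all other bits come from the request.
 */
static uint32_t
apply_user_flags(uint32_t current, uint32_t requested)
{
	return ((current & X_IFF_CANTCHANGE) |
	    (requested & ~X_IFF_CANTCHANGE));
}
```

With this masking, a request that omits the protected bit leaves it set, and a request that includes it cannot turn it on for an interface that did not already have it.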
Robert Watson
61e175d59d Add a comment explaining that we initialize the 'a' buffer for
zero-copy to the store buffer position on the BPF descriptor,
and the 'b' buffer as the free buffer in order to fill them in
the order documented in bpf(4).

MFC after:	4 months
Suggested by:	csjp
2008-03-26 21:29:13 +00:00
Sam Leffler
fb27dd1db3 expose if_purgemaddrs, it will be used by the vap code unless someone
redesigns the mcast support code in the next few weeks

MFC after:	3 weeks
2008-03-25 21:23:32 +00:00
Sam Leffler
acaf1de6db IFM_IEEE80211_IBSSMASTER hasn't been used in many years; replace it
with IFM_IEEE80211_WDS which will be used by the forthcoming vap code

MFC after:	3 weeks
2008-03-25 21:22:43 +00:00
Ruslan Ermilov
ea26d58729 Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by:	arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
2008-03-25 09:39:02 +00:00
Robert Watson
fa0c2b3474 Check for a NULL free buffer pointer in BPF before invoking
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.

MFC after:	4 months
2008-03-25 07:41:33 +00:00
Jung-uk Kim
b83a219e9b Fix build with option BPF_JITTER. 2008-03-24 22:21:32 +00:00
Jung-uk Kim
892547230b Remove redundant inclusions of net/bpfdesc.h. 2008-03-24 22:16:46 +00:00
Christian S.J. Peron
4d621040ff Introduce support for zero-copy BPF buffering, which reduces the
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.

The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.

The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patches to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.

These changes modify the base BPF implementation to (roughly) abstract
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring changes that break
the netstat monitoring ABI for BPF, will be MFC'd.

Zerocopy bpf buffers are still considered experimental and are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.

Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.

Sponsored by:		Seccuris Inc.
In collaboration with:	rwatson
Tested by:		pwood, gallatin
MFC after:		4 months [1]

[1] Certain portions will probably not be MFCed, specifically things
    that can break the monitoring ABI.
2008-03-24 13:49:17 +00:00
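
A minimal sketch of how a producer and a consumer might hand a shared buffer back and forth with atomic operations instead of system calls, as the message above describes. The structure and function names here are hypothetical stand-ins written with portable C11 atomics; the real header layout and the required barriers are the ones documented in bpf(4).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared buffer header (see bpf(4) for the real one). */
struct zbuf_header_sketch {
	atomic_uint kernel_gen;	/* advanced by the producer on hand-over */
	atomic_uint kernel_len;	/* number of valid data bytes in the buffer */
	atomic_uint user_gen;	/* set by the consumer to acknowledge */
};

/* Producer: publish the length, then advance the generation with release. */
static void
producer_assign(struct zbuf_header_sketch *h, unsigned len)
{
	atomic_store_explicit(&h->kernel_len, len, memory_order_relaxed);
	atomic_fetch_add_explicit(&h->kernel_gen, 1, memory_order_release);
}

/* Consumer: the buffer is ours while the two generations differ. */
static bool
consumer_owns(struct zbuf_header_sketch *h)
{
	return (atomic_load_explicit(&h->user_gen, memory_order_relaxed) !=
	    atomic_load_explicit(&h->kernel_gen, memory_order_acquire));
}

/* Consumer: return the buffer by catching up to the producer's generation. */
static void
consumer_acknowledge(struct zbuf_header_sketch *h)
{
	unsigned gen;

	gen = atomic_load_explicit(&h->kernel_gen, memory_order_relaxed);
	atomic_store_explicit(&h->user_gen, gen, memory_order_release);
}
```

In this sketch the consumer only needs to block (for example in select()) when consumer_owns() is false for every buffer; under sustained load it can keep polling the headers without entering the kernel.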
Kip Macy
879773c18b back out last change as Sam believes that it breaks multicast - need to revisit after following up with pyun 2008-03-20 06:19:34 +00:00
Kip Macy
83631568fe Don't re-initialize the interface if it is already running.
This one-line change makes the following code, found in many Ethernet device
drivers (at least em, igb, ixgbe, and cxgb), gratuitous:

	case SIOCSIFADDR:
		if (ifa->ifa_addr->sa_family == AF_INET) {
			/*
			 * XXX
			 * Since resetting hardware takes a very long time
			 * and results in link renegotiation we only
			 * initialize the hardware only when it is absolutely
			 * required.
			 */
			ifp->if_flags |= IFF_UP;
			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
				EM_CORE_LOCK(adapter);
				em_init_locked(adapter);
				EM_CORE_UNLOCK(adapter);
			}
			arp_ifinit(ifp, ifa);
		} else
			error = ether_ioctl(ifp, command, data);
		break;
2008-03-20 05:35:02 +00:00
Julian Elischer
29481f8846 Replace really convoluted code that simplifies to "a ^= 0x01;" 2008-03-19 22:29:11 +00:00
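
For illustration, the equivalence behind this kind of simplification can be sketched as below. The long-hand form is hypothetical (the code that was actually replaced is not shown in this log); it only demonstrates that testing the low bit and then clearing or setting it is the same as a single XOR.

```c
/* Hypothetical long-hand form: test the low bit, then clear or set it. */
static unsigned
toggle_longhand(unsigned a)
{
	if ((a & 0x01) != 0)
		a &= ~0x01u;
	else
		a |= 0x01u;
	return (a);
}

/* Equivalent one-liner: XOR with 1 flips only the low bit. */
static unsigned
toggle_xor(unsigned a)
{
	a ^= 0x01;
	return (a);
}
```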
Andrew Thompson
69f04a828c Remove extra semicolons.
Pointed out by:		antoine
2008-03-17 01:26:44 +00:00
Andrew Thompson
3de1800850 Switch the LACP state machine over to its own mutex to protect the internals,
this means that it no longer grabs the lagg rwlock. Use two port table arrays
which list the active ports for Tx and switch between them with an atomic op.
Now the lagg rwlock is only exclusively locked for management (ioctls) and
queuing of lacp control frames isn't needed.
2008-03-16 19:25:30 +00:00
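
The two-table scheme described above can be sketched roughly as follows. All names are hypothetical and the sketch uses C11 atomics rather than the kernel's own primitives; it assumes updates are serialized by a separate lock (the state machine's mutex in the commit) and does not address readers that are still using the old table when it is next rewritten.

```c
#include <stdatomic.h>

#define SKETCH_MAX_PORTS 8

/* Hypothetical table of ports currently usable for transmit. */
struct port_table_sketch {
	int count;
	int ports[SKETCH_MAX_PORTS];
};

struct lacp_sketch {
	struct port_table_sketch tables[2];
	atomic_uint active;	/* index (0 or 1) of the table used for Tx */
};

/* Tx path: read the active index and pick a port by hash, without a lock. */
static int
select_tx_port(struct lacp_sketch *sc, unsigned hash)
{
	unsigned idx;
	const struct port_table_sketch *t;

	idx = atomic_load_explicit(&sc->active, memory_order_acquire);
	t = &sc->tables[idx];
	if (t->count == 0)
		return (-1);
	return (t->ports[hash % (unsigned)t->count]);
}

/* Update path: fill the inactive table, then publish it with one store. */
static void
update_ports(struct lacp_sketch *sc, const int *ports, int count)
{
	unsigned next;
	struct port_table_sketch *t;
	int i;

	next = 1u - atomic_load_explicit(&sc->active, memory_order_relaxed);
	t = &sc->tables[next];
	if (count > SKETCH_MAX_PORTS)
		count = SKETCH_MAX_PORTS;
	for (i = 0; i < count; i++)
		t->ports[i] = ports[i];
	t->count = count;
	atomic_store_explicit(&sc->active, next, memory_order_release);
}
```

The design choice is that the transmit path never takes a lock: it sees either the old table or the new one in full, because the index is switched only after the new table is completely written.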
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Robert Watson
23a0c23034 Improve convergence of bpf_filter.c toward style(9).
MFC after:	3 weeks
Submitted by:	csjp
2008-03-09 21:13:43 +00:00
Robert Watson
b9175c4556 Move IFF_NEEDSGIANT warning from if_ethersubr.c to if.c so it is displayed
for all network interfaces, not just ethernet-like ones.

Upgrade it to a louder WARNING and be explicit that the flag is obsolete.
Support for IFF_NEEDSGIANT will be removed in a few months (see arch@ for
details) and will not appear in 8.0.

Upgrade if_watchdog to a WARNING.
2008-03-07 16:00:44 +00:00
Andrew Thompson
56abdd3350 Improve EtherIP interaction with the bridge
- Set M_BCAST|M_MCAST for incoming frames
- Send the frame to a local interface if the bridge returns the mbuf

Submitted by:	Eugene Grosbein
Tested by:	Boris Kochergin
2008-03-06 19:02:37 +00:00
John Baldwin
1951e633c4 Use RTFREE_LOCKED() instead of rtfree() when releasing a reference on the
'rt' route in rtredirect() as 'rt' is always locked.

MFC after:	1 week
PR:		kern/117913
Submitted by:	Stefan Lambrev  stefan.lambrev of moneybookers.com
2008-02-13 16:57:58 +00:00
Robert Watson
31b32e6dc3 Add comment that bpfread() has multi-threading issues.
Fix minor white space nit.
2008-02-02 20:35:05 +00:00
Andrew Thompson
fdf229b124 Remove a chunk of duplicated code, test the destination address against the
bridge the same way we check member interfaces.
2008-01-18 09:34:09 +00:00
Andrew Thompson
905925d349 IEEE 802.1D-2004 states, frames containing any of the group MAC Addresses
specified in Table 7-10 in their destination address field shall not be relayed
by the Bridge. Add a check in bridge_forward() to adhere to this.

PR:		kern/119744
2008-01-18 00:19:10 +00:00
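
A sketch of the kind of destination check this implies, assuming the reserved group addresses in question are the block 01:80:C2:00:00:00 through 01:80:C2:00:00:0F (confirm against the standard's table). The helper name is hypothetical and this is not the code from bridge_forward().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Return true if dst falls in the assumed reserved range
 * 01:80:C2:00:00:00 .. 01:80:C2:00:00:0F, which a bridge must not relay.
 */
static bool
is_reserved_group_addr(const uint8_t dst[6])
{
	static const uint8_t prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

	return (memcmp(dst, prefix, sizeof(prefix)) == 0 && dst[5] <= 0x0f);
}
```

A forwarding path would call such a check on the destination address before relaying and drop the frame when it returns true.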
Andrew Thompson
eaf56834f1 Sync from OpenBSD r1.118, nuke clause 3 & 4. 2008-01-17 09:46:16 +00:00
Robert Watson
315f04614c Update netisr comment for the SMPng world order: netisr is no longer
implemented using the ISR facility, and cannot be triggered by calling
splnet()/splx().

MFC after:	3 weeks
2007-12-31 20:58:50 +00:00
Andrew Thompson
af0084c92e Pass any unmatched slowprotocols frames up the stack instead of dropping them,
there are more subtypes than just LACP.
2007-12-31 01:16:35 +00:00
Maxime Henrion
f321ff1561 Add a workaround for a deadlock between the rt_setgate() and rt_check()
functions.  It is easily triggered by running routed, and, I expect, by
running any other daemon that uses routing sockets.

Reviewed by:	net@
MFC after:	1 week
2007-12-27 10:00:57 +00:00
Andrew Thompson
e361d7d421 Fix a panic where if the mbuf was consumed by the filter for requeueing
(dummynet), ipsec_filter() would return the empty error code and the ipsec code
would continue to forward/dereference the null mbuf.

Found by:       m0n0wall
Reviewed by:    bz
MFC after:      3 days
2007-12-26 08:41:58 +00:00
Robert Watson
c786600793 Use __FBSDID() in the kernel BPF implementation.
MFC after:	3 days
2007-12-25 13:24:02 +00:00
Robert Watson
2a0a392e1c Remove trailing whitespace from lines in BPF.
MFC after:	3 days
2007-12-23 14:10:33 +00:00
Andrew Thompson
8411d52a93 Simplify the error handling and use the dereferenced sc->sc_ifp pointer. 2007-12-18 09:13:04 +00:00
Andrew Thompson
155f68d1aa When the bridge has an address and a packet comes in for it then drop it if the
link has been marked discarding by Spanning Tree. This would cause the bridge
to see duplicate packets to itself even if STP has correctly calculated the
topology and blocked redundant links.

Reported by:	trasz
Tested by:	trasz
MFC after:	3 days
2007-12-18 07:04:50 +00:00
Andrew Thompson
1f019d8381 - Use the macro to check the port status, as it will also test if it is
administratively down (!IFF_UP)
 - Use the same parameters to lagg_link_active() to get the backup port as in
   the output path; this didn't actually matter in practice as sc_primary is
   always the first on the port list.

MFC after:	3 days
2007-12-18 02:12:03 +00:00
Andrew Thompson
f51133ee3f Add myself to the copyright. 2007-12-17 18:49:44 +00:00
Kip Macy
29910a5a77 widen the routing event interface (arp update, redirect, and eventually pmtu change)
into separate functions

revert previous commit's changes to arpresolve and add a new interface
arpresolve2 which does arp resolution without an mbuf
2007-12-17 07:40:34 +00:00
Kip Macy
4c908c35e0 fix bonehead cut and paste error in last commit 2007-12-15 22:06:23 +00:00
Kip Macy
a0d231fbb8 Create separate capability flags for TCP over IPv4 and TCP over IPv6 2007-12-15 21:01:48 +00:00
Kip Macy
835a6f1230 add interface capability for TOE 2007-12-15 20:22:09 +00:00
Kip Macy
8e7e854cd6 add interface for allowing consumers to register for ARP updates,
redirects, and path MTU changes

Reviewed by: silby
2007-12-12 20:53:25 +00:00
Sam Leffler
de0abf19ba Wake On Lan (WOL) infrastructure
Submitted by:	Stefan Sperling <stsp@stsp.name>
Reviewed by:	brooks
2007-12-10 02:31:00 +00:00
Andrew Thompson
9ddd3624d9 Fix spelling.
Obtained from:	OpenBSD
2007-12-09 20:47:12 +00:00
Kip Macy
2de2af32a0 Add padding for anticipated functionality
- vimage
- TOE
- multiq
- host rtentry caching

Rename spare used by 80211 to if_llsoftc

Reviewed by: rwatson, gnn
MFC after: 1 day
2007-12-07 01:46:13 +00:00
Julian Elischer
bf3ce91a99 No need to assert that a == b when we just set a = b. 2007-12-06 22:40:17 +00:00
Andrew Thompson
d3b28963dc Support monitor mode where the frame is discarded after bpf and stats processing. 2007-12-05 00:42:28 +00:00
Bjoern A. Zeeb
19ad9831df Add sysctls to if_enc(4) to control whether the firewalls or
bpf will see inner and outer headers or just inner or outer
headers for incoming and outgoing IPsec packets.

This is useful in bpf to not have over long lines for debugging
or selecting packets based on the inner headers.
It also properly defines the behavior of what the firewalls see.

Last but not least it gives you if_enc(4) for IPv6 as well.

[ As some auxiliary state was not available in the later
  input path we save it in the tdbi. That way tcpdump can give a
  consistent view of either of (authentic,confidential) for both
  before and after states. ]

Discussed with:	thompsa (2007-04-25, basic idea of unifying paths)
Reviewed by:	thompsa, gnn
2007-11-28 22:33:53 +00:00
Max Laier
1030a1a9cb pfil(9) locking take 3: Switch to rmlock(9)
This has the benefit that rmlocks have proper support for reader recursion
(in contrast to rwlock(9), which could potentially lead to writer starvation).
It also means a significant performance gain, even though it is only visible in
microbenchmarks at the moment.

Discussed on:	-arch, -net
2007-11-25 12:41:47 +00:00
Andrew Thompson
80ddfb40e4 Have the lagg interface generate link up/down events, the interface is marked
as up if at least one of its ports also has a link up. This fixes using
carp+lagg together and any other system that relies on linkstate events.

PR:		kern/113956
MFC after:	3 days
2007-11-25 06:30:46 +00:00
Andrew Thompson
5c0d5fddf5 Use the safer callout_init_rw() to allow the softclock to grab the
rwlock for us.
2007-11-21 05:28:49 +00:00
Oleg Bulyzhin
897c0f57d4 1) dummynet_io() declaration has changed.
2) Alter packet flow inside dummynet: allow certain packets to bypass
dummynet scheduler. Benefits are:

- lower latency: if packet flow does not exceed pipe bandwidth, packets
  will not be (up to tick) delayed (due to dummynet's scheduler granularity).
- lower overhead: if packet avoids dummynet scheduler it shouldn't reenter ip
  stack later. Such packets can be fastforwarded.
- recursion (which can lead to kernel stack exhaustion) eliminated. This fixes
  a long-standing panic, which can be triggered this way:
	kldload dummynet
	sysctl net.inet.ip.fw.one_pass=0
	ipfw pipe 1 config bw 0
	for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done
	ping -c 1 localhost

3) Three new sysctl nodes are added:
net.inet.ip.dummynet.io_pkt -		packets passed to dummynet
net.inet.ip.dummynet.io_pkt_fast - 	packets avoided dummynet scheduler
net.inet.ip.dummynet.io_pkt_drop -	packets dropped by dummynet

P.S. Above comments are true only for layer 3 packets. Layer 2 packet flow
     is not changed yet.

MFC after:	3 month
2007-11-06 23:01:42 +00:00