2283 Commits

Author SHA1 Message Date
qingli
9d72a61c72 Make this file compile on IPv6 kernels. 2008-04-13 23:04:46 +00:00
phk
8988da494e Make this compile also on non-IPv6 kernels. 2008-04-13 21:38:05 +00:00
bz
bf956f3742 Fix the build in case RADIX_MPATH is not defined. 2008-04-13 10:22:59 +00:00
qingli
c75b6e5dea These files handle the radix tree for the ECMP routes.
The original code from KAME did not take care of address
aliases or multiple ip addresses that have the same
prefix.

Reviewed by:	rwatson, gnn, sam, kmacy, julian
2008-04-13 06:12:13 +00:00
qingli
4e8901ea7a This patch provides the back end support for equal-cost multi-path
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
is disallowed. For example,

	route add -net 192.103.54.0/24 10.9.44.1
	route add -net 192.103.54.0/24 10.9.44.2

The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"

Multiple default routes can also be inserted. Here is the netstat
output:

default		10.2.5.1	UGS	0	3074	bge0 =>
default		10.2.5.2	UGS	0	0	bge0

When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,

	route delete default

would fail and trigger the following error message:

"route: writing to routing socket: No such process"
"delete net default: not in table"

On the other hand,

	route delete default 10.2.5.2

would be successful: "delete net default: gateway 10.2.5.2"

One does not have to specify a gateway if there is only a single
route for a particular destination.

I need to perform more testings on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options  RADIX_MPATH" in the kernel configuration
to enable this feature.

Reviewed by:	robert, sam, gnn, julian, kmacy
2008-04-13 05:45:14 +00:00
rwatson
6d3db5778b Maintain and observe a ZBUF_FLAG_IMMUTABLE flag on zero-copy BPF
buffer kernel descriptors, which is used to allow the buffer
currently in the BPF "store" position to be assigned to userspace
when it fills, even if userspace hasn't acknowledged the buffer
in the "hold" position yet.  To implement this, notify the buffer
model when a buffer becomes full, and check that the store buffer
is writable, not just for it being full, before trying to append
new packet data.  Shared memory buffers will be assigned to
userspace at most once per fill, be it in the store or in the
hold position.

This removes the restriction that at most one shared memory can
by owned by userspace, reducing the chances that userspace will
need to call select() after acknowledging one buffer in order to
wait for the next buffer when under high load.  This more fully
realizes the goal of zero system calls in order to process a
high-speed packet stream from BPF.

Update bpf.4 to reflect that both buffers may be owned by userspace
at once; caution against assuming this.
2008-04-07 02:51:00 +00:00
rwatson
d566093a40 Coerce if_loop.c in the general direction of style(9):
- Use ANSI function declarations
- Remove use of 'register' keyword
- Prefer style(9) return parens, white space

MFC after:	1 month
2008-04-07 01:43:30 +00:00
iedowse
8b81d719e1 Add IFF_NEEDSGIANT to IFF_CANTCHANGE, to prevent user-level code
from clearing the IFF_NEEDSGIANT flag on Giant-locked interfaces.
In particular, wpa_supplicant was doing this on USB interfaces,
causing panics when Giant-locked code was then called without Giant.

Submitted by:	Alexey Popov
Reviewed by:	rwatson
MFC after:	3 days
2008-03-27 18:02:30 +00:00
rwatson
61a4ef5ea0 Add a comment explaining that we initialize the 'a' buffer for
zero-copy to the store buffer position on the BPF descriptor,
and the 'b' buffer as the free buffer in order to fill them in
the order documented in bpf(4).

MFC after:	4 months
Suggested by:	csjp
2008-03-26 21:29:13 +00:00
sam
d5c642ca44 expose if_purgemaddrs, it will be used by the vap code unless someone
redesigns the mcast support code in the next few weeks

MFC after:	3 weeks
2008-03-25 21:23:32 +00:00
sam
12e7d1940e IFM_IEEE80211_IBSSMASTER hasn't been used in many years; replace it
with IFM_IEEE80211_WDS which will be used by the forthcoming vap code

MFC after:	3 weeks
2008-03-25 21:22:43 +00:00
ru
3b1bf8c2e9 Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by:	arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
2008-03-25 09:39:02 +00:00
rwatson
59900e5206 Check for a NULL free buffer pointer in BPF before invoking
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.

MFC after:	4 months
2008-03-25 07:41:33 +00:00
jkim
e4afcc95a9 Fix build with option BPF_JITTER. 2008-03-24 22:21:32 +00:00
jkim
f1bce3f01d Remove redundant inclusions of net/bpfdesc.h. 2008-03-24 22:16:46 +00:00
csjp
310e3f93dd Introduce support for zero-copy BPF buffering, which reduces the
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.

The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.

The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patchs to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.

These changes modify the base BPF implementation to (roughly) abstrac
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring hanges that break
the netstat monitoring ABI for BPF, will be MFC'd.

Zerocopy bpf buffers are still considered experimental are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.

Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.

Sponsored by:		Seccuris Inc.
In collaboration with:	rwatson
Tested by:		pwood, gallatin
MFC after:		4 months [1]

[1] Certain portions will probably not be MFCed, specifically things
    that can break the monitoring ABI.
2008-03-24 13:49:17 +00:00
kmacy
56b72c6a35 back out last change as Sam believes that it breaks multicast - need to revisit after following up with pyun 2008-03-20 06:19:34 +00:00
kmacy
8b4fc7299f Don't re-initialize the interface if it is already running.
This one line change makes the following code found in many ethernet device drivers
(at least em, igb, ixgbe, and cxgb) gratuitous

	case SIOCSIFADDR:
		if (ifa->ifa_addr->sa_family == AF_INET) {
			/*
			 * XXX
			 * Since resetting hardware takes a very long time
			 * and results in link renegotiation we only
			 * initialize the hardware only when it is absolutely
			 * required.
			 */
			ifp->if_flags |= IFF_UP;
			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
				EM_CORE_LOCK(adapter);
				em_init_locked(adapter);
				EM_CORE_UNLOCK(adapter);
			}
			arp_ifinit(ifp, ifa);
		} else
			error = ether_ioctl(ifp, command, data);
		break;
2008-03-20 05:35:02 +00:00
julian
b8355c8260 Replace really convoluted code that simplifies to "a ^= 0x01;" 2008-03-19 22:29:11 +00:00
thompsa
bcbf813313 Remove extra semicolons.
Pointed out by:		antoine
2008-03-17 01:26:44 +00:00
thompsa
bc8c8477a0 Switch the LACP state machine over to its own mutex to protect the internals,
this means that it no longer grabs the lagg rwlock. Use two port table arrays
which list the active ports for Tx and switch between them with an atomic op.
Now the lagg rwlock is only exclusively locked for management (ioctls) and
queuing of lacp control frames isnt needed.
2008-03-16 19:25:30 +00:00
rwatson
877d7c65ba In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
rwatson
30b72f7d93 Improve convergence of bpf_filter.c toward style(9).
MFC after:	3 weeks
Submitted by:	csjp
2008-03-09 21:13:43 +00:00
rwatson
763a53ea3e Move IFF_NEEDSGIANT warning from if_ethersubr.c to if.c so it is displayed
for all network interfaces, not just ethernet-like ones.

Upgrade it to a louder WARNING and be explicit that the flag is obsolete.
Support for IFF_NEEDSGIANT will be removed in a few months (see arch@ for
details) and will not appear in 8.0.

Upgrade if_watchdog to a WARNING.
2008-03-07 16:00:44 +00:00
thompsa
13c68a6bcd Improve EtherIP interaction with the bridge
- Set M_BCAST|M_MCAST for incoming frames
 - Send the frame to a local interface if the bridge returns the mbuf

Submitted by:	Eugene Grosbein
Tested by:	Boris Kochergin
2008-03-06 19:02:37 +00:00
jhb
f3a2cbebdb Use RTFREE_LOCKED() instead of rtfree() when releasing a reference on the
'rt' route in rtredirect() as 'rt' is always locked.

MFC after:	1 week
PR:		kern/117913
Submitted by:	Stefan Lambrev  stefan.lambrev of moneybookers.com
2008-02-13 16:57:58 +00:00
rwatson
b3df56a7f6 Add comment that bpfread() has multi-threading issues.
Fix minor white space nit.
2008-02-02 20:35:05 +00:00
thompsa
2d8d5733ef Remove a chunk of duplicated code, test the destination address against the
bridge the same way we check member interfaces.
2008-01-18 09:34:09 +00:00
thompsa
f99e03f7fe IEEE 802.1D-2004 states, frames containing any of the group MAC Addresses
specified in Table 7-10 in their destination address field shall not be relayed
by the Bridge. Add a check in bridge_forward() to adhere to this.

PR:		kern/119744
2008-01-18 00:19:10 +00:00
thompsa
2ba124325f Sync from OpenBSD r1.118, nuke clause 3 & 4. 2008-01-17 09:46:16 +00:00
rwatson
72360ebb8e Update netisr comment for the SMPng world order: netisr is no longer
implemented using the ISR facility, and cannot be triggered by calling
splnet()/splx().

MFC after:	3 weeks
2007-12-31 20:58:50 +00:00
thompsa
93319cc102 Pass any unmatched slowprotocols frames up the stack instead of dropping them,
there are more subtypes than just LACP.
2007-12-31 01:16:35 +00:00
mux
58da8ec521 Add a workaround for a deadlock between the rt_setgate() and rt_check()
functions.  It is easily triggered by running routed, and, I expect, by
running any other daemon that uses routing sockets.

Reviewed by:	net@
MFC after:	1 week
2007-12-27 10:00:57 +00:00
thompsa
19ae9a5e77 Fix a panic where if the mbuf was consumed by the filter for requeueing
(dummynet), ipsec_filter() would return the empty error code and the ipsec code
would continue to forward/deference the null mbuf.

Found by:       m0n0wall
Reviewed by:    bz
MFC after:      3 days
2007-12-26 08:41:58 +00:00
rwatson
940f7f2284 Use __FBSDID() in the kernel BPF implementation.
MFC after:	3 days
2007-12-25 13:24:02 +00:00
rwatson
d2a56e1c39 Remove trailing whitespace from lines in BPF.
MFC after:	3 days
2007-12-23 14:10:33 +00:00
thompsa
dbcd24960c Simplify the error handling and use the dereferenced sc->sc_ifp pointer. 2007-12-18 09:13:04 +00:00
thompsa
c0871871de When the bridge has an address and a packet comes in for it then drop it if the
link has been marked discarding by Spanning Tree. This would cause the bridge
to see duplicate packets to itself even if STP has correctly calculated the
topology and blocked redundant links.

Reported by:	trasz
Tested by:	trasz
MFC after:	3 days
2007-12-18 07:04:50 +00:00
thompsa
3b94f3069c - Use the macro to check the port status has it will also test if its
administratively down (!IFF_UP)
 - Use the same parameters to lagg_link_active() to get the backup port as in
   the output path, this didnt actually matter in practice as sc_primary is
   always the first on the port list.

MFC after:	3 days
2007-12-18 02:12:03 +00:00
thompsa
1321a9c9d1 Add myself to the copyright. 2007-12-17 18:49:44 +00:00
kmacy
5d9e84762f widen the routing event interface (arp update, redirect, and eventually pmtu change)
into separate functions

revert previous commit's changes to arpresolve and add a new interface
arpresolve2 which does arp resolution without an mbuf
2007-12-17 07:40:34 +00:00
kmacy
96bf4f5295 fix bonehead cut and paste error in last commit 2007-12-15 22:06:23 +00:00
kmacy
d568417c8a Create separate capability flags for TCP over IPv4 and TCP over IPv6 2007-12-15 21:01:48 +00:00
kmacy
755846c8cc add interface capability for TOE 2007-12-15 20:22:09 +00:00
kmacy
50706577a4 add interface for allowing consumers to register for ARP updates,
redirects, and path MTU changes

Reviewed by: silby
2007-12-12 20:53:25 +00:00
sam
08954540f8 Wake On Lan (WOL) infrastructure
Submitted by:	Stefan Sperling <stsp@stsp.name>
Reviewed by:	brooks
2007-12-10 02:31:00 +00:00
thompsa
228f20a1fe Fix spelling.
Obtained from:	OpenBSD
2007-12-09 20:47:12 +00:00
kmacy
12b5f9c8c9 Add padding for anticipated functionality
- vimage
 - TOE
 - multiq
 - host rtentry caching

Rename spare used by 80211 to if_llsoftc

Reviewed by: rwatson, gnn
MFC after: 1 day
2007-12-07 01:46:13 +00:00
julian
bad03ab89c No need to assert that a == b when we just set a = b. 2007-12-06 22:40:17 +00:00
thompsa
a3cd956d35 Support monitor mode where the frame is discarded after bpf and stats processing. 2007-12-05 00:42:28 +00:00