Commit Graph

2354 Commits

Author SHA1 Message Date
Christian S.J. Peron
ffeeb9241a Disable zerocopy by default for now. It's causing some problems in pcap
consumers which fork after the shared pages have been setup.  pflogd(8)
is an example.  The problem is understood and there is a fix coming in
shortly.

Folks who want to continue using it can do so by setting

net.bpf.zerocopy_enable

to 1.

Discussed with:	rwatson
2009-03-10 14:28:19 +00:00
Robert Watson
e82669d99b When resetting a BPF descriptor, properly check that zero-copy buffers
are not currently owned by userspace before clearing or rotating them.

Otherwise we may not play by the rules of the shared memory protocol,
potentially corrupting packet data or causing userspace applications
that are playing by the rules to spin due to being notified that a
buffer is complete but the shared memory header not reflecting that.

This behavior was seen with pflogd by a number of reporters; note that
this fix is not sufficient to get pflogd properly working with
zero-copy BPF, due to pflogd opening the BPF device before forking,
leading to the shared memory buffer not being propery inherited in the
privilege-separated child.  We're still deciding how to fix that
problem.

This change exposes buffer-model specific strategy information in
reset_d(), which will be fixed at a later date once we've decided how
best to improve the BPF buffer abstraction.

Reviewed by:	csjp
Reported by:	keramida
2009-03-07 22:17:44 +00:00
Marius Strobl
c89c8a1029 On architectures with strict alignment requirements compensate
the misalignment of the IP header that prepending the EtherIP
header might have caused.

PR:		131921
MFC after:	1 week
2009-03-07 19:08:58 +00:00
Christian S.J. Peron
927094113e Mark the bpf stats sysctl as being mpsafe. We do not require
Giant here.
2009-03-07 17:07:29 +00:00
Robert Watson
784cd896fc Clarify some comments, fix some types, and rename ZBUF_FLAG_IMMUTABLE to
ZBUF_FLAG_ASSIGNED to make it clear why the buffer can't be written to:
it is assigned to userspace.
2009-03-07 10:21:37 +00:00
Bruce M Simpson
1e98b429fe Reserve a netisr slot for the IGMPv3 output queue. 2009-03-04 02:54:11 +00:00
Christian S.J. Peron
f32b457b6e Switch the default buffer mode in bpf(4) to zero-copy buffers.
Discussed with:	rwatson
2009-03-02 19:42:01 +00:00
Robert Watson
3055e123d7 Do a bit of struct ifnet cleanup in preparation for 8.0: group function
pointers together, move padding to the bottom of the structure, and add
two new integer spares due to attrition over time.  Remove unused spare
"flags" field, we can use one of the spare ints if we need it later.

This change requires a rebuild of device driver modules that depend on
the layout of ifnet for binary compatibility reasons.

Discussed with:	kmacy
2009-03-01 12:42:54 +00:00
Bjoern A. Zeeb
2bebb49117 Add size-guards evaluated at compile-time to the main struct vnet_*
which are not in a module of their own like gif.

Single kernel compiles and universe will fail if the size of the struct
changes. Th expected values are given in sys/vimage.h.
See the comments where how to handle this.

Requested by:	peter
2009-03-01 11:01:00 +00:00
Bjoern A. Zeeb
33553d6e99 For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
Luigi Rizzo
b27b816fc5 we need if_var.h not if.h 2009-02-16 15:10:03 +00:00
Luigi Rizzo
a1d4f19c0d remove unnecessary forward declaration 2009-02-16 15:09:37 +00:00
Robert Watson
d7bf9f090d IFF_NEEDSGIANT will no longer be supported, so remove compatibility code
from if_sppp framework for interfaces requiring Giant.
2009-02-16 10:29:03 +00:00
Luigi Rizzo
ada55ca0b7 remove unnecessary #include from vnet.h and vinet.h
Approved by:	Marko Zec
2009-02-15 00:28:28 +00:00
Andrew Thompson
66c84010b4 bridge_delete_member is called via the event handler from if_detach
after the LLADDR is reclaimed which causes a null pointer deref with
inherit_mac enabled. Record the ifnet pointer of the interface and then compare
that to find when to re-assign the bridge address.

Submitted by:	sam
2009-02-13 19:20:25 +00:00
Maxim Konovalov
96c8ef3a18 o In case of the error do not forget to deallocate a cloned device unit.
PR:		kern/131642
Submitted by:	Dmitrij Tejblum
MFC after:	1 week
2009-02-13 12:59:54 +00:00
Robert Watson
86c2bc49cc Remove unused ifaddr local variable in ioctl routine.
MFC after:	3 days
2009-02-13 00:01:11 +00:00
Jamie Gritton
9c79d2432e Call prison_if from rtm_get_jailed, instead of splitting it out into
prison_check_ip4 and prison_check_ip6.  As prison_if includes a jailed()
check, remove that check before calling rtm_get_jailed.

Approved by:	bz (mentor)
2009-02-05 14:58:16 +00:00
Jamie Gritton
b89e82dd87 Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise.  The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family.  For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes).  Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by:	bz (mentor)
2009-02-05 14:06:09 +00:00
Randall Stewart
2f4afd2125 Adds support for SCTP checksum offload. This means
we, like TCP and UDP, move the checksum calculation
into the IP routines when there is no hardware support
we call into the normal SCTP checksum routine.

The next round of SCTP updates will use
this functionality. Of course the IGB driver needs
a few updates to support the new intel controller set
that actually does SCTP csum offload too.

Reviewed by:	gnn, rwatson, kmacy
2009-02-03 11:00:43 +00:00
Bjoern A. Zeeb
2e730bea0a Like with r185713 make sure to not leak a lock as rtalloc1(9) returns
a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked
and rtfree(9)d rather than just rtfree(9)d.

Since the PR was filed, new places with the same problem were added
with new code.  Also check that the rt is valid before freeing it
either way there.

PR:		kern/129793
Submitted by:	Dheeraj Reddy <dheeraj@ece.gatech.edu>
MFC after:	2 weeks
Committed from:	Bugathon #6
2009-01-31 10:48:02 +00:00
Bjoern A. Zeeb
1cecba0fcd For consistency with prison_{local,remote,check}_ipN rename
prison_getipN to prison_get_ipN.

Submitted by:	jamie (as part of a larger patch)
MFC after:	1 week
2009-01-25 10:11:58 +00:00
John Baldwin
eb322a6f77 Only start the if_slowtimo timer (which drives the if_watchdog methods of
network interfaces) if we have at least one interface with an if_watchdog
routine.

MFC after:	2 weeks
2009-01-23 20:53:01 +00:00
Qing Li
82b334e80d The RTF_LLINFO was revived unconditionally, but within the kernel the
check on the sysctl argument value being RTF_LLINFO is conditioned on
the COMPAT_ROUTE_FLAGS kernel option. This mismatch caused the L2
table retrieval failure, and the arp/ndp -an command displays empty L2
tables.

Reviewed by:   pjd
2009-01-16 09:01:45 +00:00
Qing Li
14981d8057 Revive the RTF_LLINFO flag in route.h. The kernel code is guarded
by the new kernel option COMPAT_ROUTE_FLAGS for binary backward
compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO.
RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is
always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS
is defined.
2009-01-12 11:24:32 +00:00
Robert Watson
3dc85f8d63 Do invoke mac_ifnet_check_transmit() and mac_ifnet_create_mbuf()
in the loopback and synthetic loopback code so that packets are
access control checked and relabeled.  Previously, the MAC
Framework enforced that packets sent over the loopback weren't
relabeled, but this will allow policies to make explicit choices
about how and whether to relabel packets on the loopback.  Also,
for SIMPLEX devices, this produces more consistent behavior for
looped back packets to the local MAC address by labeling those
packets as coming from the interface.

Discussed with:	csjp
Obtained from:	TrustedBSD Project
2009-01-10 23:50:23 +00:00
Bjoern A. Zeeb
c2ded8aefb Rather than using the cred from curthread, take it from the thread
referenced in the sysctl req argument.

Reviewed by:	rwatson
MFC after:	2 weeks
2009-01-09 23:57:59 +00:00
Bjoern A. Zeeb
813dd6ae5e Restrict arp, ndp and theoretically the FIB listing (if not
read with libkvm) to the addresses of a prison, when inside a
jail. [1]
As the patch from the PR was pre-'new-arp', add checks to the
llt_dump handlers as well.

While touching RTM_GET in route_output(), consistently use
curthread credentials rather than the creds from the socket
there. [2]

PR:		kern/68189
Submitted by:	Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1]
Discussed with:	rwatson [2]
Reviewed by:	rwatson
MFC after:	4 weeks
2009-01-09 21:57:49 +00:00
Bjoern A. Zeeb
ebda3fc380 Take the cred from curthread rather than curproc as curproc would need
locking but the credential from curthread (usually) never changes.

Discussed with:	jhb
MFC after:	2 weeks
2009-01-09 16:22:32 +00:00
Qing Li
a42ea597ff The log message should terminate with a newline instead
of a tab character.
2009-01-02 22:51:30 +00:00
Qing Li
8eca593c5a This checkin addresses a couple of issues:
1. The "route" command allows route insertion through the interface-direct
   option "-iface". During if_attach(), an sockaddr_dl{} entry is created
   for the interface and is part of the interface address list. This
   sockaddr_dl{} entry describes the interface in detail. The "route"
   command selects this entry as the "gateway" object when the "-iface"
   option is present. The "arp" and "ndp" commands also interact with the
   kernel through the routing socket when adding and removing static L2
   entries. The static L2 information is also provided through the
   "gateway" object with an AF_LINK family type, similar to what is
   provided by the "route" command. In order to differentiate between
   these two types of operations, a RTF_LLDATA flag is introduced. This
   flag is set by the "arp" and "ndp" commands when issuing the add and
   delete commands. This flag is also set in each L2 entry returned by the
   kernel. The "arp" and "ndp" command follows a convention where a RTM_GET
   is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills
   in the fields for a "rtm" object, which is reinjected into the kernel by
   a subsequent RTM_ADD/DELETE command. The entry returend from RTM_GET
   is a prefix route, so the RTF_LLDATA flag must be specified when issuing
   the RTM_ADD/DELETE messages.

2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the
   specification for retrieving L2 information. Also optimized the
   code logic.

Reviewed by:   julian
2008-12-26 19:45:24 +00:00
Qing Li
eef7434f0d The "tun?" dev need not be opened at all. One is allowed to perform
the following operations, e.g.:
1) ifconfig tun0 create
2) ifconfig tun0 10.1.1.1 10.1.1.2
3) route add -net 192.103.54.0/24 -iface tun0
4) ifconfig tun0 destroy
If cv wait on the TUN_CLOSED flag, then the last operation (4) will
block forever.

Revert the previous changes and fix the mtx_unlock() leak.
2008-12-25 22:32:32 +00:00
Kip Macy
a76e397b3b - Close a race during which the open flag could be cleared but the tun_softc would still be referenced
by adding a separate TUN_CLOSED flag that is set after tunclose is done referencing it.

- drop the tun_mtx after the flag check to avoid holding it across if_detach which can recurse in to
  if_tun.c
2008-12-25 02:14:25 +00:00
Qing Li
388600e803 Provide a condition variable to delay the cloned interface
destroy operation until the referenced clone device has
been closed by the process properly. The behavior is now
consistently with the previous release.

Reviewed by: 	  Kip Macy
2008-12-22 01:56:56 +00:00
Kip Macy
6241d13a1b if_rtdel is always called with the RADIX_NODE_HEAD lock held 2008-12-18 09:59:24 +00:00
Kip Macy
d24c444ca0 add ifnet_byindex_locked to allow for use of IFNET_RLOCK 2008-12-18 04:50:44 +00:00
George V. Neville-Neil
a6c8d9978a Add TWINAX (Twin Axial Copper for 10G networking) media types.
Add code to the Chelsio driver so that it can recognize different
module types which may be plugged into it, including SR, LR lasers
and TWINAX copper cables.

Obtained from:	Chelsio Inc.
MFC after:	1 week
2008-12-17 22:59:29 +00:00
Andrew Thompson
f812e06742 - Protect against sc->sc_primary being null
- Initialise speed where its used
2008-12-17 21:04:43 +00:00
Andrew Thompson
be07c18007 Update the interface baudrate taking into account the max speed for the
different aggregation protocols.
2008-12-17 20:58:10 +00:00
Qing Li
9928dafbb8 Remove the rt argument from nd6_storelladdr() because
rt is no longer accessed.
2008-12-17 10:27:34 +00:00
Kip Macy
64c44e5db8 Keep stats in drbr_enqueue
Discussed with: ps
2008-12-17 08:12:50 +00:00
Kip Macy
c368cff776 avoid trying to acquire a shared lock while holding an exclusive lock
by making the ifnet lock acquisition exclusive
2008-12-17 04:33:52 +00:00
Kip Macy
1635d9171c merge in 2 buf_ring helper routines for enqueueing and freeing buf_rings 2008-12-17 04:00:43 +00:00
Kip Macy
991f8615e4 convert ifnet and afdata locks from mutexes to rwlocks 2008-12-17 00:11:56 +00:00
Andrew Thompson
09efca80df Also propagate the if_hwassist value to the parent so that cksum offload works.
Submitted by:	Tom Hicks (thicks_averesys.com)
2008-12-16 22:16:34 +00:00
Robert Watson
d2c205d5de A few locking fixes and cleanups to pfil hook registration,
unregistration, and execution:

- Add some brackets for clarity and trim a bit of vertical whitespace.
- Remove comments that may not contribute to clarity, such as "Lock"
  before acquiring a lock and "Get memory" before allocating memory.
- During hook registration, don't drop pfil_list_lock between checking
  for a duplicate and registering the hook, as this leaves a race
  condition by failing to enforce the "no duplicate hooks" invariant.
- Don't lock the hook during registration, since it's not yet in use.
- Document assumption that hooks will be quiesced before being
  unregistered.
- Don't write-lock hooks during removal because they are assumed
  quiesced.
- Rename "done" label to "locked_error" to be clear that it's an error
  path on the way out of hook execution.

MFC after:	pretty soon
2008-12-16 17:03:22 +00:00
Kip Macy
d193ecc9a6 remove assertion checks for now - ipfw uses its own lock for protecting its radix tree instance 2008-12-16 11:01:36 +00:00
Kip Macy
7b4d716b62 style and spelling fix 2008-12-16 04:41:39 +00:00
Kip Macy
e1344b9604 assert that the radix node head is locked when manipulating the tree 2008-12-16 04:40:43 +00:00
Kip Macy
8a61a4eec4 add macro for destroying an llentry's rwlock 2008-12-16 00:20:15 +00:00