Commit Graph

3074 Commits

Author SHA1 Message Date
ed
c0a01b0858 Allow certain headers to be included more easily.
Spotted by:	http://hacks.owlfolio.org/header-survey/
2013-05-21 21:20:10 +00:00
melifaro
70dfcb99bb Use separate function to update mbuf checksum flags instead of
duplicating the same code in different places.

MFC after:	2 weeks
2013-05-18 08:14:21 +00:00
melifaro
967c651cd7 Fix rte leak introduced in r248070.
MFC after:	2 weeks
2013-05-18 07:10:22 +00:00
julian
329247aec2 Finally change the mbuf to have its own fib field instead of stealing
4 flag bits. This was supposed to happen in 8.0, and again in 2012..

MFC after:	never
2013-05-16 16:20:17 +00:00
hrs
960e7e9fd4 Add IFF_MONITOR support to gre(4).
Tested by:	Chip Marshall
MFC after:	1 week
2013-05-11 19:05:38 +00:00
andre
cc8c6e4d01 Back out r249318, r249320 and r249327 due to a heisenbug most
likely related to a race condition in the ipi_hash_lock with
the exact cause currently unknown but under investigation.
2013-05-06 16:42:18 +00:00
eadler
a5a9ec51d6 Correct a few sizeof()s
Submitted by:	swildner@DragonFlyBSD.org
Reviewed by:	alfred
2013-05-01 04:37:34 +00:00
luigi
ff43752d35 remove $Id$ (whitespace change) 2013-04-30 16:00:21 +00:00
glebius
b4bc270e8f Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
oleg
6ff116b3ca Recover missing arp_ifinit() call.
MFC after:	2 weeks
2013-04-18 20:13:33 +00:00
glebius
d9c22bdbc9 Switch lagg(4) statistics to counter(9).
The lagg(4) is often used to bond high speed links, so basic per-packet +=
on statistics cause cache misses and statistics loss.

Perfect solution would be to convert ifnet(9) to counters(9), but this
requires much more work, and unfortunately ABI change, so temporarily
patch lagg(4) manually.

We store counters in the softc, and once per second push their values
to legacy ifnet counters.

Sponsored by:	Nginx, Inc.
2013-04-15 13:00:42 +00:00
glebius
e79bb9704b Fix build. 2013-04-10 08:09:25 +00:00
andre
306fddaf78 Change certain heavily used network related mutexes and rwlocks to
reside on their own cache line to prevent false sharing with other
nearby structures, especially for those in the .bss segment.

NB: Those mutexes and rwlocks with variables next to them that get
changed on every invocation do not benefit from their own cache line.
Actually it may be net negative because two cache misses would be
incurred in those cases.
2013-04-09 21:02:20 +00:00
ae
844d612b2a Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.
MFC after:	1 week
2013-04-09 07:11:22 +00:00
markj
a688a70512 Ignore interface renames instead of removing the interface from the bridge
group.

Reviewed by:	rstone
Approved by:	rstone (co-mentor)
Sponsored by:	Sandvine Incorporated
MFC after:	1 week
2013-03-28 20:37:07 +00:00
glebius
82edd7c363 Remove __FreeBSD_version ifdefs. 2013-03-22 20:44:16 +00:00
ae
b3c4973a10 Fix style and comments. 2013-03-19 05:51:47 +00:00
glebius
b37af62b9e Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
glebius
76306b3465 - Use m_getcl() instead of hand allocating.
- Convert panic() to KASSERT.
- Remove superfluous cleaning of mbuf fields after allocation.
- Add comment on possible use of m_get2() here.

Sponsored by:	Nginx, Inc.
2013-03-15 12:52:59 +00:00
glebius
37a43650ed Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
glebius
18f90896db Reinitialize eh after pfil(9) processing.
PR:		176764
Submitted by:	adri
2013-03-11 12:06:57 +00:00
melifaro
fccbb24392 Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.

Discussed with:	andre, eri
MFC after:	3 weeks
2013-03-08 20:33:50 +00:00
melifaro
2a77ba4103 Write lock is not required for find&compare operation.
MFC after:	2 weeks
2013-03-05 13:38:45 +00:00
glebius
f8098d720c Finish the r244185. This fixes ever growing counter of pfsync bad
length packets, which was actually harmless.

Note that peers with different version of head/ may grow this
counter, but it is harmless - all pfsync data is processed.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-15 09:03:56 +00:00
glebius
a47c0295c5 Resolve source address selection in presense of CARP. Add a couple
of helper functions:

- carp_master()   - boolean function which is true if an address
		    is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
		    and is aware of CARP.

  Utilize ifa_preferred() in ifa_ifwithnet().

  The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. May be we will approach this problem again later.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-11 10:58:22 +00:00
rrs
75ad250e97 This fixes a out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encounting
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue() only with drbr_peek(). We most
of the time, in putback, would not need to copy it
back since most likey the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimial case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will  need a test and atomic in the put back possibly
a seperate putback_mc() in the ring buf.

Reviewed by:	jhb@freebsd.org, jlv@freebsd.org
2013-02-07 15:20:54 +00:00
glebius
7f832c3059 Retire struct sockaddr_inarp.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.

Reviewed by:	ru, andre, net@
2013-01-31 08:55:21 +00:00
glebius
5e925b9e5a route_output() always supplies info with RTAX_GATEWAY member that
points to a sockaddr of AF_LINK family. Assert this instead of
checking.
2013-01-29 21:44:22 +00:00
np
66b6d0e94e Move lle_event to if_llatbl.h
lle_event replaced arp_update_event after the ARP rewrite and ended up
in if_ether.h simply because arp_update_event used to be there too.
IPv6 neighbor discovery is going to grow lle_event support and this is a
good time to move it to if_llatbl.h.

The two in-tree consumers of this event - OFED and toecore - are not
affected.

Reviewed by:	bz@
2013-01-25 23:58:21 +00:00
glebius
9f6a60e000 - Utilize m_get2(), accidentially fixing some signedness bugs.
- Return EMSGSIZE in both cases if uio_resid is oversized or undersized.
- No need to clear rcvif.
2013-01-24 14:29:31 +00:00
luigi
47486b84ac leftover from r245579... flags for semi transparent mode and direct
forwarding through a VALE switch
2013-01-23 03:49:48 +00:00
glebius
bc87b91f9e If lagg(4) can't forward a packet due to underlying port problems,
return much more meaningful ENETDOWN to the stack, instead of EBUSY.
2013-01-21 08:59:31 +00:00
glebius
cf8b6db820 - Add dashes before copyright notices.
- Add $FreeBSD$.
- Remove unused define.
2013-01-07 19:36:11 +00:00
peter
184da5d83d Juggle some internal symbols from our antique zlib (that originally came
in from kernel-pppd which is long gone) so that ZFS and DTRACE play nice.

This is a horrible hack to get freefall to compile, and is in dire need
of reconciliation.  This antique zlib-1.04 code needs to go away.
2013-01-06 14:59:59 +00:00
ae
f89408c03e Add an ability to set net.link.stf.permit_rfc1918 from the loader.
MFC after:	2 weeks
2012-12-27 21:26:08 +00:00
ae
6064718574 Add net.link.stf.permit_rfc1918 sysctl variable. It can be used to allow
the use of private IPv4 addresses with stf(4).

MFC after:	2 weeks
2012-12-27 20:59:22 +00:00
kevlo
df821e6509 Fix typo in comment.
Reviewed by:	thompsa
2012-12-18 06:37:23 +00:00
glebius
8137816adb Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by:	Ian FREISLICH <ianf clue.co.za>
2012-12-13 11:11:15 +00:00
ghelmer
726beb3d43 Changes to resolve races in bpfread() and catchpacket() that, at worst,
cause kernel panics.

Add a flag to the bpf descriptor to indicate whether the hold buffer
is in use. In bpfread(), set the "hold buffer in use" flag before
dropping the descriptor lock during the call to bpf_uiomove().
Everywhere else the hold buffer is used or changed, wait while
the hold buffer is in use by bpfread(). Add a KASSERT in bpfread()
after re-acquiring the descriptor lock to assist uncovering any
additional hold buffer races.
2012-12-10 16:14:44 +00:00
hrs
377b89c55f - Move definition of V_deembed_scopeid to scope6_var.h.
- Deembed scope id in L3 address in in6_lltable_dump().
- Simplify scope id recovery in rtsock routines.
- Remove embedded scope id handling in ndp(8) and route(8) completely.
2012-12-05 19:45:24 +00:00
glebius
8e20fa5ae9 Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
hrs
db5359d69a - Fix LOR in sa6_recoverscope() in rt_msg2()[1].
- Check V_deembed_scopeid before checking if sa_family == AF_INET6.
- Fix scope id handing in route(8)[2] and ifconfig(8).

Reported by:	rpaulo[1], Mateusz Guzik[1], peter[2]
2012-12-04 17:12:23 +00:00
melifaro
20cedcd3c8 Fix bpf_if structure leak introduced in r235745.
Move all such structures to delayed-free lists and
delete all matching on interface departure event.

MFC after:	1 week
2012-12-02 21:43:37 +00:00
pjd
30b3450aa9 - Use more appropriate loop (do { } while()) when generating ethernet address
for bridge interface.
- If we found a collision we can break the loop - only one collision is
  possible and one is exactly enough to need to renegerate.

Obtained from:	WHEEL Systems
MFC after:	1 week
2012-11-29 08:06:23 +00:00
andre
cd840ea089 Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability.
Checksumming the IP header of fragments is no different from doing
normal IP headers.

Discussed with:	yongari
MFC after:	1 week
2012-11-27 19:31:49 +00:00
davidxu
852ac8ea6c Pass allocated unit number to make_dev, otherwise kernel panics later while
cloning second tap.

Reviewed by: kevlo,ed
2012-11-27 12:23:57 +00:00
glebius
d927dc2039 Better safe than sorry: reinitialize eh after ng_ether(4) and
if_bridge(4) processing, since mbuf may be modified there.

Submitted by:	youngari
2012-11-27 06:35:26 +00:00
glebius
ff9490f069 Re-initialize eh pointer after m_adj()
Submitted by:	Kohji Okuno <okuno.kohji jp.panasonic.com>
Reviewed by:	yongari
2012-11-26 19:45:01 +00:00
adrian
2191ae37a2 Fix up a compile time warning if INET6 isn't defined. 2012-11-18 04:51:46 +00:00
hrs
456b7a9341 Fill sin6_scope_id in sockaddr_in6 before passing it from the kernel to
userland via routing socket or sysctl.  This eliminates the following
KAME-specific sin6_scope_id handling routine from each userland utility:

 sin6.sin6_scope_id = ntohs(*(u_int16_t *)&sin6.sin6_addr.s6_addr[2]);

This behavior can be controlled by net.inet6.ip6.deembed_scopeid.  This is
set to 1 by default (sin6_scope_id will be filled in the kernel).

Reviewed by:	bz
2012-11-17 20:19:00 +00:00
ghelmer
249e3589f9 Work around a race in bpfread() by validating the hold buffer pointer
before freeing it. Otherwise, we can lose a buffer and cause a panic
in catchpacket().
2012-11-06 21:07:04 +00:00
ae
4354018055 Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by:	andre
2012-11-02 01:20:55 +00:00
glebius
f79061ff05 o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
  hardware. Some driver may not announce CSUM_IP in theur if_hwassist,
  although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
  After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by:	Sebastian Kuzminsky <seb lineratesystems.com>
2012-10-26 21:06:33 +00:00
ae
71112b5a8e Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.

Sponsored by:	Yandex LLC
Discussed with:	net@
MFC after:	2 weeks
2012-10-25 09:39:14 +00:00
glebius
278c83657a Fix fallout from r240071. If destination interface lookup fails,
we should broadcast a packet, not try to deliver it to NULL.

Reported by:	rpaulo
2012-10-24 18:33:44 +00:00
glebius
5cc3ac5902 Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00
melifaro
030e8d5bab Make PFIL use per-VNET lock instead of per-AF lock. Since most used packet
filters (ipfw and PF) use the same ruleset with the same lock for both
AF_INET and AF_INET6 there is no need in more fine-grade locking.
However, it is possible to request personal lock by specifying
PFIL_FLAG_PRIVATE_LOCK flag in pfil_head structure (see pfil.9 for
more details).

Export PFIL lock via rw_lock(9)/rm_lock(9)-like API permitting pfil consumers
to use this lock instead of own lock. This help reducing locks on main
traffic path.

pfil_assert() is currently not implemented due to absense of rm_assert().
Waiting for some kind of r234648 to be merged in HEAD.

This change is part of bigger patch reducing routing locking.

Sponsored by:	Yandex LLC
Reviewed by:	glebius, ae
OK'd by:	silence on net@
MFC after:	3 weeks
2012-10-22 14:10:17 +00:00
andre
91a702221a Update to previous r241688 to use __func__ instead of spelled out function
name in log(9) message.

Suggested by:	glebius
2012-10-19 10:07:55 +00:00
andre
a2f4fcfc3d Use LOG_WARNING level in in_attachdomain1() instead of printf().
Submitted by:	vijju.singh-at-gmail.com
2012-10-18 14:08:26 +00:00
andre
34a9a386cb Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.
2012-10-18 13:57:24 +00:00
glebius
7bdf4320f7 Utilize new macro to initialize if_baudrate(). 2012-10-18 09:57:56 +00:00
glebius
d5815329e7 Fix VIMAGE build.
Reported by:	Nikolai Lifanov <lifanov mail.lifanov.com>
Pointy hat to:	glebius
2012-10-17 21:19:27 +00:00
emax
0dfb309a1f provide helper if_initbaudrate() to set if_baudrate_pf and if_baudrate_pf.
again, use ixgbe(4) as an example of how to use new helper function.

Reviewed by:	jhb
MFC after:	1 week
2012-10-17 19:24:13 +00:00
delphij
37fb264720 Fix build. 2012-10-17 08:19:08 +00:00
emax
f4127691ff report total number of ports for each lagg(4) interface
via net.link.lagg.X.count sysctl

MFC after:	1  week
2012-10-16 22:43:14 +00:00
emax
214df82afa introduce concept of ifi_baudrate power factor. the idea is to work
around the problem where high speed interfaces (such as ixgbe(4))
are not able to report real ifi_baudrate. bascially, take a spare
byte from struct if_data and use it to store ifi_baudrate power
factor. in other words,

real ifi_baudrate = ifi_baudrate * 10 ^ ifi_baudrate power factor

this should be backwards compatible with old binaries. use ixgbe(4)
as an example on how drivers would set ifi_baudrate power factor

Discussed with:	kib, scottl, glebius
MFC after:	1 week
2012-10-16 20:18:15 +00:00
glebius
05f24a6b77 Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

  if_clone_simple()
  if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with:		brooks, bz, 1 year ago
2012-10-16 13:37:54 +00:00
kevlo
ceb08698f2 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
kevlo
8747a46991 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
glebius
f3a0231bff A step in resolving mess with byte ordering for AF_INET. After this change:
- All packets in NETISR_IP queue are in net byte order.
  - ip_input() is entered in net byte order and converts packet
    to host byte order right _after_ processing pfil(9) hooks.
  - ip_output() is entered in host byte order and converts packet
    to net byte order right _before_ processing pfil(9) hooks.
  - ip_fragment() accepts and emits packet in net byte order.
  - ip_forward(), ip_mloopback() use host byte order (untouched actually).
  - ip_fastforward() no longer modifies packet at all (except ip_ttl).
  - Swapping of byte order there and back removed from the following modules:
    pf(4), ipfw(4), enc(4), if_bridge(4).
  - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
  - __FreeBSD_version bumped.
  - pfil(9) manual page updated.

Reviewed by:	ray, luigi, eri, melifaro
Tested by:	glebius (LE), ray (BE)
2012-10-06 10:02:11 +00:00
delphij
a130b811b9 MFV: libpcap 1.3.0.
MFC after:	4 weeks
2012-10-05 18:42:50 +00:00
thompsa
7b87b7c711 Remove the M_NOWAIT from bridge_rtable_init as it isn't needed. The function
return value is not even checked and could lead to a panic on a null sc_rthash.

MFC after:	2 weeks
2012-10-04 07:40:55 +00:00
emaste
eff3cfa56f Cast through void * to silence compiler warning
The base netmap pointer and offsets involved are provided by the kernel
side of the netmap interface and will have appropriate alignment.

Sponsored by: ADARA Networks
MFC After: 2 weeks
2012-10-03 21:41:20 +00:00
jhb
aa29abaca0 Rename the module for 'device enc' to "if_enc" to avoid conflicting with
the CAM "enc" peripheral (part of ses(4)).  Previously the two modules
used the same name, so only one was included in a linked kernel causing
enc0 to not be created if you added IPSEC to GENERIC.  The new module
name follows the pattern of other network interfaces (e.g. "if_loop").

MFC after:	1 week
2012-10-02 12:25:30 +00:00
glebius
4b29d585cf The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
  ALTQ queueing or buf_ring(9) queueing.
  - It doesn't touch the buf_ring stats any more.
  - It doesn't touch ifnet stats anymore.
  - drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
  - It handles br_drops itself.
  - br_prod_bytes stats are dropped. Rationale: no one ever
    reads them but update of a common counter on every packet
    negatively affects performance due to excessive cache
    invalidation.
  - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
    we no longer account bytes.

o Drivers handle their stats theirselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and  ixv(4) no longer
  use drbr_stats_update(), and update ifnet stats theirselves.

o bxe(4) was the most correct driver, it didn't call
  drbr_stats_update(), thus it was the only driver accurate under
  moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
  - drop software stats updating.
  - take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
  from hardware.

Reviewed by:	jfv, gnn
2012-09-28 18:28:27 +00:00
glebius
99ed11b97d - In the bridge_enqueue() do success/error accounting for
each fragment, not only once.
- In the GRAB_OUR_PACKETS() macro do increase if_ibytes.
2012-09-26 20:09:48 +00:00
emaste
8d97b5af4a Correct misspelling in debug output. 2012-09-26 01:09:19 +00:00
emaste
3a425ea5d6 Revert part of an earlier patch attempt that snuck in with r240938. 2012-09-25 23:41:45 +00:00
emaste
e89f1e2950 Avoid INVARIANTS panic destroying an in-use tap(4)
The requirement (implied by the KASSERT in tap_destroy) that the tap is
closed isn't valid; destroy_dev will block in devdrn while other threads
are in d_* functions.

Note: if_tun had the same issue, addressed in SVN revisions r186391,
r186483 and r186497.  The use of the condvar there appears to be
redundant with the functionality provided by destroy_dev.

Sponsored by:	ADARA Networks
Reviewed by:	dwhite
MFC after:	2 weeks
2012-09-25 22:10:14 +00:00
emaste
25f7973cf8 Remove an incorrect comment 2012-09-25 21:19:17 +00:00
glebius
450219f7cf Convert lagg(4) to use if_transmit instead of if_start.
In collaboration with:	thompsa, sbruno, fabient
2012-09-20 10:05:10 +00:00
glebius
9062851653 Utilize Jenkins hash with random seed for source nodes storage. 2012-09-20 06:52:05 +00:00
glebius
439d708ae8 Add missing break.
Pointy hat to:	glebius
2012-09-20 03:09:58 +00:00
glebius
63628d08be Fix build, pass the pointy hat please. 2012-09-18 12:21:32 +00:00
glebius
c3ead4d7df Make ruleset anchors in pf(4) reentrant. We've got two problems here:
1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate anchor processing stack on stack. To make this allocation
  as small as possible, following measures taken:
  - Maximum stack size reduced from 64 to 32.
  - The struct pf_anchor_stackframe trimmed by one pointer - parent.
    We can always obtain the parent via the rule pointer.
  - When pf_test_rule() calls pf_get_translation(), the former lends
    its stack to the latter, to avoid recursive allocation 32 entries.

The second one appeared more tricky. The code, that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of the 1.516 was that code
examines the "match" mark on a just processed child, which couldn't be
put here by current frame. Since this wasn't documented even in the
commit message and functionality of this is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread safe manner - the mark isn't put on the anchor itself, but on
current stack frame. To avoid growing stack frame, we utilize LSB
from the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.

Discussed with:		dhartmei
2012-09-18 10:54:56 +00:00
glebius
933e74cb8b - Add $FreeBSD$ to allow modifications to this file.
- Move $OpenBSD$ to a more standard place.
2012-09-18 10:52:46 +00:00
glebius
0ccf4838d7 o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c		-> sys/netpfil/pf/
sys/contrib/pf/net/*.h		-> sys/net/
contrib/pf/pfctl/*.c		-> sbin/pfctl
contrib/pf/pfctl/*.h		-> sbin/pfctl
contrib/pf/pfctl/pfctl.8	-> sbin/pfctl
contrib/pf/pfctl/*.4		-> share/man/man4
contrib/pf/pfctl/*.5		-> share/man/man5

sys/netinet/ipfw		-> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with:		bz, luigi
2012-09-14 11:51:49 +00:00
glebius
5190d38ee3 Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

 o Fine grained locking, thus much better performance.
 o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

  Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by:	Florian Smeets <flo freebsd.org>
Tested by:	Chekaluk Vitaly <artemrts ukr.net>
Tested by:	Ben Wilber <ben desync.com>
Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-09-08 06:41:54 +00:00
melifaro
b490e053cd Fix the build broken by r240099.
Hide link_pfil_hook under _KERNEL macro.

MFC after:    3 weeks
2012-09-04 22:17:33 +00:00
melifaro
1fbae66b6e Introduce new link-layer PFIL hook V_link_pfil_hook.
Merge ether_ipfw_chk() and part of bridge_pfil() into
unified ipfw_check_frame() function called by PFIL.
This change was suggested by rwatson? @ DevSummit.

Remove ipfw headers from ether/bridge code since they are unneeded now.

Note this thange introduce some (temporary) performance penalty since
PFIL read lock has to be acquired for every link-level packet.

MFC after:     3 weeks
2012-09-04 19:43:26 +00:00
glebius
0259eae71d - Move jenkins.h to jenkins_hash.c
- Provide missing function that can do hashing of arbitrary sized buffer.
- Refetch lookup3.c and do only minimal edits to it, so that diff between
  our jenkins_hash.c and lookup3.c is minimal.
- Add declarations for jenkins_hash(), jenkins_hash32() to sys/hash.h.
- Document these functions in hash(9)

Obtained from:	http://burtleburtle.net/bob/c/lookup3.c
2012-09-04 12:07:33 +00:00
glebius
7e324c5767 Change bridge(4) to use if_transmit for forwarding packets to underlying
interfaces instead of queueing.

Tested by:	ray
2012-09-03 10:08:20 +00:00
glebius
165d5550eb In ifc_alloc_unit():
- In the !wildcard case, return ENOSPC instead of confusing EEXIST
  in case if ifc->ifc_maxunit reached.
- Fix unit leak, that I've introduced in previous revision.

Submitted by:	Daan Vreeken <Daan vitsch.nl>
2012-08-30 12:18:45 +00:00
jhb
989ddf5cba Fix a silly grammar bogon.
Submitted by:	Stephen McKay
2012-08-21 19:07:28 +00:00
jhb
afa5448935 Refine the changes made in r208212 to avoid bogus failures from
if_delmulti() when clearing the configuration for a subinterface when
the parent interface is being detached.  The current code was still
triggering an assertion in if_delmulti() due to the parent interface being
partially detached.  Fix this by not calling if_delmulti() at all if the
parent interface is being detached.  Warn if if_delmulti() fails when the
parent is not being detached (but similar to 208212, still proceed with
tearing down the vlan state).

Tested by:	ae@
MFC after:	1 month
2012-08-20 16:00:33 +00:00
jhb
8f6ba08bcc Unexpand a couple of TAILQ_FOREACH()s. 2012-08-17 16:01:24 +00:00
kib
cac2fe116f After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
glebius
abf245020a Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
  disestablish them.
  - This allows us to simplify the arptimer() and make it
    race safe.
o Consistently use ifp->if_afdata_lock to lock access to
  linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
  is attached to the hash.
  - Use LLE_LINKED to avoid double unlinking via consequent
    calls to llentry_free().
  - Mark lle with LLE_DELETED via |= operation istead of =,
    so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
  consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR:		kern/165863
Submitted by:	Andrey Zonov <andrey zonov.org>
Submitted by:	Ryan Stone <rysto32 gmail.com>
Submitted by:	Eric van Gyzen <eric_van_gyzen dell.com>
2012-08-02 13:57:49 +00:00
glebius
34fe3f296a The llentry_update() is used only by flowtable and the latter
always passes NULL pointer to it. Thus, code can be simplified
and function renamed to llentry_alloc() to match rtalloc().
2012-08-02 13:20:44 +00:00
glebius
588de42f27 Some more whitespace cleanup. 2012-08-01 09:00:26 +00:00
glebius
53cb168f80 Some style(9) and whitespace changes.
Together with:	Andrey Zonov <andrey zonov.org>
2012-07-31 11:31:12 +00:00
bz
245ecd0552 Hardcode the loopback rx/tx checkum options for IPv6 to on without
checking. This allows the FreeBSD 9.1 release process to move forward.
Work around the problem that loopback connections to local addresses
not on loopback interfaces and not on interfaces w/ IPv6 checksum offloading
enabled would not work.
A proper fix to allow us to disable the "checksum offload" on loopback
for testing, measurements, ... as we allow for IPv4 needs to put in
place later.

Reported by:	tuexen, Matthew Seaman (m.seaman infracaninophile.co.uk)
Reported by:	Mike Andrews (mandrews bit0.com), kib, ...
PR:		kern/170070
MFC after:	1 day
X-MFC after:	re approval
2012-07-28 20:31:39 +00:00
melifaro
497943ec39 Permit changing MTU in 6to4 relay.
This behavior is recommended by RFC 4213 clause 3.2.

Sometimes fragmentation is the least evil.
For example, some Linux IPVS kernels forwards
ICMPv6 checksums to real servers incorrectly.

Reviewed by:      hrs(previous version)
Approved by:      kib(mentor)
MFC after:        1 week
2012-07-15 17:44:27 +00:00
emaste
b19509b9f3 Simplify error case
Submitted by:	thompsa@
2012-07-10 20:59:35 +00:00
emaste
83fbc9d844 Plug potential mbuf leak when bridging fragments
If an error occurs when transmitting one mbuf in a chain of fragments,
free the subsequent fragments instead of leaking them.

Sponsored by:   ADARA Networks
2012-07-10 13:17:32 +00:00
trociny
f5f1e19755 In epair_clone_destroy(), when destroying the second half, we have to
switch to its vnet before calling ether_ifdetach(). Otherwise if the
second half resides in a different vnet, if_detach() silently fails
leaving a stale pointer in V_ifnet list, and the system crashes trying
to access this pointer later.

Another solution could be not to allow to destroy epair unless both
ends are in the home vnet.

Discussed with:	bz
Tested by:	delphij
2012-07-09 20:38:18 +00:00
emaste
e754993f68 Restore error handling lost in r191603
This was missed in the change from IFQ_ENQUEUE to if_transmit.

Sponsored by:   ADARA Networks
2012-07-09 14:16:49 +00:00
emaste
e91d8ed669 Implement SIOCGIFMEDIA for if_tap(4)
Appease certain if_tap(4) consumers by providing simulated Ethernet
media status.

DragonFly commit 70d9a675bf5441cc854a843ead702d08928c37f3

Obtained from:  DragonFly BSD
2012-07-06 23:17:30 +00:00
glebius
418a04b467 When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.

The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.

- Introduce new flag for struct route/route_in6, that marks route
  not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
  depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
  but empty route should use RO_RTFREE() to free results of
  lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
  ro->ro_rt == NULL.

Tested by:	tuexen (SCTP part)
2012-07-04 07:37:53 +00:00
thompsa
c61cad51ad Add the same check as vlan(4) where we ignore the ifnet departure event if the
interface is just being renamed.

PR:		kern/169557
Submitted by:	Mark Johnston
MFC after:	3 days
2012-06-30 19:09:02 +00:00
jhb
722e057169 Hold GIF_LOCK() for almost all of gif_start(). It is required to be held
across in_gif_output() and in6_gif_output() anyway, and once it is held
across those it might as well be held for the entire loop.  This simplifies
the code and removes the need for the custom IFF_GIF_WANTED flag (which
belonged in the softc and not as an IFF_* flag anyway).

Tested by:	Vincent Hoffman  vince  unsane co uk
2012-06-29 15:21:34 +00:00
np
67d5f1a727 - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
rrs
7aa509cdff Fix comment to better reflect how we are
cheating and using the csum_data. Also fix
style issues with the comments.
2012-06-12 13:31:32 +00:00
rrs
13591e0a79 Note to self. Have morning coffee *before* committing things.
There is no mac_addr in the mbuf for BSD.. cheat like
we are supposed to and use the csum field since our friend
the gif tunnel itself will never use offload.
2012-06-12 12:44:17 +00:00
rrs
8fa5fc067b Opps forgot to commit the flag. 2012-06-12 12:40:15 +00:00
rrs
813f2809c5 Allow a gif tunnel to be used with ALTq.
Reviewed by:	gnn
2012-06-12 10:44:09 +00:00
thompsa
8c0678ab02 Fix a panic I introduced in r234487, the bridge softc pointer is set to null
early in the detach so rearrange things not to explode.

Reported by:	David Roffiaen, Gustau Perez Querol
Tested by:	David Roffiaen
MFC after:	3 days
2012-06-11 20:12:13 +00:00
melifaro
ca3b31c289 Fix typo introduced in r236559.
Pointed by:   bcr
Approved by:  kib(mentor)
2012-06-09 10:04:40 +00:00
trociny
e463b28f1f Sort includes.
Submitted by:	Daan Vreeken <pa4dan Bliksem.VEHosting.nl>
MFC after:	3 days
2012-06-07 19:48:45 +00:00
trociny
2d97bce56e Add VIMAGE support to if_tap.
PR:		kern/152047, kern/158686
Submitted by:	Daan Vreeken <pa4dan Bliksem.VEHosting.nl>
MFC after:	1 week
2012-06-07 19:46:46 +00:00
melifaro
511c0436a1 Fix panic introduced by r235745. Panic occurs after first packet traverse renamed interface.
Add several comments on locking

Found by:         avg
Approved by:      ae(mentor)
Tested by:        avg
MFC after:        1 week
2012-06-04 12:36:58 +00:00
tuexen
dc64091687 Seperate SCTP checksum offloading for IPv4 and IPv6.
While there: remove some trainling whitespaces.

MFC after: 3 days
X-MFC with: 236170
2012-05-30 20:56:07 +00:00
jkim
f2bf1d383e Fix style(9) nits, reduce unnecessary type castings, etc., for bpf_setf(). 2012-05-29 22:28:46 +00:00
jkim
9a4c9bf04c - Save the previous filter right before we set new one.
- Reduce duplicate code and make it little easier to read.

MFC after:	2 weeks
2012-05-29 22:21:53 +00:00
jkim
2287ab7bca Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor
and reset statistics as it should.

MFC after:	3 days
2012-05-29 18:44:53 +00:00
melifaro
ce59e84fbe Fix BPF_JITTER code broken by r235746.
Pointed by:       jkim
Reviewed by:      jkim (except locking changes)
Approved by:      (mentor)
MFC after:        2 weeks
2012-05-29 12:52:30 +00:00
rea
cda9e92421 if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row
Currently, 'ifconfig laggX down' does not remove members from this
lagg(4) interface.  So, 'service netif stop laggX' followed by
'service netif start laggX' will choke, because "stop" will leave
interfaces attached to the laggX and ifconfig from the "start" will
refuse to add already-existing interfaces.

The real-world case is when I am bundling together my Ethernet and
WiFi interfaces and using multiple profiles for accessing network in
different places: system being booted up with one profile, but later
this profile being exchanged to another one, followed by 'service
netif restart' will not add WiFi interface back to the lagg: the
"stop" action from 'service netif restart' will shut down my main WiFi
interface, so wlan0 that exists in the lagg0 will be destroyed and
purged from lagg0; the "start" action will try to re-add both
interfaces, but since Ethernet one is already in lagg0, ifconfig will
refuse to add the wlan0 from WiFi interface.

Since adding the interface to the lagg(4) when it is already here
should be an idempotent action: we're really not changing anything,
so this fix doesn't change the semantics of interface addition.

Approved by: thompsa
Reviewed by: emaste
MFC after: 1 week
2012-05-28 12:13:04 +00:00
bz
ac429c7044 It turns out that too many drivers are not only parsing the L2/3/4
headers for TSO but also for generic checksum offloading.  Ideally we
would only have one common function shared amongst all drivers, and
perhaps when updating them for IPv6 we should introduce that.
Eventually we should provide the meta information along with mbufs to
avoid (re-)parsing entirely.

To not break IPv6 (checksums and offload) and to be able to MFC the
changes without risking to hurt 3rd party drivers, duplicate the v4
framework, as other OSes have done as well.

Introduce interface capability flags for TX/RX checksum offload with
IPv6, to allow independent toggling (where possible).  Add CSUM_*_IPV6
flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6
fragmentation.  Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and
add an alias for CSUM_DATA_VALID_IPV6.

This pretty much brings IPv6 handling in line with IPv4.
TSO is still handled in a different way and not via if_hwassist.

Update ifconfig to allow (un)setting of the new capability flags.
Update loopback to announce the new capabilities and if_hwassist flags.

Individual driver updates will have to follow, as will SCTP.

Reported by:	gallatin, dim, ..
Reviewed by:	gallatin (glanced at?)
MFC after:	3 days
X-MFC with:	r235961,235959,235958
2012-05-28 09:30:13 +00:00
thompsa
d7eba5b13d Turn LACP debugging from a compile time option to a sysctl, it is very handy to
be able to turn it on when negotiation to a switch misbehaves.

Submitted by:	Andrew Boyer
MFC after:	3 days
2012-05-26 08:09:01 +00:00
bz
bc95289fae MFp4 bz_ipv6_fast:
Simple yet effective change enabling checksum "offload" on loopback
  for IPv6 to avoid expensive computations.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 02:21:17 +00:00
melifaro
e145f3abac Make most BPF ioctls() SMP-safe.
Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:21:00 +00:00
melifaro
e5f61c6580 Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter.
Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl.
This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock.

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:19:19 +00:00
melifaro
34ec5c8650 Fix old panic when BPF consumer attaches to destroying interface.
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'

Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.

Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:17:29 +00:00
melifaro
1c776bfa6e Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@)
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock

Add several forgotten assertions (thanks to adrian@)

Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.

Approved by:       kib(mentor)
Tested by:         gnn
MFC in:            4 weeks
2012-05-21 22:13:48 +00:00
marcel
434c53cbc3 Use the LLINDEX macro to access the link-level I/F index. This makes
it possible to work with a different type for the sdl_index field --
it only requires a recompile.

Obtained from:	Juniper Networks, Inc.
2012-05-19 02:39:43 +00:00
delphij
a17ebbd192 Sync DLTs with the latest pcap version.
MFC after:	2 weeks
2012-05-14 05:10:41 +00:00
melifaro
46b1e41aff Revert r234834 per luigi@ request.
Cleaner solution (e.g. adding another header) should be done here.

Original log:
  Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
  Remove ipfw/ip_fw_private.h header from non-ipfw code.

Requested by:      luigi
Approved by:       kib(mentor)
2012-05-03 08:56:43 +00:00
emaste
62cde8b2a2 Relax restriction on direct tx to child ports
Lagg(4) restricts the type of packet that may be sent directly to a child
port, to avoid undesired output from accidental misconfiguration.
Previously only ETHERTYPE_PAE was permitted.

BPF writes to a lagg(4) child port are presumably intentional, so just
allow them, while still blocking other packets that should take the
aggregation path.

PR:		kern/138620
Approved by:	thompsa@
2012-05-03 01:41:12 +00:00
melifaro
b600972ec6 Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.

Approved by:        ae(mentor)
MFC after:          2 weeks
2012-04-30 10:22:23 +00:00
melifaro
30081abc9a Do not require radix write lock to be held while dumping route table
via sysctl(4) interface. This permits router not to stop forwarding
packets while route table is being written to user-supplied buffer.

Reported by:        Pawel Tyll <ptyll@nitronet.pl>
Approved by:        kib(mentor)

MFC after:          1 week
2012-04-22 16:13:23 +00:00
thompsa
b85b9300bc Move the interface media check to a taskqueue, some interfaces (usb) sleep
during SIOCGIFMEDIA and we were holding locks.
2012-04-20 10:06:28 +00:00
thompsa
5ffa03204c Add linkstate to bridge(4), set the link to up when at least one underlying
interface is up, otherwise the link is down.

This, among other things, allows carp to work on a bridge.

Prodded by:	glebius
Tested by:	Alexander Lunev
2012-04-20 09:55:50 +00:00
thompsa
8379680eec Remove KASSERTS, they do not add any value here since the pointer is about to
be derefernced anyway.
2012-04-18 01:39:14 +00:00
luigi
39cda1ca43 A bit of cleanup in the names of fields of netmap-related structures.
Use the name 'ring' instead of 'queue' in all fields.
Bump NETMAP_API.
2012-04-13 16:03:07 +00:00
luigi
aed1b833cf remove an unnecessary #define 2012-04-12 10:32:34 +00:00
thompsa
8d206d7279 Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets
are discarded, this is an issue because lacp drops the lock which may allow
network threads to access freed memory. Expand the lock coverage so the
detach/attach happen atomically.

Submitted by:	Andrew Boyer (earlier version)
2012-04-12 01:07:17 +00:00
jhb
4e983169d1 Add media types for 40G media that might be used with FreeBSD.
Reviewed by:	bz
MFC after:	2 weeks
2012-04-10 13:59:35 +00:00
melifaro
ff7ed324aa Fix build broken by r233938.
Pointed by:     David Wolfskill <david@catwhisker.org>
Approved by:    kib (mentor)
Pointy hat to:  melifaro
2012-04-06 13:34:19 +00:00
melifaro
85ccef88d3 - Improve performace for writer-only BPF users.
Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send
raw ethernet frames. The only FreeBSD interface that can be used to send raw frames
is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses
BPF only to send data. This leads us to the situation when software like cdpd,
being run on high-traffic-volume interface significantly reduces overall performance
since we have to acquire additional locks for every packet.

Here we add sysctl that changes BPF behavior in the following way:
If program came and opens BPF socket without explicitly specifyin read filter we
assume it to be write-only and add it to special writer-only per-interface list.
This makes bpf_peers_present() return 0, so no additional overhead is introduced.
After filter is supplied, descriptor is added to original per-interface list permitting
packets to be captured.

Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of
setting snap length.

Fortunately, most programs explicitly sets (event catch-all) filter after that.
tcpdump(1) is a good example.

So a bit hackis approach is taken: we upgrade description only after second
BIOCSETF is received.

Sysctl is named net.bpf.optimize_writers and is turned off by default.

- While here, document all sysctl variables in bpf.4

Sponsored by Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      4 weeks
2012-04-06 06:55:21 +00:00
melifaro
8b1d10268c - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greately improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is is temporary solution,
struct bpf_if should be made opaque for any external caller.

Found by:       Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by:   Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      3 weeks
2012-04-06 06:53:58 +00:00