Commit Graph

2945 Commits

Author SHA1 Message Date
Gleb Smirnoff
7b11548469 Add missing break.
Pointy hat to:	glebius
2012-09-20 03:09:58 +00:00
Gleb Smirnoff
9ed8bbbdbe Fix build, pass the pointy hat please. 2012-09-18 12:21:32 +00:00
Gleb Smirnoff
1d6139c0e4 Make ruleset anchors in pf(4) reentrant. We've got two problems here:
1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate anchor processing stack on stack. To make this allocation
  as small as possible, following measures taken:
  - Maximum stack size reduced from 64 to 32.
  - The struct pf_anchor_stackframe trimmed by one pointer - parent.
    We can always obtain the parent via the rule pointer.
  - When pf_test_rule() calls pf_get_translation(), the former lends
    its stack to the latter, to avoid recursive allocation 32 entries.

The second one appeared more tricky. The code, that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of the 1.516 was that code
examines the "match" mark on a just processed child, which couldn't be
put here by current frame. Since this wasn't documented even in the
commit message and functionality of this is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread safe manner - the mark isn't put on the anchor itself, but on
current stack frame. To avoid growing stack frame, we utilize LSB
from the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.

Discussed with:		dhartmei
2012-09-18 10:54:56 +00:00
Gleb Smirnoff
9e8c4accee - Add $FreeBSD$ to allow modifications to this file.
- Move $OpenBSD$ to a more standard place.
2012-09-18 10:52:46 +00:00
Gleb Smirnoff
3b3a8eb937 o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c		-> sys/netpfil/pf/
sys/contrib/pf/net/*.h		-> sys/net/
contrib/pf/pfctl/*.c		-> sbin/pfctl
contrib/pf/pfctl/*.h		-> sbin/pfctl
contrib/pf/pfctl/pfctl.8	-> sbin/pfctl
contrib/pf/pfctl/*.4		-> share/man/man4
contrib/pf/pfctl/*.5		-> share/man/man5

sys/netinet/ipfw		-> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with:		bz, luigi
2012-09-14 11:51:49 +00:00
Gleb Smirnoff
d6d3f01e0a Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

 o Fine grained locking, thus much better performance.
 o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

  Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by:	Florian Smeets <flo freebsd.org>
Tested by:	Chekaluk Vitaly <artemrts ukr.net>
Tested by:	Ben Wilber <ben desync.com>
Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-09-08 06:41:54 +00:00
Alexander V. Chernikov
73c23f3ba1 Fix the build broken by r240099.
Hide link_pfil_hook under _KERNEL macro.

MFC after:    3 weeks
2012-09-04 22:17:33 +00:00
Alexander V. Chernikov
7d4317bd40 Introduce new link-layer PFIL hook V_link_pfil_hook.
Merge ether_ipfw_chk() and part of bridge_pfil() into
unified ipfw_check_frame() function called by PFIL.
This change was suggested by rwatson? @ DevSummit.

Remove ipfw headers from ether/bridge code since they are unneeded now.

Note this thange introduce some (temporary) performance penalty since
PFIL read lock has to be acquired for every link-level packet.

MFC after:     3 weeks
2012-09-04 19:43:26 +00:00
Gleb Smirnoff
62208ca5d2 - Move jenkins.h to jenkins_hash.c
- Provide missing function that can do hashing of arbitrary sized buffer.
- Refetch lookup3.c and do only minimal edits to it, so that diff between
  our jenkins_hash.c and lookup3.c is minimal.
- Add declarations for jenkins_hash(), jenkins_hash32() to sys/hash.h.
- Document these functions in hash(9)

Obtained from:	http://burtleburtle.net/bob/c/lookup3.c
2012-09-04 12:07:33 +00:00
Gleb Smirnoff
3582a9f6c6 Change bridge(4) to use if_transmit for forwarding packets to underlying
interfaces instead of queueing.

Tested by:	ray
2012-09-03 10:08:20 +00:00
Gleb Smirnoff
3932d76033 In ifc_alloc_unit():
- In the !wildcard case, return ENOSPC instead of confusing EEXIST
  in case if ifc->ifc_maxunit reached.
- Fix unit leak, that I've introduced in previous revision.

Submitted by:	Daan Vreeken <Daan vitsch.nl>
2012-08-30 12:18:45 +00:00
John Baldwin
b90dde2fbc Fix a silly grammar bogon.
Submitted by:	Stephen McKay
2012-08-21 19:07:28 +00:00
John Baldwin
28cc4d37e6 Refine the changes made in r208212 to avoid bogus failures from
if_delmulti() when clearing the configuration for a subinterface when
the parent interface is being detached.  The current code was still
triggering an assertion in if_delmulti() due to the parent interface being
partially detached.  Fix this by not calling if_delmulti() at all if the
parent interface is being detached.  Warn if if_delmulti() fails when the
parent is not being detached (but similar to 208212, still proceed with
tearing down the vlan state).

Tested by:	ae@
MFC after:	1 month
2012-08-20 16:00:33 +00:00
John Baldwin
2541fcd953 Unexpand a couple of TAILQ_FOREACH()s. 2012-08-17 16:01:24 +00:00
Konstantin Belousov
1c771f9222 After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
Gleb Smirnoff
ea53792942 Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
  disestablish them.
  - This allows us to simplify the arptimer() and make it
    race safe.
o Consistently use ifp->if_afdata_lock to lock access to
  linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
  is attached to the hash.
  - Use LLE_LINKED to avoid double unlinking via consequent
    calls to llentry_free().
  - Mark lle with LLE_DELETED via |= operation istead of =,
    so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
  consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR:		kern/165863
Submitted by:	Andrey Zonov <andrey zonov.org>
Submitted by:	Ryan Stone <rysto32 gmail.com>
Submitted by:	Eric van Gyzen <eric_van_gyzen dell.com>
2012-08-02 13:57:49 +00:00
Gleb Smirnoff
b1d86af706 The llentry_update() is used only by flowtable and the latter
always passes NULL pointer to it. Thus, code can be simplified
and function renamed to llentry_alloc() to match rtalloc().
2012-08-02 13:20:44 +00:00
Gleb Smirnoff
b9aee262e5 Some more whitespace cleanup. 2012-08-01 09:00:26 +00:00
Gleb Smirnoff
ea50c13ebe Some style(9) and whitespace changes.
Together with:	Andrey Zonov <andrey zonov.org>
2012-07-31 11:31:12 +00:00
Bjoern A. Zeeb
5e5c0e7980 Hardcode the loopback rx/tx checkum options for IPv6 to on without
checking. This allows the FreeBSD 9.1 release process to move forward.
Work around the problem that loopback connections to local addresses
not on loopback interfaces and not on interfaces w/ IPv6 checksum offloading
enabled would not work.
A proper fix to allow us to disable the "checksum offload" on loopback
for testing, measurements, ... as we allow for IPv4 needs to put in
place later.

Reported by:	tuexen, Matthew Seaman (m.seaman infracaninophile.co.uk)
Reported by:	Mike Andrews (mandrews bit0.com), kib, ...
PR:		kern/170070
MFC after:	1 day
X-MFC after:	re approval
2012-07-28 20:31:39 +00:00
Alexander V. Chernikov
99ab4b1297 Permit changing MTU in 6to4 relay.
This behavior is recommended by RFC 4213 clause 3.2.

Sometimes fragmentation is the least evil.
For example, some Linux IPVS kernels forwards
ICMPv6 checksums to real servers incorrectly.

Reviewed by:      hrs(previous version)
Approved by:      kib(mentor)
MFC after:        1 week
2012-07-15 17:44:27 +00:00
Ed Maste
bf7a35de2d Simplify error case
Submitted by:	thompsa@
2012-07-10 20:59:35 +00:00
Ed Maste
683fa2b5d7 Plug potential mbuf leak when bridging fragments
If an error occurs when transmitting one mbuf in a chain of fragments,
free the subsequent fragments instead of leaking them.

Sponsored by:   ADARA Networks
2012-07-10 13:17:32 +00:00
Mikolaj Golub
7edc3d88eb In epair_clone_destroy(), when destroying the second half, we have to
switch to its vnet before calling ether_ifdetach(). Otherwise if the
second half resides in a different vnet, if_detach() silently fails
leaving a stale pointer in V_ifnet list, and the system crashes trying
to access this pointer later.

Another solution could be not to allow to destroy epair unless both
ends are in the home vnet.

Discussed with:	bz
Tested by:	delphij
2012-07-09 20:38:18 +00:00
Ed Maste
21151865d5 Restore error handling lost in r191603
This was missed in the change from IFQ_ENQUEUE to if_transmit.

Sponsored by:   ADARA Networks
2012-07-09 14:16:49 +00:00
Ed Maste
d5fb967ab2 Implement SIOCGIFMEDIA for if_tap(4)
Appease certain if_tap(4) consumers by providing simulated Ethernet
media status.

DragonFly commit 70d9a675bf5441cc854a843ead702d08928c37f3

Obtained from:  DragonFly BSD
2012-07-06 23:17:30 +00:00
Gleb Smirnoff
bf9840512a When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.

The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.

- Introduce new flag for struct route/route_in6, that marks route
  not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
  depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
  but empty route should use RO_RTFREE() to free results of
  lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
  ro->ro_rt == NULL.

Tested by:	tuexen (SCTP part)
2012-07-04 07:37:53 +00:00
Andrew Thompson
61587a8403 Add the same check as vlan(4) where we ignore the ifnet departure event if the
interface is just being renamed.

PR:		kern/169557
Submitted by:	Mark Johnston
MFC after:	3 days
2012-06-30 19:09:02 +00:00
John Baldwin
304050dde0 Hold GIF_LOCK() for almost all of gif_start(). It is required to be held
across in_gif_output() and in6_gif_output() anyway, and once it is held
across those it might as well be held for the entire loop.  This simplifies
the code and removes the need for the custom IFF_GIF_WANTED flag (which
belonged in the softc and not as an IFF_* flag anyway).

Tested by:	Vincent Hoffman  vince  unsane co uk
2012-06-29 15:21:34 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
Randall Stewart
4013878828 Fix comment to better reflect how we are
cheating and using the csum_data. Also fix
style issues with the comments.
2012-06-12 13:31:32 +00:00
Randall Stewart
cef68c63ec Note to self. Have morning coffee *before* committing things.
There is no mac_addr in the mbuf for BSD.. cheat like
we are supposed to and use the csum field since our friend
the gif tunnel itself will never use offload.
2012-06-12 12:44:17 +00:00
Randall Stewart
6f17e3a31a Opps forgot to commit the flag. 2012-06-12 12:40:15 +00:00
Randall Stewart
776b728856 Allow a gif tunnel to be used with ALTq.
Reviewed by:	gnn
2012-06-12 10:44:09 +00:00
Andrew Thompson
08e348234c Fix a panic I introduced in r234487, the bridge softc pointer is set to null
early in the detach so rearrange things not to explode.

Reported by:	David Roffiaen, Gustau Perez Querol
Tested by:	David Roffiaen
MFC after:	3 days
2012-06-11 20:12:13 +00:00
Alexander V. Chernikov
4fe83b8159 Fix typo introduced in r236559.
Pointed by:   bcr
Approved by:  kib(mentor)
2012-06-09 10:04:40 +00:00
Mikolaj Golub
62b1b42507 Sort includes.
Submitted by:	Daan Vreeken <pa4dan Bliksem.VEHosting.nl>
MFC after:	3 days
2012-06-07 19:48:45 +00:00
Mikolaj Golub
fef68bb8dc Add VIMAGE support to if_tap.
PR:		kern/152047, kern/158686
Submitted by:	Daan Vreeken <pa4dan Bliksem.VEHosting.nl>
MFC after:	1 week
2012-06-07 19:46:46 +00:00
Alexander V. Chernikov
784292f89a Fix panic introduced by r235745. Panic occurs after first packet traverse renamed interface.
Add several comments on locking

Found by:         avg
Approved by:      ae(mentor)
Tested by:        avg
MFC after:        1 week
2012-06-04 12:36:58 +00:00
Michael Tuexen
a6cff10f2a Seperate SCTP checksum offloading for IPv4 and IPv6.
While there: remove some trainling whitespaces.

MFC after: 3 days
X-MFC with: 236170
2012-05-30 20:56:07 +00:00
Jung-uk Kim
9b7d4a7f2d Fix style(9) nits, reduce unnecessary type castings, etc., for bpf_setf(). 2012-05-29 22:28:46 +00:00
Jung-uk Kim
8b04b48a7d - Save the previous filter right before we set new one.
- Reduce duplicate code and make it little easier to read.

MFC after:	2 weeks
2012-05-29 22:21:53 +00:00
Jung-uk Kim
6f731135ac Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor
and reset statistics as it should.

MFC after:	3 days
2012-05-29 18:44:53 +00:00
Alexander V. Chernikov
a86227d176 Fix BPF_JITTER code broken by r235746.
Pointed by:       jkim
Reviewed by:      jkim (except locking changes)
Approved by:      (mentor)
MFC after:        2 weeks
2012-05-29 12:52:30 +00:00
Eygene Ryabinkin
f74d5a7a20 if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row
Currently, 'ifconfig laggX down' does not remove members from this
lagg(4) interface.  So, 'service netif stop laggX' followed by
'service netif start laggX' will choke, because "stop" will leave
interfaces attached to the laggX and ifconfig from the "start" will
refuse to add already-existing interfaces.

The real-world case is when I am bundling together my Ethernet and
WiFi interfaces and using multiple profiles for accessing network in
different places: system being booted up with one profile, but later
this profile being exchanged to another one, followed by 'service
netif restart' will not add WiFi interface back to the lagg: the
"stop" action from 'service netif restart' will shut down my main WiFi
interface, so wlan0 that exists in the lagg0 will be destroyed and
purged from lagg0; the "start" action will try to re-add both
interfaces, but since Ethernet one is already in lagg0, ifconfig will
refuse to add the wlan0 from WiFi interface.

Since adding the interface to the lagg(4) when it is already here
should be an idempotent action: we're really not changing anything,
so this fix doesn't change the semantics of interface addition.

Approved by: thompsa
Reviewed by: emaste
MFC after: 1 week
2012-05-28 12:13:04 +00:00
Bjoern A. Zeeb
356ab07e2d It turns out that too many drivers are not only parsing the L2/3/4
headers for TSO but also for generic checksum offloading.  Ideally we
would only have one common function shared amongst all drivers, and
perhaps when updating them for IPv6 we should introduce that.
Eventually we should provide the meta information along with mbufs to
avoid (re-)parsing entirely.

To not break IPv6 (checksums and offload) and to be able to MFC the
changes without risking to hurt 3rd party drivers, duplicate the v4
framework, as other OSes have done as well.

Introduce interface capability flags for TX/RX checksum offload with
IPv6, to allow independent toggling (where possible).  Add CSUM_*_IPV6
flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6
fragmentation.  Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and
add an alias for CSUM_DATA_VALID_IPV6.

This pretty much brings IPv6 handling in line with IPv4.
TSO is still handled in a different way and not via if_hwassist.

Update ifconfig to allow (un)setting of the new capability flags.
Update loopback to announce the new capabilities and if_hwassist flags.

Individual driver updates will have to follow, as will SCTP.

Reported by:	gallatin, dim, ..
Reviewed by:	gallatin (glanced at?)
MFC after:	3 days
X-MFC with:	r235961,235959,235958
2012-05-28 09:30:13 +00:00
Andrew Thompson
5fc4c149ab Turn LACP debugging from a compile time option to a sysctl, it is very handy to
be able to turn it on when negotiation to a switch misbehaves.

Submitted by:	Andrew Boyer
MFC after:	3 days
2012-05-26 08:09:01 +00:00
Bjoern A. Zeeb
d4b93a67d9 MFp4 bz_ipv6_fast:
Simple yet effective change enabling checksum "offload" on loopback
  for IPv6 to avoid expensive computations.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 02:21:17 +00:00
Alexander V. Chernikov
97aacec622 Make most BPF ioctls() SMP-safe.
Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:21:00 +00:00
Alexander V. Chernikov
c7b0200eb5 Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter.
Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl.
This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock.

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:19:19 +00:00
Alexander V. Chernikov
afa85850e7 Fix old panic when BPF consumer attaches to destroying interface.
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'

Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.

Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:17:29 +00:00
Alexander V. Chernikov
6c74ff0ea6 Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@)
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock

Add several forgotten assertions (thanks to adrian@)

Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.

Approved by:       kib(mentor)
Tested by:         gnn
MFC in:            4 weeks
2012-05-21 22:13:48 +00:00
Marcel Moolenaar
cf0d539f8b Use the LLINDEX macro to access the link-level I/F index. This makes
it possible to work with a different type for the sdl_index field --
it only requires a recompile.

Obtained from:	Juniper Networks, Inc.
2012-05-19 02:39:43 +00:00
Xin LI
47db53c31a Sync DLTs with the latest pcap version.
MFC after:	2 weeks
2012-05-14 05:10:41 +00:00
Alexander V. Chernikov
bdf942c3f0 Revert r234834 per luigi@ request.
Cleaner solution (e.g. adding another header) should be done here.

Original log:
  Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
  Remove ipfw/ip_fw_private.h header from non-ipfw code.

Requested by:      luigi
Approved by:       kib(mentor)
2012-05-03 08:56:43 +00:00
Ed Maste
6107adc3f5 Relax restriction on direct tx to child ports
Lagg(4) restricts the type of packet that may be sent directly to a child
port, to avoid undesired output from accidental misconfiguration.
Previously only ETHERTYPE_PAE was permitted.

BPF writes to a lagg(4) child port are presumably intentional, so just
allow them, while still blocking other packets that should take the
aggregation path.

PR:		kern/138620
Approved by:	thompsa@
2012-05-03 01:41:12 +00:00
Alexander V. Chernikov
7bd5e9b143 Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.

Approved by:        ae(mentor)
MFC after:          2 weeks
2012-04-30 10:22:23 +00:00
Alexander V. Chernikov
c2508034a2 Do not require radix write lock to be held while dumping route table
via sysctl(4) interface. This permits router not to stop forwarding
packets while route table is being written to user-supplied buffer.

Reported by:        Pawel Tyll <ptyll@nitronet.pl>
Approved by:        kib(mentor)

MFC after:          1 week
2012-04-22 16:13:23 +00:00
Andrew Thompson
2885c19ebd Move the interface media check to a taskqueue, some interfaces (usb) sleep
during SIOCGIFMEDIA and we were holding locks.
2012-04-20 10:06:28 +00:00
Andrew Thompson
7702d4013b Add linkstate to bridge(4), set the link to up when at least one underlying
interface is up, otherwise the link is down.

This, among other things, allows carp to work on a bridge.

Prodded by:	glebius
Tested by:	Alexander Lunev
2012-04-20 09:55:50 +00:00
Andrew Thompson
ddf3201009 Remove KASSERTS, they do not add any value here since the pointer is about to
be derefernced anyway.
2012-04-18 01:39:14 +00:00
Luigi Rizzo
d76bf4ff7b A bit of cleanup in the names of fields of netmap-related structures.
Use the name 'ring' instead of 'queue' in all fields.
Bump NETMAP_API.
2012-04-13 16:03:07 +00:00
Luigi Rizzo
d5d42003f4 remove an unnecessary #define 2012-04-12 10:32:34 +00:00
Andrew Thompson
b517176ad9 Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets
are discarded, this is an issue because lacp drops the lock which may allow
network threads to access freed memory. Expand the lock coverage so the
detach/attach happen atomically.

Submitted by:	Andrew Boyer (earlier version)
2012-04-12 01:07:17 +00:00
John Baldwin
19a3210a66 Add media types for 40G media that might be used with FreeBSD.
Reviewed by:	bz
MFC after:	2 weeks
2012-04-10 13:59:35 +00:00
Alexander V. Chernikov
9431cc1696 Fix build broken by r233938.
Pointed by:     David Wolfskill <david@catwhisker.org>
Approved by:    kib (mentor)
Pointy hat to:  melifaro
2012-04-06 13:34:19 +00:00
Alexander V. Chernikov
51ec1eb70d - Improve performace for writer-only BPF users.
Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send
raw ethernet frames. The only FreeBSD interface that can be used to send raw frames
is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses
BPF only to send data. This leads us to the situation when software like cdpd,
being run on high-traffic-volume interface significantly reduces overall performance
since we have to acquire additional locks for every packet.

Here we add sysctl that changes BPF behavior in the following way:
If program came and opens BPF socket without explicitly specifyin read filter we
assume it to be write-only and add it to special writer-only per-interface list.
This makes bpf_peers_present() return 0, so no additional overhead is introduced.
After filter is supplied, descriptor is added to original per-interface list permitting
packets to be captured.

Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of
setting snap length.

Fortunately, most programs explicitly sets (event catch-all) filter after that.
tcpdump(1) is a good example.

So a bit hackis approach is taken: we upgrade description only after second
BIOCSETF is received.

Sysctl is named net.bpf.optimize_writers and is turned off by default.

- While here, document all sysctl variables in bpf.4

Sponsored by Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      4 weeks
2012-04-06 06:55:21 +00:00
Alexander V. Chernikov
e4b3229aa5 - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greately improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is is temporary solution,
struct bpf_if should be made opaque for any external caller.

Found by:       Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by:   Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      3 weeks
2012-04-06 06:53:58 +00:00
John Baldwin
02ed02af7b Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD.
The new [RW]LOCK macros are merged back to 8.x so should be suitable for
new code in HEAD even if it is to be MFC'd.
2012-03-19 21:09:12 +00:00
Bjoern A. Zeeb
bfca216eb9 Hide kernel option ROUTETABLES evaluations in the implementation
rather than the header file.  With this also move RT_MAXFIBS and
RT_NUMFIBS into the implemantion to avoid further usage in other
code. rt_numfibs is all that should be needed.

This allows users to change the number of FIBs from 1..RT_MAXFIBS(16)
dynamically using the tunable without the need to change the kernel
config for the maximum anymore.  This means that thet multi-FIB
feature is now fully available with GENERIC kernels.
The kernel option ROUTETABLES can still be used to set the default
numbers of FIBs in absence of the tunable.

Ok.ed by:	julian, hrs, melifaro
MFC after:	2 weeks
2012-03-18 11:23:40 +00:00
Luigi Rizzo
a72505824c - remove an extra parenthesis in a closing brace;
- add the macro NETMAP_RING_FIRST_RESERVED() which returns
  the index of the first non-released buffer in the ring
  (this is useful for code that retains buffers for some time
  instead of processing them immediately)
2012-03-11 17:35:12 +00:00
Andrew Thompson
cd613b6351 Move the vlan buffer space into the union which also fixes an unused variable
warning with !INET & !INET6.

Spotted by:	pluknet
2012-03-07 07:22:53 +00:00
Andrew Thompson
86f67641a9 Add the ability to set which packet layers are used for the load balance hash
calculation.
2012-03-06 22:58:13 +00:00
Marko Zec
2db13e7575 Properly restore curvnet context when returning early from
ether_input_internal().

This change only affects options VIMAGE kernel builds.

PR:		kern/165643
Submitted by:	Vijay Singh
MFC after:	3 days
2012-03-04 11:11:03 +00:00
Juli Mallett
9624d94701 o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI.  This mostly follows nwhitehorn's lead in implementing
   COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
   32-bit ABI being used.  Since the MIPS port is relatively-new, even the 32-bit
   ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
   with 32-bit compatibility, then, disable some code which assumes otherwise
   wrongly when built for MIPS.  A more general macro to check in this case would
   seem like a good idea eventually.  If someone adds support for using n32
   userland with n64 kernels on MIPS, then they will have to add a variety of
   flags related to each piece of the ABI that can vary.  That's probably the
   right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
   freebsd32 compat code.  Probably this should be generalized at some point.

Reviewed by:	gonzo
2012-03-03 08:19:18 +00:00
Andrew Thompson
70b23a4596 Use a more appropriate default for the maximum number of addresses in the
bridge forwarding table.

PR:		docs/164564
Discussed with:	brueffer
2012-02-29 20:58:21 +00:00
Luigi Rizzo
64ae02c365 A bunch of netmap fixes:
USERSPACE:
1. add support for devices with different number of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
   to the netmap ring to indicate how many buffers we have already processed
   but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
   at it add a version field and some spares to the ioctl() argument
   to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
   to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code, helps mainteinance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
   in the ixgbe driver.
   Use 'legacy' descriptors on the tx ring and prefetch slots gives
   about 20% speedup at 900 MHz. Another 7-10% would come from removing
   the explict calls to bus_dmamap* in the core (they are effectively
   NOPs in this case, but it takes expensive load of the per-buffer
   dma maps to figure out that they are all NULL.

   Rx performance not investigated.

I am postponing the MFC so i can import a few more improvements
before merging.
2012-02-27 19:05:01 +00:00
Andrew Thompson
8d45bd6e80 Only look for a usable MAC address for the bridge ID from ports within our
bridge, this allows us to have more than one independent bridge in the same
STP domain.

PR:		kern/164369
Submitted by:	Nikos Vassiliadis (earlier version)
MFC after:	2 weeks
2012-02-24 17:50:36 +00:00
Andrew Thompson
3122b9120c Add a sysctl/tunable default value for the use_flowid sysctl in r232008. 2012-02-23 21:56:53 +00:00
Andrew Thompson
47190ea664 Indicate this function decrements the timer as well as testing for expiry. 2012-02-23 20:58:52 +00:00
Kip Macy
a93cda789a When using flowtable llentrys can outlive the interface with which they're associated
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.

Move the free pointer in to the llentry itself and update the initalization sites.

MFC after:	2 weeks
2012-02-23 18:21:37 +00:00
Andrew Thompson
2ad65e315d Now that network interfaces advertise if they support linkstate notifications
we do not need to perform a media ioctl every 15 seconds.
2012-02-23 06:26:16 +00:00
Andrew Thompson
4661f8627c bstp_input() always consumes the packet so remove the mbuf handling dance
around it.

Obtained from:	OpenBSD (r1.37)
2012-02-23 00:59:21 +00:00
Andrew Thompson
0bf97ae271 Using the flowid in the mbuf assumes the network card is giving a good hash for
the traffic flow, this may not be the case giving poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.

PR:		kern/164901
Submitted by:	Eugene Grosbein
MFC after:	1 week
2012-02-22 22:01:30 +00:00
Bjoern A. Zeeb
9dba179d5e IFC @231845
Sponsored by:	Cisco Systems, Inc.
2012-02-17 00:27:48 +00:00
Tijl Coosemans
265f940acc Change some headers such that lang/gcc* ports no longer patch them.
The lang/gcc* ports patch headers where they think something is
non-standard. These patched headers override the system headers which means
you have to rebuild these ports whenever you do installworld to make sure
they contain the latest changes.
2012-02-14 12:50:20 +00:00
Bjoern A. Zeeb
6d076ae8f7 Introduce a new NET_RT_IFLISTL API to query the address list. It works
on extended and extensible structs if_msghdrl and ifa_msghdrl.  This
will allow us to extend both the msghdrl structs and eventually if_data
in the future without breaking the ABI.

Bump __FreeBSD_version to allow ports to more easily detect the new API.

Reviewed by:	glebius, brooks
MFC after:	3 days
2012-02-11 06:02:16 +00:00
Bjoern A. Zeeb
e82cf13bfb Backout changes from r228571. Remove if_data from struct ifa_msghdr again.
While this breaks carp on HEAD temporary, it restores the upgrade path from
stable, and head before 20111215.

Reviewed by:	glebius, brooks
2012-02-11 05:59:54 +00:00
Sergey Kandaurov
4ecf274be7 g/c last bit of old ipv6 prefix management.
Reviewed by:	bz
Obtained from:	NetBSD, net/if.h, rev 1.80
2012-02-08 22:05:26 +00:00
Luigi Rizzo
5819da83ce - change the buffer size from a constant to a
TUNABLE variable (hw.netmap.buf_size) so we can experiment
  with values different from 2048 which may give better cache performance.

- rearrange the memory allocation code so it will be easier
  to replace it with a different implementation. The current code
  relies on a single large contiguous chunk of memory obtained through
  contigmalloc.
  The new implementation (not committed yet) uses multiple
  smaller chunks which are easier to fit in a fragmented address
  space.
2012-02-08 11:43:29 +00:00
Pawel Jakub Dawidek
50c8ec53f6 Allow to set if_bridge(4) sysctls from /boot/loader.conf.
MFC after:	3 days
2012-02-07 14:50:33 +00:00
Gleb Smirnoff
e8aa8bdd64 Fix typo in r231010.
Submitted by:	linimon
2012-02-05 12:52:28 +00:00
Gleb Smirnoff
cad582b753 Better comment for ifa_init(), ifa_ref(), ifa_free(). 2012-02-05 08:53:05 +00:00
Gleb Smirnoff
adc7231d5d In ifa_init() initialize if_data.ifi_datalen. This would be
required after upcoming changes from bz@.

Discussed with:	bz
2012-02-05 08:31:15 +00:00
Bjoern A. Zeeb
81d5d46b3c Add multi-FIB IPv6 support to the core network stack supplementing
the original IPv4 implementation from r178888:

- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
  the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
  where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
  prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
  ease MFCs.  Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
  currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
  are locked to the default FIB.  Individual IPv6 addresses will only
  appear in the default FIB, however redirect information and prefixes
  of connected subnets are automatically propagated to all FIBs by
  default (mimicking IPv4 behavior as closely as possible).

Sponsored by:	Cisco Systems, Inc.
2012-02-03 13:08:44 +00:00
Bjoern A. Zeeb
a84986256f Move a comment from rtinit1() to the top of the file where dealing with
the (maximum) number of FIBs trying to clarify that evetually FIBs
should probably attached to domain(9) specific storage. [1]

Add a comment on a limitimation on the rt_add_addr_allfibs option.

Use RT_DEFAULT_FIB instead of 0 where applicable.

Add empty line to functions without local variables per style.

Put public yet unused in-tree function rtinit_fib() under BURN_BRIDGES
to indicate that it might go away in the future.

No functional change.

Discussed with:	julian [1] (clarification on what the original one meant)
Sponsored by:	Cisco Systems, Inc.
2012-02-03 12:25:14 +00:00
Bjoern A. Zeeb
b3dd077152 Minor optimization doing input validation with a possible early return
before doing further work.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 11:20:11 +00:00
Bjoern A. Zeeb
096f27864f Fix FLOWTABLE IPv6 handling in route.c missed in r205066.
While doing so, for consistency with the rtalloc_ign_fib(9) interface
called, remove the "in_" prefix from rtalloc_ign_wrapper() no longer
indicating that it would only handle the INET case.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 10:17:34 +00:00
Bjoern A. Zeeb
b680a383a8 Allow for IPv6 to allocate (and in the VIMAGE case free) as many routing
tables (FIBs) as IPv4.
Prepare various general rt* functions for multi-FIB IPv6 handling in
addition to already existing multi-FIB IPv4 cases.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 09:23:55 +00:00
Bjoern A. Zeeb
556d81ddd7 Rather than putting magic 0s as FIB argument into the rt* calls, provide
a macro RT_DEFAULT_FIB defined to 0 to more easily identify the cases
tied to the default FIB.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 09:06:24 +00:00