Commit Graph

3471 Commits

Author SHA1 Message Date
Bryan Drewery
2780ba06c7 Avoid passing an uninitialized 'i'. Currently nothing was depending on it
anyhow.

Coverity CID:	1331562
2015-10-29 18:58:18 +00:00
Andrey V. Elsukov
50bc87bc43 Check the size of data available in mbuf, before using them.
PR:		202667
MFC after:	1 week
2015-10-28 17:55:37 +00:00
Kristof Provost
2602284308 pf: Fix compliation warning with gcc
While fixing the PF_ANEQ() macro I messed up the parentheses, leading to
compliation warnings with gcc.

Spotted by:     ian
Pointy Hat:     kp
2015-10-25 18:09:03 +00:00
Kristof Provost
7d7624233a PF_ANEQ() macro will in most situations returns TRUE comparing two identical
IPv4 packets (when it should return FALSE). It happens because PF_ANEQ() doesn't
stop if first 32 bits of IPv4 packets are equal and starts to check next 3*32
bits (like for IPv6 packet). Those bits containt some garbage and in result
PF_ANEQ() wrongly returns TRUE.

Fix: Check if packet is of AF_INET type and if it is then compare only first 32
bits of data.

PR:		204005
Submitted by:	Miłosz Kaniewski
2015-10-25 13:14:53 +00:00
Ed Maste
40a02d00a5 if_tap: correct typo in sysctl description (Enably)
Sponsored by:	The FreeBSD Foundation
2015-10-21 19:56:16 +00:00
Alexander V. Chernikov
f221bcaa06 Remove several compat functions from pre-fib era. 2015-10-17 17:26:44 +00:00
Hiroki Sato
b7a581eaa6 Fix a panic when destroying a lagg interface.
Differential Revision:	https://reviews.freebsd.org/D3883
2015-10-16 01:16:01 +00:00
Kristof Provost
c110fc49da pf: Fix TSO issues
In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.

The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.

To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.

PR:		154428, 193579, 198868
Reviewed by:	sbruno
MFC after:	1 week
Relnotes:	yes
Sponsored by:	RootBSD
Differential Revision:	https://reviews.freebsd.org/D3779
2015-10-14 16:21:41 +00:00
Hiroki Sato
023d10cbc7 Fix a bug that caused reinitialization failure of MAC addresses on
the lagg interface when removing the primary port.

PR:			201916
Differential Revision:	https://reviews.freebsd.org/D3301
2015-10-07 06:32:34 +00:00
Marcelo Araujo
973532fc7d Remove per complete the fec aggregation protocol.
The remove began with revision r271733.

NOTE: This patch must never be merge to 10-Stable

Reviewed by:	glebius
Approved by:	bapt (mentor)
Relnotes:	Yes
Sponsored by:	EuroBSDCon Sweden.
Differential Revision:	D3786
2015-10-04 08:00:29 +00:00
Hiroki Sato
f1aaad0cd9 Add IFCAP_LINKSTATE support. 2015-10-03 09:15:23 +00:00
Andrey V. Elsukov
1a6fb597b0 Always detach encap handler when reconfiguring tunnel.
Reported by:	hrs
MFC after:	1 week
2015-10-03 03:57:58 +00:00
Alexander V. Chernikov
1558cb2448 Eliminate nd6_nud_hint() and its TCP bindings.
Initially function was introduced in r53541 (KAME initial commit) to
  "provide hints from upper layer protocols that indicate a connection
  is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability
  Confirmation).
However, it was converted to do nothing (e.g. just return) in r122922
  (tcp_hostcache implementation) back in 2003. Some defines were moved
  to tcp_var.h in r169541. Then, it was broken (for non-corner cases)
  by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So,
  right now this code is broken and has no "real" base users.

Differential Revision:	https://reviews.freebsd.org/D3699
2015-09-27 05:29:34 +00:00
Alexander V. Chernikov
1fe201c322 Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
  ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
  to provide cached lle data.

For IPv6 situation is more complex:
  ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
    nd6_storelladdr()

We have ip6_ouput() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
  * checks if lle exists, creates it if needed (similar to arpresolve())
  * performes lle state transitions (similar to arpresolve())
  * calls nd6_output_ifp() which pushes packets to link output routine along
    with running SeND/MAC hooks regardless of lle state
    (e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
  which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
  of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
  interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
  * make all nd6_output() consumers use nd6_output_ifp() instead
  * rename nd6_output[_slow]() to nd6_resolve_[slow]()
  * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
    e.g. copy L2 address to buffer instead of pushing packet towards lower
    layers
  * Make all nd6_storelladdr() users use nd6_resolve()
  * eliminate nd6_storelladdr()

The resulting callchain is the following:
  ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
  -> nd6_output() -> nd6_output _lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
  nd6_resolve() which will return EWOULDBLOCK, and that result
  will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D1469
2015-09-16 14:26:28 +00:00
Andrey V. Elsukov
b71bed24a6 Use KASSERT for some checks, that are late to do.
Discussed with:	melifaro, glebius
2015-09-16 13:17:00 +00:00
Oleg Bulyzhin
3f70ebbf05 Remove superfluous m_freem().
MFC after:	1 month
2015-09-16 10:07:45 +00:00
Alexander V. Chernikov
59c180c35c Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
  if old ifp fib differes from new one, that the caller
  is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
2015-09-16 06:23:15 +00:00
Alexander V. Chernikov
d3cdb71655 * Require explicitl lle unlink prior to calling llentry_delete().
This one slightly decreases time of holding afdata wlock.
* While here, make nd6_free() return void. No one has used its return value
  since r186119.
2015-09-15 06:48:19 +00:00
Eric van Gyzen
17a036563d Fix the handling of IPv6 On-Link Redirects.
On receipt of a redirect message, install an interface route for the
redirected destination.  On removal of the corresponding Neighbor Cache
entry, remove the interface route.

This requires changes in rtredirect_fib() to cope with an AF_LINK
address for the gateway and with the absence of RTF_GATEWAY.

This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo
Phase 2 test suite.

Unrelated to the above, fix a recursion on the radix node head lock
triggered by the Tahi Redirected to Alternate Router test cases.

When I first wrote this patch in October 2012, all Section 2
(Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE,
and 8-STABLE.  cem@ recently rebased the 10.x patch onto head and reported
that it passes Tahi.  (Thanks!)

These other test cases also passed in 2012:

* the RTF_MODIFIED case, with IPv4 and IPv6 (using a
  RTF_HOST|RTF_GATEWAY route for the destination)

* the redirected-to-self case, with IPv4 and IPv6

* a valid IPv4 redirect

All testing in 2012 was done with WITNESS and INVARIANTS.

Tested by:    EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015,
              Mark Kelley <mark_kelley@dell.com> in 2012,
              TC Telkamp <terence_telkamp@dell.com> in 2012
PR:           152791
Reviewed by:  melifaro (current rev), bz (earlier rev)
Approved by:  kib (mentor)
MFC after:    1 month
Relnotes:     yes
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D3602
2015-09-14 19:17:25 +00:00
Alexander V. Chernikov
3e7a2321e3 * Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
  more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3573
2015-09-14 16:48:19 +00:00
Hans Petter Selasky
d76d40126e Update TSO limits to include all headers.
To make driver programming easier the TSO limits are changed to
reflect the values used in the BUSDMA tag a network adapter driver is
using. The TCP/IP network stack will subtract space for all linklevel
and protocol level headers and ensure that the full mbuf chain passed
to the network adapter fits within the given limits.

Implementation notes:

If a network adapter driver needs to fixup the first mbuf in order to
support VLAN tag insertion, the size of the VLAN tag should be
subtracted from the TSO limit. Else not.

Network adapters which typically inline the complete header mbuf could
technically transmit one more segment. This patch does not implement a
mechanism to recover the last segment for data transmission. It is
believed when sufficiently large mbuf clusters are used, the segment
limit will not be reached and recovering the last segment will not
have any effect.

The current TSO algorithm tries to send MTU-sized packets, where the
MTU typically is 1500 bytes, which gives 1448 bytes of TCP data
payload per packet for IPv4. That means if the TSO length limitiation
is set to 65536 bytes, there will be a data payload remainder of
(65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to
recover total TSO length due to inlining mbuf header data will not
have any effect, because adding or removing the ETH/IP/TCP headers
to or from 324 bytes will not cause more or less TCP payload to be
TSO'ed.

Existing network adapter limits will be updated separately.

Differential Revision:	https://reviews.freebsd.org/D3458
Reviewed by:		rmacklem
MFC after:		2 weeks
2015-09-14 08:36:22 +00:00
Hiroki Sato
b1c250ff3f - Remove GIF_{SEND,ACCEPT}_REVETHIP.
- Simplify EADDRNOTAVAIL and EAFNOSUPPORT conditions.

MFC after:	3 days
2015-09-10 05:59:39 +00:00
Alexander V. Chernikov
441f9243df Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3464
2015-09-05 05:33:20 +00:00
Hiroki Sato
18e199ad72 Fix a panic which was reproducible by an infinite loop of
"ifconfig epair0 create && ifconfig epair0a destroy".

This was caused by an uninitialized function pointer in
softc->media.
2015-09-02 16:30:45 +00:00
Alexander V. Chernikov
3b0fd911fa Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in
alloc handler, based on flags.
2015-08-31 05:03:36 +00:00
Adrian Chadd
8f1111cf0b Remove now unused (and #if 0'ed out) headers. 2015-08-29 04:33:31 +00:00
Adrian Chadd
e5562eb934 Replace the printf()s with optional rate limited debugging for RSS.
Submitted by:	Tiwei Bie <btw@mail.ustc.edu.cn>
Differential Revision:	https://reviews.freebsd.org/D3471
2015-08-28 05:58:16 +00:00
Kristof Provost
64b3b4d611 pf: Remove support for 'scrub fragment crop|drop-ovl'
The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse
users into making poor choices.
It's also a fairly large amount of complex code, so just remove the support
altogether.

Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be
implicitly converted to 'scrub fragment reassemble'.

Reviewed by:	gnn, eri
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D3466
2015-08-27 21:27:47 +00:00
Luiz Otavio O Souza
c614d0a443 Fix the spelling of eri's name.
Pointy hat to:	loos
MFC with:	r287009
2015-08-24 23:40:36 +00:00
Luiz Otavio O Souza
0a70aaf8f5 Add ALTQ(9) support for the CoDel algorithm.
CoDel is a parameterless queue discipline that handles variable bandwidth
and RTT.

It can be used as the single queue discipline on an interface or as a sub
discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ.

Differential Revision:	https://reviews.freebsd.org/D3272
Reviewd by:	rpaulo, gnn (previous version)
Obtained from:	pfSense
Sponsored by:	Rubicon Communications (Netgate)
2015-08-21 22:02:22 +00:00
Alexander V. Chernikov
5a2555160f * Split allocation and table linking for lle's.
Before that, the logic besides lle_create() was the following:
  return existing if found, create if not. This behaviour was error-prone
  since we had to deal with 'sudden' static<>dynamic lle changes.
  This commit fixes bunch of different issues like:
  - refcount leak when lle is converted to static.
    Simple check case:
    console 1:
    while true;
      do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
        do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
      done;
    done
   console 2:
    ping -f any-dead-host-in-L2
   console 3:
    # watch for memory consumption:
    vmstat -m | awk '$1~/lltable/{print$2}'
  - possible problems in arptimer() / nd6_timer() when dropping/reacquiring
   lock.
  New logic explicitly handles use-or-create cases in every lla_create
  user. Basically, most of the changes are purely mechanical. However,
  we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on all real table lle change.
* Create lltable_free_entry() calling existing per-lltable
  lle_free_t callback for entry deletion
2015-08-20 12:05:17 +00:00
Hiren Panchasara
0e02b43a07 Make LAG LACP fast timeout tunable through IOCTL.
Differential Revision:	D3300
Submitted by:		LN Sundararajan <lakshmi.n at msystechnologies>
Reviewed by:		wblock, smh, gnn, hiren, rpokala at panasas
MFC after:		2 weeks
Sponsored by:		Panasas
2015-08-12 20:21:04 +00:00
Alexander V. Chernikov
0447c1367a Use single 'lle_timer' callout in lltable instead of
two different names of the same timer.
2015-08-11 12:38:54 +00:00
Alexander V. Chernikov
314294de5c Store addresses instead of sockaddrs inside llentry.
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).

Sponsored by:	Yandex LLC
2015-08-11 09:26:11 +00:00
Alexander V. Chernikov
41cb42a633 MFP r276712.
* Split lltable_init() into lltable_allocate_htbl() (alloc
  hash table with default callbacks) and lltable_link() (
  links any lltable to the list).
* Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field.
* Move lltable setup to separate functions in in[6]_domifattach.
2015-08-11 05:51:00 +00:00
Alexander V. Chernikov
2caee4be35 Rename rt_foreach_fib() to rt_foreach_fib_walk().
Suggested by:	julian
2015-08-10 20:50:31 +00:00
Alexander V. Chernikov
11cdad9873 Partially merge r274887,r275334,r275577,r275578,r275586 to minimize
differences between projects/routing and HEAD.

This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.

* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
  format (code is not indented properly to minimize diff). Will be fixed
  in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
  format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.

* Prepare for fine-grained locking by separating lle unlink and deletion in
  lltable_free() and lltable_prefix_free().

* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
 access by external callers.

* Remove @llt agrument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
  in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
  Update description from these functions.
* Use unified lltable_free_entry() function instead of per-af one.

Reviewed by:	ae
2015-08-10 12:03:59 +00:00
Alexander V. Chernikov
4bdf0b6a9a MFP r274295:
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
  to use new rt_foreach_fib() instead of hand-rolling cycles.
2015-08-08 18:14:59 +00:00
Alexander V. Chernikov
e362cf0e9f MFP r274553:
* Move lle creation/deletion from lla_lookup to separate functions:
  lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.

Reviewed by:	ae
2015-08-08 17:48:54 +00:00
Luiz Otavio O Souza
9224217213 Remove the mtx_sleep() from the kqueue f_event filter.
The filter is called from the network hot path and must not sleep.

The filter runs with the descriptor lock held and does not manipulates the
buffers, so it is not necessary sleep when the hold buffer is in use.

Just ignore the hold buffer contents when it is being copied to user space
(when hold buffer in use is set).

This fix the "Sleeping thread owns a non-sleepable lock" panic when the
userland thread is too busy reading the packets from bpf(4).

PR:		200323
MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-08-03 22:14:45 +00:00
Luiz Otavio O Souza
98fa5d858c Add a KASSERT() to make sure we wont rotate the buffers twice (rotate the
buffers while the hold buffer is in use).

Suggested by:	ed, ghelmer
MFC with:	r286142
2015-08-03 18:22:31 +00:00
John-Mark Gurney
bba6880eab looks like all archs either have clang or cdefs included before..
drop this include as unnecessary..

Requested by:	bde
2015-08-02 21:33:40 +00:00
John-Mark Gurney
70e47040b0 convert to C11's _Static_assert, and pull in sys/cdefs.h for
compatibility w/ older non-C11 compilers...

passed make tinerdbox..

Suggested by:	imp
2015-08-02 00:15:52 +00:00
Luiz Otavio O Souza
f87e372ef2 Remove two unnecessary sleeps from the hot path in bpf(4).
The first one never triggers because bpf_canfreebuf() can only be true for
zero-copy buffers and zero-copy buffers are not read with read(2).

The second also never triggers, because we check the free buffer before
calling ROTATE_BUFFERS().  If the hold buffer is in use the free buffer
will be NULL and there is nothing else to do besides drop the packet.  If
the free buffer isn't NULL the hold buffer _is_ free and it is safe to
rotate the buffers.

Update the comment in ROTATE_BUFFERS macro to match the logic described
here.

While here fix a few typos in comments.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 21:43:27 +00:00
Luiz Otavio O Souza
faa693cdbe Remove the sleep from the buffer allocation routine.
The buffer must be allocated (or even changed) before the interface is set
and thus, there is no need to verify if the buffer is in use.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 20:25:54 +00:00
Luiz Otavio O Souza
4f42daa4a3 Do not allocate the buffers at opening of the descriptor, because once
the buffer is allocated we are committed to a particular buffer method
(BPF_BUFMODE_BUFFER in this case).

If we are using zero-copy buffers, the userland program must register its
buffers before set the interface.

If we are using kernel memory buffers, we can allocate the buffer at the
time that the interface is being set.

This fix allows the usage of BIOCSETBUFMODE after r235746.

Update the comments to reflect the recent changes.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 20:02:12 +00:00
Andrey V. Elsukov
926381e108 Ansify if_stf.c 2015-07-31 09:04:22 +00:00
John-Mark Gurney
af024d3b23 temporarily fix build.. This isn't the final fix, and testing is
still on going, but it has passed world for mips and powerpc...

I know this has an extra semicolon, but this is the patch that is
tested...

Looks like better fix is to use _Static_assert...
2015-07-31 07:48:08 +00:00
John-Mark Gurney
817c7ed900 Clean up this header file...
use CTASSERTs now that we have them...

Replace a draft w/ RFC that's over 10 years old.

Note that _AALG and _EALG do not need to match what the IKE daemons
think they should be..  This is part of the KABI...  I decided to
renumber AESCTR, but since we've never had working AESCTR mode, I'm
not really breaking anything..  and it shortens a loop by quite
a bit..

remove SKIPJACK IPsec support...  SKIPJACK never made it out of draft
(in 1999), only has 80bit key, NIST recommended it stop being used
after 2010, and setkey nor any of the IKE daemons I checked supported
it...

jmgurney/ipsecgcm: a357a33, c75808b, e008669, b27b6d6

Reviewed by:	gnn (earlier version)
2015-07-31 00:23:21 +00:00
Andrey V. Elsukov
a5965d1513 Build if_stf(4) module only when both INET and INET6 support are enabled. 2015-07-30 10:26:43 +00:00