Commit Graph

1691 Commits

Author SHA1 Message Date
Mark Johnston
0973ca723c Always start IPv6 DAD asynchronously.
Otherwise we transmit the first neighbour solicitation in the context of the
caller of nd6_dad_start(), which can easily result in lock recursion. When
DAD is to be started after some delay, we send the first NS from the DAD
callout handler, so just change the implementation to do this in the
non-delayed case as well.

Reviewed by:	ae, hrs
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6639
2016-06-02 17:17:15 +00:00
Bjoern A. Zeeb
3f58662dd9 The pr_destroy field does not allow us to run the teardown code in a
specific order.  VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.

Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.

Reviewed by:		gnn, tuexen (sctp), jhb (kernel.h)
Obtained from:		projects/vnet
MFC after:		2 weeks
X-MFC:			do not remove pr_destroy
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6652
2016-06-01 10:14:04 +00:00
Michael Tuexen
3d48d25be7 Add PR_CONNREQUIRED for SOCK_STREAM sockets using SCTP.
This is required to signal connetion setup on non-blocking sockets
via becoming writable. This still allows for implicit connection
setup.

MFC after:	1 week
2016-05-30 18:24:23 +00:00
Gleb Smirnoff
6351b3857b Plug route reference underleak that happens with FLOWTABLE after r297225.
Submitted by:	Mike Karels <mike karels.net>
2016-05-27 17:31:02 +00:00
Mark Johnston
65fdf52123 Mark the prefix and default router list sysctl handlers MPSAFE.
MFC after:	2 weeks
2016-05-23 20:18:11 +00:00
Mark Johnston
cc51be7b81 Acquire the nd6 lock in the prefix list sysctl handler.
The nd6 lock will be used to synchronize access to the NDP prefix list.

MFC after:	2 weeks
Tested by:	Jason Wolfe (as part of a larger change)
2016-05-23 20:15:08 +00:00
Andrey V. Elsukov
1d4d43c0e7 Remove ip6 adjusting from the place where pointer couldn't be changed.
And add comment after calling PFIL hooks, where it could be changed.
2016-05-20 12:17:40 +00:00
Andrey V. Elsukov
9fdab4c052 Remove ip6 pointer initialization and strange check from the beginning
of ip6_output(). It isn't used until the first time adjusted.
Remove the comment about adjusting where it is actually initialized.
2016-05-20 12:09:10 +00:00
Mark Johnston
5e0a6f31e5 Move IPv6 malloc tag definitions into the IPv6 code. 2016-05-20 04:45:08 +00:00
Andrey V. Elsukov
f0937b2cf5 Since PFIL can change destination address, use its always actual value
from mbuf when calculating path mtu. Remove now unused finaldst variable.
Also constify dst argument in ip6_getpmtu() and ip6_getpmtu_ctl().

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-19 12:45:20 +00:00
Andrey V. Elsukov
4ee7e5a63d Call RO_RTFREE() when we have detected the change of destination
address, otherwise the old route will be used with new destination.

MFC after:	1 week
2016-05-17 14:06:55 +00:00
Mark Johnston
c81c183dae Use Node Information flag names instead of hard-coding their values.
MFC after:	1 week
2016-05-15 03:22:13 +00:00
Mark Johnston
82366c228b Add sysctl descriptions for net.inet6.ip6 and net.inet6.icmp6.
icmp6.redirtimeout, icmp6.nd6_maxnudhint and ip6.rr_prune are left
undocumented as they appear to have no effect. Some existing sysctl
descriptions were modified for consistency and style, and the
ip6.tempvltime and ip6.temppltime handlers were rewritten to be a bit
simpler and to avoid setting the sysctl value before validating it.

MFC after:	3 weeks
2016-05-15 03:18:03 +00:00
Mark Johnston
415f6c24dd Remove an always-false error check in the AIFADDR_IN6 handler.
CID:		1250792
MFC after:	1 week
2016-05-15 03:01:40 +00:00
Mark Johnston
df890b8e73 Remove obsolescent comments from nd6_purge().
MFC after:	1 week
2016-05-09 23:43:12 +00:00
Mark Johnston
20b5f02214 Clean up callers of nd6_prelist_add().
nd6_prelist_add() sets *newp if and only if it is successful, so there's no
need for code that handles the case where the return value is 0 and
*newp == NULL. Fix some style bugs in nd6_prelist_add() while here.

MFC after:	1 week
2016-05-07 03:41:29 +00:00
Mark Johnston
83631b16a7 Remove two useless local variables from prelist_update().
MFC after:	1 week
2016-05-07 03:32:29 +00:00
Pedro F. Giffuni
a4641f4eaa sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
Michael Tuexen
ec70917ffa When a client uses UDP encapsulation and lists IP addresses in the INIT
chunk, enable UDP encapsulation for all those addresses.
This helps clients using a userland stack to support multihoming if
they are not behind a NAT.

MFC after: 1 week
2016-05-01 21:48:55 +00:00
Michael Tuexen
7ae2ff0dba Use correct order of source and destination address and port. 2016-04-29 20:13:35 +00:00
Randall Stewart
abb901c5d7 Complete the UDP tunneling of ICMP msgs to those protocols
interested in having tunneled UDP and finding out about the
ICMP (tested by Michael Tuexen with SCTP.. soon to be using
this feature).

Differential Revision:	http://reviews.freebsd.org/D5875
2016-04-28 15:53:10 +00:00
Conrad Meyer
2769d06203 in_lltable_alloc and in6 copy: Don't leak LLE in error path
Fix a memory leak in error conditions introduced in r292978.

Reported by:	Coverity
CIDs:		1347009, 1347010
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 23:13:48 +00:00
Luiz Otavio O Souza
b0ab3725db Fixes the comment to reflect the code.
Sponsored by:	Rubicon Communications (Netgate)
2016-04-25 23:12:39 +00:00
Pedro F. Giffuni
63b6b7a74a Indentation issues.
Contract some lines leftover from r298310.

Mea culpa.
2016-04-20 16:19:44 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
Michael Tuexen
b1deed45e6 Address issues found by the XCode code analyzer. 2016-04-18 20:16:41 +00:00
Michael Tuexen
b9dd6a90b6 Fix the ICMP6 handling for SCTP.
Keep the IPv4 code in sync.

MFC after:	1 week
2016-04-16 21:34:49 +00:00
Pedro F. Giffuni
155d72c498 sys/net* : for pointers replace 0 with NULL.
Mostly cosmetical, no functional change.

Found with devel/coccinelle.
2016-04-15 17:30:33 +00:00
Andrey V. Elsukov
0b519fe44f Fix regression introduced in r296986.
Currently we don't keep zoneid in in6_ifaddr structure, because there
is still some code, that doesn't properly initialize sin6_scope_id,
but some functions use sa_equal() for addresses comparison. sa_equal()
compares full sockaddr_in6 structures and such comparison will fail.
For now use zero zoneid in in6ifa_ifwithaddr(). It is safe, because
used address is in embedded form. In future we will use zoneid, so mark it
with XXX comment.

Reported by:	kp
Tested by:	kp
2016-04-08 11:13:24 +00:00
George V. Neville-Neil
ce223fb715 Unbreak the RSS/PCBGROUp build. 2016-03-31 00:53:23 +00:00
Mark Johnston
47d2d39111 Fix the lladdr copy in in6_lltable_dump_entry() after r292978.
This bug caused "ndp -a" to show the wrong link layer address for neighbour
cache entries.

PR:	208067
2016-03-30 00:03:59 +00:00
Mark Johnston
435bece4c5 Modify nd6_llinfo_timer() to acquire the nd6 lock before the LLE lock.
When expiring a neighbour cache entry we may need to look up the associated
default router, which requires the nd6 read lock. To avoid an LOR, the nd6
lock should be acquired first.

X-MFC-With:	r296063
Tested by:	Larry Rosenman <ler@lerctr.org> (previous revision)
2016-03-29 19:23:00 +00:00
George V. Neville-Neil
84cc0778d0 FreeBSD previously provided route caching for TCP (and UDP). Re-add
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.

Submitted by:	Mike Karels
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D4306
2016-03-24 07:54:56 +00:00
Bjoern A. Zeeb
9901091eba Mfp4 @180378:
Factor out nd6 and in6_attach initialization to their own files.
  Also move destruction into those files though still called from
  the central initialization.

  Sponsored by:	CK Software GmbH
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D5033
2016-03-22 15:43:47 +00:00
Mark Johnston
ff63037da1 Modify defrouter_remove() to perform the router lookup before removal.
This allows some simplification of its callers. No functional change
intended.

Tested by:	Larry Rosenman (as part of a larger change)
MFC after:	1 month
2016-03-17 19:01:44 +00:00
Andrey V. Elsukov
3e16fab37b Reduce the number of local variables. Remove redundant check that inp
pointer isn't NULL, it is safe, because we are handling IPV6_PKTINFO
socket option in this block of code. Also, use in6ifa_withaddr() instead
of ifa_withaddr().
2016-03-17 11:10:44 +00:00
Andrey V. Elsukov
04894769f4 Change in6_selectsrc() to allow usage of non-local IPv6 addresses in
IPV6_PKTINFO ancillary data when IPV6_BINDANY socket option is set.

Submitted by:	n_hibma
MFC after:	2 weeks
2016-03-17 10:59:30 +00:00
Gleb Smirnoff
56a5f52e80 New way to manage reference counting of mbuf external storage.
The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount
value itself. To tell that m_ext.ext_flags flag EXT_FLAG_EMBREF is used.
The first mbuf to attach a cluster stores the refcount. The further mbufs
to reference the cluster point at refcount in the first mbuf. The first
mbuf is freed only when the last reference is freed.

The benefit over refcounts stored in separate slabs is that now refcounts
of different, unrelated mbufs do not share a cache line.

For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd()
becomes void, making widely used M_EXTADD macro safe.

For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization
exactly against the cache aliasing problem with regular refcounting.

Discussed with:		rrs, rwatson, gnn, hiren, sbruno, np
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D5396
Sponsored by:		Netflix
2016-03-01 00:17:14 +00:00
Mark Johnston
4de485fe5f Lock the NDP default router list and count defrouter references.
This addresses a number of race conditions that can cause crashes as a
result of unsynchronized access to the list.

PR:		206904
Tested by:	Larry Rosenman <ler@lerctr.org>,
		Kevin Bowling <kevin.bowling@kev009.com>
MFC after:	2 months
Differential Revision: https://reviews.freebsd.org/D5315
2016-02-25 20:12:05 +00:00
Michael Tuexen
9a59fb36b2 Don't leak an address in an error path.
CID:		1351729
MFC after:	3 days
2016-02-23 18:50:34 +00:00
Michael Tuexen
6df7c000d3 Fix reporting of mapped addressed in getpeername() and getsockname() for
IPv6 SCTP sockets.
This bugs were found because of an issue reported by PVS / D5245.
2016-02-18 21:05:04 +00:00
Mark Johnston
c15064c27a Release the ref acquired in nd6_dad_find() if DAD is already in progress.
MFC after:	1 week
2016-02-18 00:00:51 +00:00
Mark Johnston
38dd603f8c Use pfxrtr_del() instead of freeing advertising routers directly.
MFC after:	1 week
2016-02-17 23:55:24 +00:00
Mark Johnston
19fa242679 Remove a prototype for the non-existent prelist_del().
MFC after:	1 week
2016-02-17 23:53:24 +00:00
Gleb Smirnoff
c619ab95b3 Ternary operator has lower priority than OR.
Found by:	PVS-Studio
2016-02-17 21:17:14 +00:00
Mark Johnston
eb3ff358dc Add a missing newline to a log message.
MFC after:	1 week
2016-02-12 21:17:00 +00:00
Mark Johnston
01869be58e Rename the flags field of struct nd_defrouter to "raflags".
This field contains the flags inherited from the corresponding router
advertisement message and is not for storing private state.

MFC after:	1 week
2016-02-12 21:15:57 +00:00
Mark Johnston
a4a7c8254c Simplify defrtrlist_update() slightly in preparation for future changes.
No functional change intended.

MFC after:	1 week
2016-02-12 21:06:48 +00:00
Mark Johnston
f60d595f34 Remove a bogus comment from nd6_na_input().
The splnet() call that it refers to has been removed, and a lock for the
default router list is in fact needed.

MFC after:	1 week
2016-02-12 21:01:53 +00:00
Mark Johnston
baebd3e54f Remove superfluous return statements from the neighbour discovery code.
MFC after:	1 week
2016-02-12 20:55:22 +00:00
Mark Johnston
fc31564185 Fix style around allocations from M_IP6NDP.
- Don't cast the return value of malloc(9).
- Use M_ZERO instead of explicitly calling bzero(9).

MFC after:	1 week
2016-02-12 20:52:53 +00:00
Mark Johnston
97dca6a207 Remove some unreferenced NDP debug variable definitions.
MFC after:	1 week
2016-02-12 20:46:53 +00:00
Devin Teske
41c0ec9a16 Merge SVN r295220 (bz) from projects/vnet/
Fix a panic that occurs when a vnet interface is unavailable at the time the
vnet jail referencing said interface is stopped.

Sponsored by:	FIS Global, Inc.
2016-02-11 17:07:19 +00:00
Bjoern A. Zeeb
a5243af262 Code duplication but rib_head is special. Not found an easy way to go
back and harmize the use cases among RIB, IPFW, PF yet but it's also not
the scope of this work.   Prevents instant panics on teardown and frees
the FIB bits again.

Sponsored by:	The FreeBSD Foundation
2016-02-03 21:56:51 +00:00
Bjoern A. Zeeb
2414e86439 MfH @r295202
Expect to see panics in routing code at least now.
2016-02-03 11:49:51 +00:00
Gleb Smirnoff
8ec07310fa These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h
2016-02-01 17:41:21 +00:00
Alexander V. Chernikov
61eee0e202 MFP r287070,r287073: split radix implementation and route table structure.
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
  with different requirements. In fact, first 3 don't have _any_ requirements
  and first 2 does not use radix locking. On the other hand, routing
  structure do have these requirements (rnh_gen, multipath, custom
  to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.

So, radix code now uses tiny 'struct radix_head' structure along with
  internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
  Existing consumers still uses the same 'struct radix_node_head' with
  slight modifications: they need to pass pointer to (embedded)
  'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
  RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
  information base).

New net/route_var.h header was added to hold routing subsystem internal
  data. 'struct rib_head' was placed there. 'struct rtentry' will also
  be moved there soon.
2016-01-25 06:33:15 +00:00
Bjoern A. Zeeb
0ea826e094 MFp4 @180892:
With pr_destroy being gone, call ip6_destroy from an ordered
	NET_SYSUNINT.  Make ip6_destroy() static as well.

Sponsored by:	The FreeBSD Foundation
2016-01-22 18:09:25 +00:00
Bjoern A. Zeeb
009e81b164 MFH @r294567 2016-01-22 15:11:40 +00:00
Bjoern A. Zeeb
1f12da0e82 Just checkpoint the WIP in order to be able to make the tree update
easier.  Note:  this is currently not in a usable state as certain
teardown parts are not called and the DOMAIN rework is missing.
More to come soon and find its way to head.

Obtained from:	P4 //depot/user/bz/vimage/...
Sponsored by:	The FreeBSD Foundation
2016-01-22 15:00:01 +00:00
Alexander V. Chernikov
af8451a51e Fix rte refcount leak in ip6_forward().
Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2016-01-20 11:25:30 +00:00
Gleb Smirnoff
479795819a Verify the packet length in sctp6_input().
The sctp6_ctlinput() function does not properly check the length of the packet
it receives from the ICMP6 input routine. This means that an attacker can craft
a packet that will cause a kernel panic.

When the kernel receives an ICMP6 error message with one of the types/codes
it handles, it calls icmp6_notify_error() to deliver it to the upper-level
protocol. icmp6_notify_error() cycles through the extension headers (if any)
to find the protocol number of the first non-extension header. It does NOT
verify the length of the non-extension header.

It passes information about the packet (including the actual packet) to the
upper-level protocol's pr_ctlinput function. In the case of SCTP for IPv6,
icmp6_notify_error() calls sctp6_ctlinput().

sctp6_ctlinput() assumes that the incoming packet contains a sufficiently-long
SCTP header and calls m_copydata() to extract a copy of that header. In turn,
m_copydata() assumes that the caller has already verified that the offset and
length parameters are correct. If they are incorrect, it will dereference a
NULL pointer and cause a kernel panic.

In short, no one is sufficiently verifying the input, and the result is a
kernel panic.

Submitted by:	jtl
Security:	SA-16:01.sctp
2016-01-14 10:11:10 +00:00
Alexander V. Chernikov
59747033cd Bring RADIX_MPATH support to new routing KPI to ease migration.
Move actual rte selection process from rtalloc_mpath_fib()
  to the rt_path_selectrte() function. Add public
  rt_mpath_select() to use in fibX_lookup_ functions.
2016-01-11 08:45:28 +00:00
Alexander V. Chernikov
601c0b8bcc Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket().
in6_selectsrc() has 2 class of users: socket-based one (raw/udp/pcb/etc) and
  socket-less (ND code). The main reason for that change is inability to
  specify non-default FIB for callers w/o socket since (internally) inpcb
  is used to determine fib.

As as result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc()
  static):
1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check
  along with returning hop limit when needed.
2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and
  pass IPv6 address  w/ explicitly specified scope as separate argument.

Reviewed by:	ae (previous version)
2016-01-10 13:40:29 +00:00
Alexander V. Chernikov
ab861e6c06 Do not hold ifaddr reference for the whole icmp6_reflect() exec time.
Copy source address, calculate hlim and release refcount instead.
2016-01-10 11:59:55 +00:00
Alexander V. Chernikov
5dba456c14 Remove prefix check from in6_addroute().
This check was added in initial? netinet6/ import
  back in 1999 (r53541).
It effectively became unnecessary after 'address/prefix clean-ups'
  KAME commit 90ff8792e676132096a440dd787f99a5a5860ee8 (github) in 2001
  (merged to FreeBSD in r78064) where prefix check was added to
  nd6_prefix_onlink(). Similar IPv4 check (in_addroute() was added
  in r137628).
Additionally, the right plance for this (or similar) check is the prefix
  addition code (nd6_prefix_onlink(), nd6_prefix_onlink_rtrequest(),
  in_addprefix() or rtinit()), but not the generic radix insert routine.
2016-01-09 11:41:37 +00:00
Alexander V. Chernikov
ea8d14925c Remove sys/eventhandler.h from net/route.h
Reviewed by:	ae
2016-01-09 09:34:39 +00:00
Alexander V. Chernikov
bacf668430 Finish r293098: make ip6_getpmtu() and ip6_getpmtu_ctl() use new routing API 2016-01-04 18:32:24 +00:00
Alexander V. Chernikov
9a1b64d5a0 Add rib_lookup_info() to provide API for retrieving individual route
entries data in unified format.

There are control plane functions that require information other than
  just next-hop data (e.g. individual rtentry fields like flags or
  prefix/mask). Given that the goal is to avoid rte reference/refcounting,
  re-use rt_addrinfo structure to store most rte fields. If caller wants
  to retrieve key/mask or gateway (which are sockaddrs and are allocated
  separately), it needs to provide sufficient-sized sockaddrs structures
  w/ ther pointers saved in passed rt_addrinfo.

Convert:
  * lltable new records checks (in_lltable_rtcheck(),
    nd6_is_new_addr_neighbor().
  * rtsock pre-add/change route check.
  * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
     1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
       not be multiple host routes for such hosts 2) if we have multiple
       routes we should inspect them (which is not done). 3) the entire idea
       of abusing KRT as storage for ND proxy seems odd. Userland programs
       should be used for that purpose).
2016-01-04 15:03:20 +00:00
Alexander V. Chernikov
357ce739b9 Remove 'struct route_int6' argument from in6_selectsrc() and
in6_selectif().

The main task of in6_selectsrc() is to return IPv6 SAS (along with
  output interface used for scope checks). No data-path code uses
  route argument for caching. The only users are icmp6 (reflect code),
  ND6 ns/na generation code. All this fucntions are control-plane, so
  there is no reason to try to 'optimize' something by passing cached
  route into to ip6_output(). Given that, simplify code by eliminating
  in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is
  used only by in6_selectsrc(), eliminate its 'struct route_in6' argument,
  too. While here, reshape rte-related code inside in6_selectif() to
  free lookup result immediately after saving all the needed fields.
2016-01-03 10:43:23 +00:00
Alexander V. Chernikov
0d4df0290e Handle IPV6_PATHMTU option by spliting ip6_getpmtu_ctl() from ip6_getpmtu().
Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to
  the caller.

Currently, ip6_getpmtu() has 2 totally different use cases:
1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU
  and return it, w/o any reusability.
2) Actual ip6_output() data path where we (nearly) always use the provided
  route lookup data. If this data is not 'valid' we need to perform another
  lookup and save the result (which cannot be re-used by ip6_output()).

Given that, handle 1) by calling separate function doing rte lookup itself.
  Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both
  ip6_getpmtu_ctl() and ip6_getpmtu().
For 2) instead of storing ref'ed rte, store mtu (the only needed data
  from the lookup result) inside newly-added ro_mtu field.
  'struct route' was shrinked by 8(or 4 bytes) in r292978. Grow it again
  by 4 bytes. New ro_mtu field will be used in other places like
  ip/tcp_output (EMSGSIZE handling from output routines).

Reviewed by:	ae
2016-01-03 09:54:03 +00:00
Alexander V. Chernikov
9a7ee988b5 Use lltable_get_ifp() instead of direct access to lltable fields. 2016-01-01 12:35:33 +00:00
Alexander V. Chernikov
4fb3a8208c Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
Jonathan T. Looney
912568c881 Add the appropriate case statement for IPV6_BINDMULTI so the option can be
retrieved with getsockopt().

CID:	1229928
Differential Revision:	https://reviews.freebsd.org/D4737
Reviewed by:	adrian
Sponsored by:	Juniper Networks
2015-12-30 18:08:05 +00:00
Bjoern A. Zeeb
88310b4011 This code is not in modules that need KPI stability so no need to use
the wrapper functions as used in r252511.  We can directly use the
locking macros.

Reviewed by:		jtl, rwatson
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D4731
2015-12-30 17:10:03 +00:00
Garrett Wollman
2832ade6b8 in6_if2idlen: treat bridge(4) interfaces like other Ethernet interfaces
bridge(4) interfaces have an if_type of IFT_BRIDGE, rather than
IFT_ETHER, even though they only support Ethernet-style links.  This
caused in6_if2idlen to emit an "unknown link type (209)" warning to
the console every time it was called.  Add IFT_BRIDGE to the case
statement in the appropriate place, indicating that it uses the same
IPv6 address format as other Ethernet-like interfaces.

MFC after:	1 week
2015-12-28 18:29:47 +00:00
Bjoern A. Zeeb
bf72bf1281 Remove superfluous return (1) missed in r292601.
Reported by:	Matthew D. Fuller (fullermd over-yonder.net),
		Kevin Bowling (kevin.bowling kev009.com)
MFC after:	13 days
X-MFC with:	r292601
Sponsored by:	The FreeBSD Foundation
2015-12-23 10:23:47 +00:00
Bjoern A. Zeeb
0a03cf8ca6 Since r256624 we've been leaking routing table allocations
on vnet enabled jail shutdown. Call the provided cleanup
routines for IP versions 4 and 6 to plug these leaks.

Sponsored by:		The FreeBSD Foundation
MFC atfer:		2 weeks
Reviewed by:		gnn
Differential Revision:	https://reviews.freebsd.org/D4530
2015-12-22 14:53:19 +00:00
Steven Hartland
d6e82913c1 Revert r292275 & r292379
glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by:	Multiplay
2015-12-17 14:41:30 +00:00
Steven Hartland
3a909afe8e Fix issues introduced by r292275
* Fix panic for etherswitches which don't have a LLADDR.
* Disabled DELAY in unsolicited NDA, which needs further work.
* Fixed missing DELAY in carp_send_na.
* style(9) fix.

Reported by:	kp & melifaro
X-MFC-With:	r292275
MFC after:	1 month
Sponsored by:	Multiplay
2015-12-16 22:26:28 +00:00
Alexander V. Chernikov
427c2f4ef0 Provide additional lle data in IPv6 lltable dump used by ndp(8).
Before the change, things like lle state were queried via
  SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump.
This ioctl was added in 1999, probably to avoid touching rtsock code.

This change maps SIOCGNBRINFO_IN6 data to standard rtsock dump the
 following way:
  expire (already) maps to rtm_rmx.rmx_expire
  isrouter -> rtm_flags & RTF_GATEWAY
  asked -> rtm_rmx.rmx_pksent
  state -> rtm_rmx.rmx_state (maps to rmx_weight via define)

Reviewed by:	ae
2015-12-16 10:14:16 +00:00
Steven Hartland
52e53e2de0 Fix lagg failover due to missing notifications
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR:		156226
MFC after:	1 month
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4111
2015-12-15 16:02:11 +00:00
Kristof Provost
7e037c12f2 inet6: Do not assume every interface has ip6 enabled.
Certain interfaces (e.g. pfsync0) do not have ip6 addresses (in other words,
ifp->if_afdata[AF_INET6] is NULL). Ensure we don't panic when the MTU is
updated.

pfsync interfaces will never have ip6 support, because it's explicitly disabled
in in6_domifattach().

PR:		205194
Reviewed by:	melifaro, hrs
Differential Revision:	https://reviews.freebsd.org/D4522
2015-12-14 19:44:49 +00:00
Alexander V. Chernikov
12cb7521c2 Remove LLE read lock from IPv6 fast path.
LLE structure is mostly unchanged during its lifecycle: there are only 2
things relevant for fast path lookup code:
1) link-level address change. Since r286722, these updates are performed
  under AFDATA WLOCK.
2) Some sort of feedback indicating that this particular entry is used so
  we send NS to perform reachability verification instead of expiring entry.
  The only signal that is needed from fast path is something like binary
  yes/no.
The latter is solved by the following changes:

Special r_skip_req (introduced in D3688) value is used for fast path feedback.
  It is read lockless by fast path, but updated under req_mutex mutex. If this
  field is non-zero, then fast path will acquire lock and set it back to 0.

After transitioning to STALE state, callout timer is armed to run each
  V_nd6_delay seconds to make sure that if packet was transmitted at the start
  of given interval, we would be able to switch to PROBE state in V_nd6_delay
  seconds as user expects.
(in STALE state) timer is rescheduled until original V_nd6_gctimer expires
  keeping lle in STALE state (remaining timer value stored in lle_remtime).
(in STALE state) timer is rescheduled if packet was transmitted less that
  V_nd6_delay seconds ago to make sure we transition to PROBE state exactly
  after V_n6_delay seconds.

As a result, all packets towards lle in REACHABLE/STALE/PROBE states are handled
  by fast path without acquiring lle read lock.

Differential Revision:		https://reviews.freebsd.org/D3780
2015-12-13 07:39:49 +00:00
Alexander V. Chernikov
9cce04b061 Use correct lookup key for gif route lookups.
This fixes r291993 change.
2015-12-09 22:09:33 +00:00
Alexander V. Chernikov
9977be4a64 Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(),
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
  in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.

Eliminate now-unused ip_rtaddr().
Fix lookup key fib6_lookup_nh_basic() which was lost diring merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
  return IPv6 destination address with embedded scope. Currently
  rw_gateway has it scope embedded, do the same for non-gatewayed
  destinations.

Sponsored by:	Yandex LLC
2015-12-09 11:14:27 +00:00
Alexander V. Chernikov
65ff3638df Merge helper fib* functions used for basic lookups.
Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
  I have?". "what is the MTU?", "Give me the IPv4 source address to use",
  etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
  dealing with IPv6 mtu, finding "address" ifp (almost never done right),
  provide easy-to-use API hiding all the complexity and returning the
  needed info into small on-stack structure.

This change also helps hiding route subsystem internals (locking, direct
  rtentry accesses).
Additionaly, using this API improves lookup performance since rtentry is not
  locked.
(This is safe, since all the rtentry changes happens under both radix WLOCK
  and rtentry WLOCK).

Sponsored by:	Yandex LLC
2015-12-08 10:50:03 +00:00
Michael Tuexen
c979034b18 Fix the allocation of outgoing streams:
* When processing a cookie, use the number of
  streams announced in the INIT-ACK.
* When sending an INIT-ACK for an existing
  association, use the value from the association,
  not from the end-point.

MFC after:	1 week
2015-12-06 16:17:57 +00:00
Andrey V. Elsukov
9f8b8e793b mld_v2_dispatch_general_query() is used by mld_fasttimo_vnet() to send
a reply to the MLDv2 General Query. In case when router has a lot of
multicast groups, the reply can take several packets due to MTU limitation.
Also we have a limit MLD_MAX_RESPONSE_BURST == 4, that limits the number
of packets we send in one shot. Then we recalculate the timer value and
schedule the remaining packets for sending.
The problem is that when we call mld_v2_dispatch_general_query() to send
remaining packets, we queue new reply in the same mbuf queue. And when
number of packets is bigger than MLD_MAX_RESPONSE_BURST, we get endless
reply of MLDv2 reports.
To fix this, add the check for remaining packets in the queue.

PR:		204831
MFC after:	1 week
Sponsored by:	Yandex LLC
2015-12-01 11:17:41 +00:00
Alexander V. Chernikov
e8b0643eee Add new rt_foreach_fib_walk_del() function for deleting route entries
by filter function instead of picking into routing table details in
  each consumer.
Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED
 user).
This simplifies future nexthops/mulitipath changes and rtrequest1_fib()
  locking refactoring.

Actual changes:
Add "rt_chain" field to permit rte grouping while doing batched delete
  from routing table (thus growing rte 200->208 on amd64).
Add "rti_filter" /  "rti_filterdata" / "rti_spare" fields to rt_addrinfo
  to pass filter function to various routing subsystems in standard way.
Convert all rt_expunge() customers to new rt_addinfo-based api and eliminate
  rt_expunge().
2015-11-30 05:51:14 +00:00
Andrey V. Elsukov
ef91a9765d Overhaul if_enc(4) and make it loadable in run-time.
Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).
2015-11-25 07:31:59 +00:00
Conrad Meyer
55faae77fa in6_mc_get: Fix recursion on if_addr_lock on malloc failure
Analogously to r291040, in6_mc_get recurses on if_addr_lock if the
M_NOWAIT allocation fails.  The fix is the same.

Suggested by:	Andrey V. Elsukov
Reviewed by:	jhb (ip4 version)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4138 (ip4 version)
2015-11-19 00:27:26 +00:00
Alexander V. Chernikov
637670e77e Bring back the ability of passing cached route via nd6_output_ifp(). 2015-11-15 16:02:22 +00:00
Randall Stewart
7c4676ddee This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped.  Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after:	1 Month with associated callout change.
2015-11-13 22:51:35 +00:00
Alexander V. Chernikov
ddd208f7ad Unify setting lladdr for AF_INET[6]. 2015-11-07 11:12:00 +00:00
Adrian Chadd
aaa46574b0 [netinet6]: Create a new IPv6 netisr which expects the frames to have been verified.
This is required for fragments and encapsulated data (eg tunneling) to be redistributed
to the RSS bucket based on the eventual IPv6 header and protocol (TCP, UDP, etc) header.

* Add an mbuf tag with the state of IPv6 options parsing before the frame is queued
  into the direct dispatch handler;
* Continue processing and complete the frame reception in the correct RSS bucket /
  netisr context.

Testing results are in the phabricator review.

Differential Revision:	https://reviews.freebsd.org/D3563
Submitted by:	Tiwei Bie <btw@mail.ustc.edu.cn>
2015-11-06 23:07:43 +00:00
Alexander V. Chernikov
ba99cc0b86 Use m_cat() to reassembly IPv6 packets.
Submitted by:	jonloony_gmail.com
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3863
2015-10-27 22:11:09 +00:00
Alexander V. Chernikov
ab415c8307 Invoke lle_event for new entry iff it has lladdr set. 2015-10-04 19:10:27 +00:00
Alexander V. Chernikov
7503e0c783 Simplify if (lladdr) condition in nd6_cache_lladdr():
For case (7) (new entry) nothing has to be done except lle_event.
  Invoke this event directly from "create new lle" code block.
  For case (4) (existing entry, same mac) useless mac update was performed,
  along with LLENTRY_RESOLVED lle_event. There was no sense in doing that,
  since nothing really had changed. Simply avoid this condition instead.
  Given that, condition was simplified to (3),(5) states which can be merged
  with previous block.
2015-10-04 12:42:07 +00:00
Alexander V. Chernikov
9b420b3da4 Eliminate nd6_llinfo_settimer(). All consumers were converted to
use nd6_llinfo_settimer_locked() in r216022.
Make nd6_llinfo_settimer_locked() static: last external consumer was
converted in r288124.
2015-10-04 08:33:16 +00:00