Commit Graph

1671 Commits

Author SHA1 Message Date
markj
9346a8b630 Fix a typo.
MFC after:	1 week
2016-10-07 00:35:28 +00:00
markj
71534a1a0c Shorten and simplify some of the loops in pfxlist_onlink_check().
No functional change intended.

MFC after:	1 week
2016-10-07 00:34:57 +00:00
markj
149b5ce5db Use a const reference to prefixes in nd6_is_new_addr_neighbor().
MFC after:	1 week
2016-10-07 00:26:36 +00:00
markj
b0fce0526c nd6_dad_timer(): don't assert that the address is tentative.
It appears that this assertion can be tripped in some cases when
multiple interfaces are on the same link. Until this is resolved, revert a
part of r306305 and simply log a message if the DAD timer fires on a
non-tentative address.

Reported by:	jhb
X-MFC With:	r306305
2016-10-01 01:30:34 +00:00
ae
f7637dcb59 Fix bug introduced in r274300.
In icmp6_reflect() use original source address of erroneous packet as
destination address for source selection algorithm when original
destination address is not one of our own.

Reported by:	Mark Kamichoff <prox at prolixium com>
Tested by:	Mark Kamichoff <prox at prolixium com>
MFC after:	1 week
2016-09-29 19:57:37 +00:00
markj
67747e91f9 Convert checks in nd6_dad_start() and nd6_dad_timer() to assertions.
In particular, these functions can assume they are operating on tentative
addresses.

MFC after:	2 weeks
2016-09-24 21:40:24 +00:00
markj
e89e3efaa6 Rename ndpr_refcnt to ndpr_addrcnt.
This field counts derived addresses and is not a true refcount for prefix
objects, so the previous name was misleading.

MFC after:	1 week
2016-09-24 01:14:25 +00:00
markj
bc48ae0071 Reduce code duplication around NDP message handlers in icmp6_input().
No functional change intended.

MFC after:	2 weeks
2016-09-20 18:08:17 +00:00
kevlo
518bc28463 Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead.
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D7878
2016-09-15 07:41:48 +00:00
karels
821d0f98a3 Fix L2 caching for UDP over IPv6
ip6_output() was missing cache invalidation code analougous to
ip_output.c. r304545 disabled L2 caching for UDP/IPv6 as a workaround.
This change adds the missing cache invalidation code and reverts
r304545.

Reviewed by:	gnn
Approved by:	gnn (mentor)
Tested by:	peter@, Mike Andrews
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D7591
2016-08-24 00:52:30 +00:00
bz
55cbdc7ad3 Remove the kernel optoion for IPSEC_FILTERTUNNEL, which was deprecated
more than 7 years ago in favour of a sysctl in r192648.
2016-08-21 18:55:30 +00:00
karels
2fb476f05a Disable L2 caching for UDP over IPv6
The ip6_output routine is missing L2 cache invalication as done
in ip_output.  Even with that code, some problems with UDP over
IPv6 have been reported.  Diabling L2 cache for that problem works
around the problem for now.

PR:		211872 211926
Reviewed by:	gnn
Approved by:	gnn (mentor)
MFC after:	immediate
2016-08-20 20:46:53 +00:00
ae
8c03d2551f Add ipfw_nat64 module that implements stateless and stateful NAT64.
The module works together with ipfw(4) and implemented as its external
action module.

Stateless NAT64 registers external action with name nat64stl. This
keyword should be used to create NAT64 instance and to address this
instance in rules. Stateless NAT64 uses two lookup tables with mapped
IPv4->IPv6 and IPv6->IPv4 addresses to perform translation.

A configuration of instance should looks like this:
 1. Create lookup tables:
 # ipfw table T46 create type addr valtype ipv6
 # ipfw table T64 create type addr valtype ipv4
 2. Fill T46 and T64 tables.
 3. Add rule to allow neighbor solicitation and advertisement:
 # ipfw add allow icmp6 from any to any icmp6types 135,136
 4. Create NAT64 instance:
 # ipfw nat64stl NAT create table4 T46 table6 T64
 5. Add rules that matches the traffic:
 # ipfw add nat64stl NAT ip from any to table(T46)
 # ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96
 6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
    via NAT64 host.

Stateful NAT64 registers external action with name nat64lsn. The only
one option required to create nat64lsn instance - prefix4. It defines
the pool of IPv4 addresses used for translation.

A configuration of instance should looks like this:
 1. Add rule to allow neighbor solicitation and advertisement:
 # ipfw add allow icmp6 from any to any icmp6types 135,136
 2. Create NAT64 instance:
 # ipfw nat64lsn NAT create prefix4 A.B.C.D/28
 3. Add rules that matches the traffic:
 # ipfw add nat64lsn NAT ip from any to A.B.C.D/28
 # ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96
 4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
    via NAT64 host.

Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D6434
2016-08-13 16:09:49 +00:00
stevek
12aa5d6087 Move IPv4-specific jail functions to new file netinet/in_jail.c
_prison_check_ip4 renamed to prison_check_ip4_locked

Move IPv6-specific jail functions to new file netinet6/in6_jail.c
_prison_check_ip6 renamed to prison_check_ip6_locked

Add appropriate prototypes to sys/sys/jail.h

Adjust kern_jail.c to call prison_check_ip4_locked and
prison_check_ip6_locked accordingly.

Add netinet/in_jail.c and netinet6/in6_jail.c to the list of files that
need to be built when INET and INET6, respectively, are configured in the
kernel configuration file.

Reviewed by:	jtl
Approved by:	sjg (mentor)
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D6799
2016-08-09 02:16:21 +00:00
ae
a937127683 Fix NULL pointer dereference.
ro pointer can be NULL when IPSec consumes mbuf.

PR:		211486
MFC after:	3 days
2016-08-02 12:18:06 +00:00
gallatin
11f6fcfd28 Rework IPV6 TCP path MTU discovery to match IPv4
- Re-write tcp_ctlinput6() to closely mimic the IPv4 tcp_ctlinput()

- Now that tcp_ctlinput6() updates t_maxseg, we can allow ip6_output()
  to send TCP packets without looking at the tcp host cache for every
  single transmit.

- Make the icmp6 code mimic the IPv4 code & avoid returning
  PRC_HOSTDEAD because it is so expensive.

Without these changes in place, every TCP6 pmtu discovery or host
unreachable ICMP resulted in a call to in6_pcbnotify() which walks the
tcbinfo table with the write lock held.  Because the tcbinfo table is
shared between IPv4 and IPv6, this causes huge scalabilty issues on
servers with lots of (~100K) TCP connections, to the point where even
a small percent of IPv6 traffic had a disproportionate impact on
overall throughput.

Reviewed by:	bz, rrs, ae (all earlier versions), lstewart (in Netflix's tree)
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D7272
2016-08-01 17:02:21 +00:00
stevek
3acd4a25e6 Prepare for network stack as a module
- Move cr_canseeinpcb to sys/netinet/in_prot.c in order to separate the
   INET and INET6-specific code from the rest of the prot code (It is only
   used by the network stack, so it makes sense for it to live with the
   other network stack code.)
 - Move cr_canseeinpcb prototype from sys/systm.h to netinet/in_systm.h
 - Rename cr_seeotheruids to cr_canseeotheruids and cr_seeothergids to
   cr_canseeothergids, make them non-static, and add prototypes (so they
   can be seen/called by in_prot.c functions.)
 - Remove sw_csum variable from ip6_forward in ip6_forward.c, as it is an
   unused variable.

Reviewed by:	gnn, jtl
Approved by:	sjg (mentor)
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D2901
2016-07-27 20:34:09 +00:00
karels
e31cf93024 Fix per-connection L2 caching in fast path
r301217 re-added per-connection L2 caching from a previous change,
but it omitted caching in the fast path.  Add it.

Reviewed By: gallatin
Approved by: gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D7239
2016-07-22 02:11:49 +00:00
ae
2c47439b3f Add ipfw_nptv6 module that implements Network Prefix Translation for IPv6
as defined in RFC 6296. The module works together with ipfw(4) and
implemented as its external action module. When it is loaded, it registers
as eaction and can be used in rules. The usage pattern is similar to
ipfw_nat(4). All matched by rule traffic goes to the NPT module.

Reviewed by:	hrs
Obtained from:	Yandex LLC
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D6420
2016-07-18 19:46:31 +00:00
ae
7a18a4b316 Add net.inet6.ip6.intr_queue_maxlen sysctl. It can be used to
change netisr queue limit for IPv6 at runtime.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2016-07-15 17:09:30 +00:00
dim
8a8ea0466a Fix a page fault in ip6_setpktopt(), occurring when the pflog module is
loaded, and syncthing is started, which uses setsockopt(IPV6_PKGINFO).

This is because pflog interfaces do not normally have an IPv6 address,
causing the ND_IFINFO() macro to dereference a NULL pointer.

Reviewed by:	ae
PR:		210943
MFC after:	3 days
2016-07-13 19:41:19 +00:00
tuexen
c2c8b26056 Don't consider the socket when processing an incoming ICMP/ICMP6 packet,
which was triggered by an SCTP packet. Whether a socket exists, is just
not relevant.

Approved by: re (kib)
MFC after: 1 week
2016-06-23 09:13:15 +00:00
ae
7cdbaef028 Fix the NULL pointer dereference for unresolved link layer entries in
the netinet6 code. Copy link layer address only when corresponding entry
has LLE_VALID flag.

PR:		210379
Approved by:	re (kib)
2016-06-22 11:29:21 +00:00
bz
7a1c0b1ad1 Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
pfg
0f12e1993e Remove the SIOCSIFALIFETIME_IN6 ioctl.
The SIOCSIFALIFETIME_IN6 provided by the kame project is unused,
it can't really be used safely and has been completely removed from
NetBSD and OpenBSD.

Obtained from:	NetBSD (kern/35897)
PR:		210148 (exp-run)
Reviewed by:	ae, hrs
Relnotes:	yes
Approved by:	re (glebius)
Differential Revision:	https://reviews.freebsd.org/D5491
2016-06-13 22:31:16 +00:00
ae
48b268cd67 Cleanup unneded include "opt_ipfw.h".
It was used for conditional build IPFIREWALL_FORWARD support.
But IPFIREWALL_FORWARD option was removed a long time ago.
2016-06-09 05:48:34 +00:00
bz
5baf25edd0 Make KASSERT message more useful by printing the variables on which
we assert.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 22:34:12 +00:00
bz
aaac6c5a13 Move the callout_reset() to the end of the work not having it stick
before we do anything.

Obtained from:	projects/vnet
MFC after:	2 week
Sponsored by:	The FreeBSD Foundation
2016-06-06 14:01:09 +00:00
bz
69cdb2137c Introduce a per-VNET flag to enable/disable netisr prcessing on that VNET.
Add accessor functions to toggle the state per VNET.
The base system (vnet0) will always enable itself with the normal
registration. We will share the registered protocol handlers in all
VNETs minimising duplication and management.
Upon disabling netisr processing for a VNET drain the netisr queue from
packets for that VNET.

Update netisr consumers to (de)register on a per-VNET start/teardown using
VNET_SYS(UN)INIT functionality.

The change should be transparent for non-VIMAGE kernels.

Reviewed by:	gnn (, hiren)
Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6691
2016-06-03 13:57:10 +00:00
gnn
d75e0c471e This change re-adds L2 caching for TCP and UDP, as originally added in D4306
but removed due to other changes in the system. Restore the llentry pointer
to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as
appropriate.

Submitted by:	Mike Karels
Differential Revision:	https://reviews.freebsd.org/D6262
2016-06-02 17:51:29 +00:00
markj
8b17712ca6 Exploit r301213 to fix in6 ifaddr locking in pfxlist_onlink_check().
Reviewed by:	ae, hrs
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6639
2016-06-02 17:21:57 +00:00
markj
71ff51c027 Always start IPv6 DAD asynchronously.
Otherwise we transmit the first neighbour solicitation in the context of the
caller of nd6_dad_start(), which can easily result in lock recursion. When
DAD is to be started after some delay, we send the first NS from the DAD
callout handler, so just change the implementation to do this in the
non-delayed case as well.

Reviewed by:	ae, hrs
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6639
2016-06-02 17:17:15 +00:00
bz
fac944a70a The pr_destroy field does not allow us to run the teardown code in a
specific order.  VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.

Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.

Reviewed by:		gnn, tuexen (sctp), jhb (kernel.h)
Obtained from:		projects/vnet
MFC after:		2 weeks
X-MFC:			do not remove pr_destroy
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6652
2016-06-01 10:14:04 +00:00
tuexen
525f16040f Add PR_CONNREQUIRED for SOCK_STREAM sockets using SCTP.
This is required to signal connetion setup on non-blocking sockets
via becoming writable. This still allows for implicit connection
setup.

MFC after:	1 week
2016-05-30 18:24:23 +00:00
glebius
bbfa6d2853 Plug route reference underleak that happens with FLOWTABLE after r297225.
Submitted by:	Mike Karels <mike karels.net>
2016-05-27 17:31:02 +00:00
markj
9cb221bdb0 Mark the prefix and default router list sysctl handlers MPSAFE.
MFC after:	2 weeks
2016-05-23 20:18:11 +00:00
markj
86c15ee95b Acquire the nd6 lock in the prefix list sysctl handler.
The nd6 lock will be used to synchronize access to the NDP prefix list.

MFC after:	2 weeks
Tested by:	Jason Wolfe (as part of a larger change)
2016-05-23 20:15:08 +00:00
ae
78161c3462 Remove ip6 adjusting from the place where pointer couldn't be changed.
And add comment after calling PFIL hooks, where it could be changed.
2016-05-20 12:17:40 +00:00
ae
eef5384953 Remove ip6 pointer initialization and strange check from the beginning
of ip6_output(). It isn't used until the first time adjusted.
Remove the comment about adjusting where it is actually initialized.
2016-05-20 12:09:10 +00:00
markj
a09fd6097b Move IPv6 malloc tag definitions into the IPv6 code. 2016-05-20 04:45:08 +00:00
ae
0412106b46 Since PFIL can change destination address, use its always actual value
from mbuf when calculating path mtu. Remove now unused finaldst variable.
Also constify dst argument in ip6_getpmtu() and ip6_getpmtu_ctl().

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-19 12:45:20 +00:00
ae
d1f53cbfea Call RO_RTFREE() when we have detected the change of destination
address, otherwise the old route will be used with new destination.

MFC after:	1 week
2016-05-17 14:06:55 +00:00
markj
a0640b8262 Use Node Information flag names instead of hard-coding their values.
MFC after:	1 week
2016-05-15 03:22:13 +00:00
markj
43beb421e5 Add sysctl descriptions for net.inet6.ip6 and net.inet6.icmp6.
icmp6.redirtimeout, icmp6.nd6_maxnudhint and ip6.rr_prune are left
undocumented as they appear to have no effect. Some existing sysctl
descriptions were modified for consistency and style, and the
ip6.tempvltime and ip6.temppltime handlers were rewritten to be a bit
simpler and to avoid setting the sysctl value before validating it.

MFC after:	3 weeks
2016-05-15 03:18:03 +00:00
markj
c5c6630f07 Remove an always-false error check in the AIFADDR_IN6 handler.
CID:		1250792
MFC after:	1 week
2016-05-15 03:01:40 +00:00
markj
4b9a93dfb3 Remove obsolescent comments from nd6_purge().
MFC after:	1 week
2016-05-09 23:43:12 +00:00
markj
94a1c25725 Clean up callers of nd6_prelist_add().
nd6_prelist_add() sets *newp if and only if it is successful, so there's no
need for code that handles the case where the return value is 0 and
*newp == NULL. Fix some style bugs in nd6_prelist_add() while here.

MFC after:	1 week
2016-05-07 03:41:29 +00:00
markj
557551b31f Remove two useless local variables from prelist_update().
MFC after:	1 week
2016-05-07 03:32:29 +00:00
pfg
d9c9113377 sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
tuexen
a750782f5b When a client uses UDP encapsulation and lists IP addresses in the INIT
chunk, enable UDP encapsulation for all those addresses.
This helps clients using a userland stack to support multihoming if
they are not behind a NAT.

MFC after: 1 week
2016-05-01 21:48:55 +00:00
tuexen
3e7292aa0b Use correct order of source and destination address and port. 2016-04-29 20:13:35 +00:00
rrs
64e463c093 Complete the UDP tunneling of ICMP msgs to those protocols
interested in having tunneled UDP and finding out about the
ICMP (tested by Michael Tuexen with SCTP.. soon to be using
this feature).

Differential Revision:	http://reviews.freebsd.org/D5875
2016-04-28 15:53:10 +00:00
cem
23a478288f in_lltable_alloc and in6 copy: Don't leak LLE in error path
Fix a memory leak in error conditions introduced in r292978.

Reported by:	Coverity
CIDs:		1347009, 1347010
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 23:13:48 +00:00
loos
cfc8d71705 Fixes the comment to reflect the code.
Sponsored by:	Rubicon Communications (Netgate)
2016-04-25 23:12:39 +00:00
pfg
32dcf3933a Indentation issues.
Contract some lines leftover from r298310.

Mea culpa.
2016-04-20 16:19:44 +00:00
pfg
a7d40a88c9 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
tuexen
f78898772a Address issues found by the XCode code analyzer. 2016-04-18 20:16:41 +00:00
tuexen
42159e8af3 Fix the ICMP6 handling for SCTP.
Keep the IPv4 code in sync.

MFC after:	1 week
2016-04-16 21:34:49 +00:00
pfg
12232f8463 sys/net* : for pointers replace 0 with NULL.
Mostly cosmetical, no functional change.

Found with devel/coccinelle.
2016-04-15 17:30:33 +00:00
ae
3f81fe2ce0 Fix regression introduced in r296986.
Currently we don't keep zoneid in in6_ifaddr structure, because there
is still some code, that doesn't properly initialize sin6_scope_id,
but some functions use sa_equal() for addresses comparison. sa_equal()
compares full sockaddr_in6 structures and such comparison will fail.
For now use zero zoneid in in6ifa_ifwithaddr(). It is safe, because
used address is in embedded form. In future we will use zoneid, so mark it
with XXX comment.

Reported by:	kp
Tested by:	kp
2016-04-08 11:13:24 +00:00
gnn
43026a8c5f Unbreak the RSS/PCBGROUp build. 2016-03-31 00:53:23 +00:00
markj
4081472216 Fix the lladdr copy in in6_lltable_dump_entry() after r292978.
This bug caused "ndp -a" to show the wrong link layer address for neighbour
cache entries.

PR:	208067
2016-03-30 00:03:59 +00:00
markj
db6b45eb6c Modify nd6_llinfo_timer() to acquire the nd6 lock before the LLE lock.
When expiring a neighbour cache entry we may need to look up the associated
default router, which requires the nd6 read lock. To avoid an LOR, the nd6
lock should be acquired first.

X-MFC-With:	r296063
Tested by:	Larry Rosenman <ler@lerctr.org> (previous revision)
2016-03-29 19:23:00 +00:00
gnn
c3d5404bbe FreeBSD previously provided route caching for TCP (and UDP). Re-add
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.

Submitted by:	Mike Karels
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D4306
2016-03-24 07:54:56 +00:00
bz
d4cf530887 Mfp4 @180378:
Factor out nd6 and in6_attach initialization to their own files.
  Also move destruction into those files though still called from
  the central initialization.

  Sponsored by:	CK Software GmbH
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D5033
2016-03-22 15:43:47 +00:00
markj
8c873a2b1b Modify defrouter_remove() to perform the router lookup before removal.
This allows some simplification of its callers. No functional change
intended.

Tested by:	Larry Rosenman (as part of a larger change)
MFC after:	1 month
2016-03-17 19:01:44 +00:00
ae
2d399eaa6e Reduce the number of local variables. Remove redundant check that inp
pointer isn't NULL, it is safe, because we are handling IPV6_PKTINFO
socket option in this block of code. Also, use in6ifa_withaddr() instead
of ifa_withaddr().
2016-03-17 11:10:44 +00:00
ae
a3a4fe0039 Change in6_selectsrc() to allow usage of non-local IPv6 addresses in
IPV6_PKTINFO ancillary data when IPV6_BINDANY socket option is set.

Submitted by:	n_hibma
MFC after:	2 weeks
2016-03-17 10:59:30 +00:00
glebius
163857deb4 New way to manage reference counting of mbuf external storage.
The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount
value itself. To tell that m_ext.ext_flags flag EXT_FLAG_EMBREF is used.
The first mbuf to attach a cluster stores the refcount. The further mbufs
to reference the cluster point at refcount in the first mbuf. The first
mbuf is freed only when the last reference is freed.

The benefit over refcounts stored in separate slabs is that now refcounts
of different, unrelated mbufs do not share a cache line.

For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd()
becomes void, making widely used M_EXTADD macro safe.

For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization
exactly against the cache aliasing problem with regular refcounting.

Discussed with:		rrs, rwatson, gnn, hiren, sbruno, np
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D5396
Sponsored by:		Netflix
2016-03-01 00:17:14 +00:00
markj
9524b1b571 Lock the NDP default router list and count defrouter references.
This addresses a number of race conditions that can cause crashes as a
result of unsynchronized access to the list.

PR:		206904
Tested by:	Larry Rosenman <ler@lerctr.org>,
		Kevin Bowling <kevin.bowling@kev009.com>
MFC after:	2 months
Differential Revision: https://reviews.freebsd.org/D5315
2016-02-25 20:12:05 +00:00
tuexen
63d9199ac6 Don't leak an address in an error path.
CID:		1351729
MFC after:	3 days
2016-02-23 18:50:34 +00:00
tuexen
3c0af23540 Fix reporting of mapped addressed in getpeername() and getsockname() for
IPv6 SCTP sockets.
This bugs were found because of an issue reported by PVS / D5245.
2016-02-18 21:05:04 +00:00
markj
2bf18fe55d Release the ref acquired in nd6_dad_find() if DAD is already in progress.
MFC after:	1 week
2016-02-18 00:00:51 +00:00
markj
bbc632b86e Use pfxrtr_del() instead of freeing advertising routers directly.
MFC after:	1 week
2016-02-17 23:55:24 +00:00
markj
7f716b8a59 Remove a prototype for the non-existent prelist_del().
MFC after:	1 week
2016-02-17 23:53:24 +00:00
glebius
c905b9e75c Ternary operator has lower priority than OR.
Found by:	PVS-Studio
2016-02-17 21:17:14 +00:00
markj
67a2e23340 Add a missing newline to a log message.
MFC after:	1 week
2016-02-12 21:17:00 +00:00
markj
9ec0a4438f Rename the flags field of struct nd_defrouter to "raflags".
This field contains the flags inherited from the corresponding router
advertisement message and is not for storing private state.

MFC after:	1 week
2016-02-12 21:15:57 +00:00
markj
e659a689f9 Simplify defrtrlist_update() slightly in preparation for future changes.
No functional change intended.

MFC after:	1 week
2016-02-12 21:06:48 +00:00
markj
151140ce44 Remove a bogus comment from nd6_na_input().
The splnet() call that it refers to has been removed, and a lock for the
default router list is in fact needed.

MFC after:	1 week
2016-02-12 21:01:53 +00:00
markj
88e4f89c6d Remove superfluous return statements from the neighbour discovery code.
MFC after:	1 week
2016-02-12 20:55:22 +00:00
markj
0309ab8781 Fix style around allocations from M_IP6NDP.
- Don't cast the return value of malloc(9).
- Use M_ZERO instead of explicitly calling bzero(9).

MFC after:	1 week
2016-02-12 20:52:53 +00:00
markj
7326876626 Remove some unreferenced NDP debug variable definitions.
MFC after:	1 week
2016-02-12 20:46:53 +00:00
dteske
51b30e8967 Merge SVN r295220 (bz) from projects/vnet/
Fix a panic that occurs when a vnet interface is unavailable at the time the
vnet jail referencing said interface is stopped.

Sponsored by:	FIS Global, Inc.
2016-02-11 17:07:19 +00:00
glebius
306a6faf84 These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h
2016-02-01 17:41:21 +00:00
melifaro
23582454c7 MFP r287070,r287073: split radix implementation and route table structure.
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
  with different requirements. In fact, first 3 don't have _any_ requirements
  and first 2 does not use radix locking. On the other hand, routing
  structure do have these requirements (rnh_gen, multipath, custom
  to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.

So, radix code now uses tiny 'struct radix_head' structure along with
  internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
  Existing consumers still uses the same 'struct radix_node_head' with
  slight modifications: they need to pass pointer to (embedded)
  'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
  RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
  information base).

New net/route_var.h header was added to hold routing subsystem internal
  data. 'struct rib_head' was placed there. 'struct rtentry' will also
  be moved there soon.
2016-01-25 06:33:15 +00:00
melifaro
962b1b7134 Fix rte refcount leak in ip6_forward().
Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2016-01-20 11:25:30 +00:00
glebius
51f55053b6 Verify the packet length in sctp6_input().
The sctp6_ctlinput() function does not properly check the length of the packet
it receives from the ICMP6 input routine. This means that an attacker can craft
a packet that will cause a kernel panic.

When the kernel receives an ICMP6 error message with one of the types/codes
it handles, it calls icmp6_notify_error() to deliver it to the upper-level
protocol. icmp6_notify_error() cycles through the extension headers (if any)
to find the protocol number of the first non-extension header. It does NOT
verify the length of the non-extension header.

It passes information about the packet (including the actual packet) to the
upper-level protocol's pr_ctlinput function. In the case of SCTP for IPv6,
icmp6_notify_error() calls sctp6_ctlinput().

sctp6_ctlinput() assumes that the incoming packet contains a sufficiently-long
SCTP header and calls m_copydata() to extract a copy of that header. In turn,
m_copydata() assumes that the caller has already verified that the offset and
length parameters are correct. If they are incorrect, it will dereference a
NULL pointer and cause a kernel panic.

In short, no one is sufficiently verifying the input, and the result is a
kernel panic.

Submitted by:	jtl
Security:	SA-16:01.sctp
2016-01-14 10:11:10 +00:00
melifaro
7cc47d54cd Bring RADIX_MPATH support to new routing KPI to ease migration.
Move actual rte selection process from rtalloc_mpath_fib()
  to the rt_path_selectrte() function. Add public
  rt_mpath_select() to use in fibX_lookup_ functions.
2016-01-11 08:45:28 +00:00
melifaro
21632a9bd9 Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket().
in6_selectsrc() has 2 class of users: socket-based one (raw/udp/pcb/etc) and
  socket-less (ND code). The main reason for that change is inability to
  specify non-default FIB for callers w/o socket since (internally) inpcb
  is used to determine fib.

As as result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc()
  static):
1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check
  along with returning hop limit when needed.
2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and
  pass IPv6 address  w/ explicitly specified scope as separate argument.

Reviewed by:	ae (previous version)
2016-01-10 13:40:29 +00:00
melifaro
62e557108e Do not hold ifaddr reference for the whole icmp6_reflect() exec time.
Copy source address, calculate hlim and release refcount instead.
2016-01-10 11:59:55 +00:00
melifaro
fda8b5679f Remove prefix check from in6_addroute().
This check was added in initial? netinet6/ import
  back in 1999 (r53541).
It effectively became unnecessary after 'address/prefix clean-ups'
  KAME commit 90ff8792e676132096a440dd787f99a5a5860ee8 (github) in 2001
  (merged to FreeBSD in r78064) where prefix check was added to
  nd6_prefix_onlink(). Similar IPv4 check (in_addroute() was added
  in r137628).
Additionally, the right plance for this (or similar) check is the prefix
  addition code (nd6_prefix_onlink(), nd6_prefix_onlink_rtrequest(),
  in_addprefix() or rtinit()), but not the generic radix insert routine.
2016-01-09 11:41:37 +00:00
melifaro
14cf7637d1 Remove sys/eventhandler.h from net/route.h
Reviewed by:	ae
2016-01-09 09:34:39 +00:00
melifaro
4fe868c921 Finish r293098: make ip6_getpmtu() and ip6_getpmtu_ctl() use new routing API 2016-01-04 18:32:24 +00:00
melifaro
31d78f6810 Add rib_lookup_info() to provide API for retrieving individual route
entries data in unified format.

There are control plane functions that require information other than
  just next-hop data (e.g. individual rtentry fields like flags or
  prefix/mask). Given that the goal is to avoid rte reference/refcounting,
  re-use rt_addrinfo structure to store most rte fields. If caller wants
  to retrieve key/mask or gateway (which are sockaddrs and are allocated
  separately), it needs to provide sufficient-sized sockaddrs structures
  w/ ther pointers saved in passed rt_addrinfo.

Convert:
  * lltable new records checks (in_lltable_rtcheck(),
    nd6_is_new_addr_neighbor().
  * rtsock pre-add/change route check.
  * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
     1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
       not be multiple host routes for such hosts 2) if we have multiple
       routes we should inspect them (which is not done). 3) the entire idea
       of abusing KRT as storage for ND proxy seems odd. Userland programs
       should be used for that purpose).
2016-01-04 15:03:20 +00:00
melifaro
113d546f8e Remove 'struct route_int6' argument from in6_selectsrc() and
in6_selectif().

The main task of in6_selectsrc() is to return IPv6 SAS (along with
  output interface used for scope checks). No data-path code uses
  route argument for caching. The only users are icmp6 (reflect code),
  ND6 ns/na generation code. All this fucntions are control-plane, so
  there is no reason to try to 'optimize' something by passing cached
  route into to ip6_output(). Given that, simplify code by eliminating
  in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is
  used only by in6_selectsrc(), eliminate its 'struct route_in6' argument,
  too. While here, reshape rte-related code inside in6_selectif() to
  free lookup result immediately after saving all the needed fields.
2016-01-03 10:43:23 +00:00
melifaro
c0fd3127f0 Handle IPV6_PATHMTU option by spliting ip6_getpmtu_ctl() from ip6_getpmtu().
Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to
  the caller.

Currently, ip6_getpmtu() has 2 totally different use cases:
1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU
  and return it, w/o any reusability.
2) Actual ip6_output() data path where we (nearly) always use the provided
  route lookup data. If this data is not 'valid' we need to perform another
  lookup and save the result (which cannot be re-used by ip6_output()).

Given that, handle 1) by calling separate function doing rte lookup itself.
  Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both
  ip6_getpmtu_ctl() and ip6_getpmtu().
For 2) instead of storing ref'ed rte, store mtu (the only needed data
  from the lookup result) inside newly-added ro_mtu field.
  'struct route' was shrinked by 8(or 4 bytes) in r292978. Grow it again
  by 4 bytes. New ro_mtu field will be used in other places like
  ip/tcp_output (EMSGSIZE handling from output routines).

Reviewed by:	ae
2016-01-03 09:54:03 +00:00
melifaro
eceeaeddc0 Use lltable_get_ifp() instead of direct access to lltable fields. 2016-01-01 12:35:33 +00:00
melifaro
93152c67c9 Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
jtl
3504bdc019 Add the appropriate case statement for IPV6_BINDMULTI so the option can be
retrieved with getsockopt().

CID:	1229928
Differential Revision:	https://reviews.freebsd.org/D4737
Reviewed by:	adrian
Sponsored by:	Juniper Networks
2015-12-30 18:08:05 +00:00