Commit Graph

1755 Commits

Author SHA1 Message Date
Andrey V. Elsukov
8428914909 Clear h/w csum flags on mbuf handled by UDP.
When checksums of received IP and UDP header already checked, UDP uses
sbappendaddr_locked() to pass received data to the socket.
sbappendaddr_locked() uses given mbuf as is, and if NIC supports checksum
offloading, mbuf contains csum_data and csum_flags that were calculated
for already stripped headers. Some NICs support only limited checksums
offloading and do not use CSUM_PSEUDO_HDR flag, and csum_data contains
some value that UDP/TCP should use for pseudo header checksum calculation.

When L2TP is used for tunneling with mpd5, ng_ksocket receives mbuf with
filled csum_flags and csum_data, that were calculated for outer headers.
When L2TP header is stripped, a packet that was tunneled goes to the IP
layer and due to presence of csum_flags (without CSUM_PSEUDO_HDR) and
csum_data, the UDP/TCP checksum check fails for this packet.

Reported by:	Irina Liakh <spell at itl ua>
Tested by:	Irina Liakh <spell at itl ua>
MFC after:	1 week
2017-04-13 17:03:57 +00:00
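The failure described in commit 8428914909 can be illustrated with a small user-space model. This is only a sketch, not the kernel code: it assumes that csum_data holds a ones' complement sum over a UDP segment, which the stack then combines with the pseudo header. The helper names (ones_sum, build_udp, udp_checksum_ok) and all addresses are hypothetical.

```python
import socket
import struct

IPPROTO_UDP = 17


def ones_sum(data, start=0):
    """16-bit ones' complement sum of data, continuing from start."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = start
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src, dst, udp_len):
    """IPv4 pseudo header used by the UDP checksum."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, IPPROTO_UDP, udp_len))


def build_udp(src, dst, sport, dport, payload):
    """Build a UDP segment with a valid checksum."""
    udp_len = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, udp_len, 0)
    csum = ~ones_sum(pseudo_header(src, dst, udp_len) + header + payload) & 0xFFFF
    return struct.pack("!HHHH", sport, dport, udp_len, csum or 0xFFFF) + payload


def udp_checksum_ok(src, dst, segment, csum_data=None):
    """Verify a segment, optionally trusting a hardware-style csum_data."""
    if csum_data is None:
        csum_data = ones_sum(segment)  # software path: sum the segment itself
    return ones_sum(pseudo_header(src, dst, len(segment)), csum_data) == 0xFFFF


# Inner packet tunneled inside an outer UDP datagram (placeholder L2TP header).
inner = build_udp("10.0.0.1", "10.0.0.2", 1111, 2222, b"hello")
outer = build_udp("192.0.2.1", "192.0.2.2", 1701, 1701, b"L2TPHDR!" + inner)
stale_csum_data = ones_sum(outer)  # what the NIC computed for the outer segment
```

In this model, udp_checksum_ok("10.0.0.1", "10.0.0.2", inner) succeeds, while passing csum_data=stale_csum_data makes the same inner packet fail verification. Clearing the flags forces the software path.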
Steven Hartland
4d806fc663 Allow explicitly assigned IPv6 loopback address to be used in jails
If a jail has an explicitly assigned IPv6 loopback address then allow it
to be used instead of remapping requests for the loopback address to the
first IPv6 address assigned to the jail.

This fixes issues where applications attempt to detect their bound port
where they requested a loopback address, which was available, but instead
the kernel remapped it to the jail's first address.

This is the same fix that was applied to IPv4 by r316313.

Also:
* Correct the description of prison_check_ip6_locked to match the code.

MFC after:	2 weeks
Relnotes:	Yes
Sponsored by:	Multiplay
2017-03-31 09:10:05 +00:00
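The address selection rule from commit 4d806fc663 can be sketched as follows. This is a simplified user-space model, not the kernel's jail code: the function name jail_bind_address and the error handling are hypothetical, and the jail's addresses are passed in as a plain list.

```python
import ipaddress

LOOPBACK6 = ipaddress.IPv6Address("::1")


def jail_bind_address(jail_addrs, requested):
    """Return the address a jailed process ends up using for `requested`."""
    addrs = [ipaddress.IPv6Address(a) for a in jail_addrs]
    req = ipaddress.IPv6Address(requested)
    if req == LOOPBACK6:
        if LOOPBACK6 in addrs:
            # Behaviour after the change: an explicitly assigned ::1 is used as is.
            return LOOPBACK6
        if addrs:
            # Otherwise the loopback request is remapped to the first address.
            return addrs[0]
    if req in addrs:
        return req
    raise PermissionError(f"{requested} is not available in this jail")
```

With jail_addrs = ["2001:db8::10", "::1"], a request for ::1 now yields ::1; without the explicit ::1 entry it is remapped to 2001:db8::10.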
Mike Karels
8c1960d506 Fix reference count leak with L2 caching.
ip_forward, TCP/IPv6, and probably SCTP leaked references to L2 cache
entry because they used their own routes on the stack, not in_pcb routes.
The original model for route caching was callers that provided a route
structure to ip{,6}input() would keep the route, and this model was used
for L2 caching as well. Instead, change L2 caching to be done by default
only when using a route structure in the in_pcb; the pcb deallocation
code frees L2 as well as L3 caches. A separate change will add route
caching to TCP/IPv6.

Another suggestion was to have the transport protocols indicate willingness
to use L2 caching, but this approach keeps the changes in the network
level.

Reviewed by:    ae gnn
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D10059
2017-03-25 15:06:28 +00:00
Alan Somers
559b42968c Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0
sys/netinet6/icmp6.c
	Use the interface's FIB for source address selection in ICMPv6 error
	responses.

sys/netinet6/in6.c
	In in6_newaddrmsg, announce arrival of local addresses on the
	interface's FIB only.  In in6_lltable_rtcheck, use a per-fib ND6
	cache instead of a single cache.

sys/netinet6/in6_src.c
	In in6_selectsrc, use the caller's fib instead of the default fib.
	In in6_selectsrc_socket, remove a superfluous check.

sys/netinet6/nd6.c
	In nd6_lle_event, use the interface's fib for routing socket
	messages.  In nd6_is_new_addr_neighbor, check all FIBs when trying
	to determine whether an address is a neighbor.  Also, simplify the
	code for point to point interfaces.

sys/netinet6/nd6.h
sys/netinet6/nd6.c
sys/netinet6/nd6_rtr.c
	Make defrouter_select fib-aware, and make all of its callers pass in
	the interface fib.

sys/netinet6/nd6_nbr.c
	When inputting a Neighbor Solicitation packet, consider the
	interface fib instead of the default fib for DAD.  Output NS and
	Neighbor Advertisement packets on the correct fib.

sys/netinet6/nd6_rtr.c
	Allow installing the same host route on different interfaces in
	different FIBs.  If rt_add_addr_allfibs=0, only install or delete
	the prefix route on the interface fib.

tests/sys/netinet/fibs_test.sh
	Clear some expected failures, but add a skip for the newly revealed
	BUG217871.

PR:		196361
Submitted by:	Erick Turnquist <jhujhiti@adjectivism.org>
Reported by:	Jason Healy <jhealy@logn.net>
Reviewed by:	asomers
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D9451
2017-03-17 16:50:37 +00:00
Ermal Luçi
dce33a45c9 The patch provides the same socket option as Linux IP_ORIGDSTADDR.
Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD.

The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application.

This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets.

Reviewed by:	adrian, aw
Approved by:	ae (mentor)
Sponsored by:	rsync.net
Differential Revision:	D9235
2017-03-06 04:01:58 +00:00
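A transparent UDP proxy could read the option from commit dce33a45c9 roughly as shown below. This is a hedged sketch: the numeric option values are assumptions to be checked against <netinet/in.h> on the target system, the helper names are hypothetical, and the control message is assumed to carry a struct sockaddr_in. The port and address sit at the same byte offsets in the BSD and Linux layouts of that structure, so a single decoder covers both.

```python
import socket
import struct

# Assumed option values; verify against <netinet/in.h> before relying on them.
IP_ORIGDSTADDR_FREEBSD = 27
IP_ORIGDSTADDR_LINUX = 20


def decode_sockaddr_in(data):
    """Extract (address, port) from the raw bytes of a struct sockaddr_in."""
    if len(data) < 8:
        raise ValueError("control message too short for sockaddr_in")
    (port,) = struct.unpack_from("!H", data, 2)  # sin_port, network byte order
    return socket.inet_ntoa(data[4:8]), port     # sin_addr


def recv_with_origdst(sock, optname=IP_ORIGDSTADDR_FREEBSD, bufsize=65535):
    """Receive one datagram plus its original destination, if reported.

    The caller is expected to have enabled the option first, for example:
        sock.setsockopt(socket.IPPROTO_IP, optname, 1)
    """
    payload, ancdata, _flags, peer = sock.recvmsg(bufsize, socket.CMSG_SPACE(16))
    origdst = None
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == optname:
            origdst = decode_sockaddr_in(cdata)
    return payload, peer, origdst
```

Unlike IP_RECVDSTADDR, the decoded value includes the destination port, which is what a proxy needs to forward the datagram onward.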
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Andrey V. Elsukov
9907aba370 When IPv6 fragments reassembly is complete, update mbuf's csum_data
and csum_flags using information from all fragments. This fixes
dropping of reassembled packets due to wrong checksum when the IPv6
checksum offloading is enabled on a network card.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-02-28 22:58:19 +00:00
Andrey V. Elsukov
627c036f65 Remove IPsec related PCB code from SCTP.
The inpcb structure has inp_sp pointer that is initialized by
ipsec_init_pcbpolicy() function. This pointer keeps storage for IPsec
security policies associated with a specific socket.
An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket
options to configure these security policies. Then ip[6]_output()
uses inpcb pointer to specify that an outgoing packet is associated
with some socket. And IPSEC_OUTPUT() method can use a security policy
stored in the inp_sp. For inbound packet the protocol-specific input
routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms
to inbound security policy configured in the inpcb.

SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends
packets. Thus IPSEC_OUTPUT() method does not consider such packets as
associated with some socket and can not apply security policies
from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY()
method is called from protocol-specific input routine, it can specify
inpcb pointer and associated with socket inbound policy will be
checked. But there are two problems:
1. Such check is asymmetric, because we can not apply security policy
from inpcb for outgoing packet.
2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and
access to inp_sp is protected. But for SCTP this is not correct,
because SCTP uses its own locks to protect inpcb.

To fix these problems remove IPsec related PCB code from SCTP.
This implies that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options
will not be applicable to SCTP sockets. To be able to correctly check
inbound security policies for SCTP, mark its protocol header with
the PR_LASTHDR flag.

Reported by:	tuexen
Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D9538
2017-02-13 11:37:52 +00:00
Ermal Luçi
c10c5b1eba Committed without approval from mentor.
Reported by:	gnn
2017-02-12 06:56:33 +00:00
Ermal Luçi
70d81c5e91 Use proper value for socket option on IPv6
Reported-by: ohartmann@walstatt.org
2017-02-10 06:20:27 +00:00
Ermal Luçi
4616026faf Revert r313527
Heh svn is not git
2017-02-10 05:58:16 +00:00
Ermal Luçi
c0fadfdbbf Correct missed variable name.
Reported-by: ohartmann@walstatt.org
2017-02-10 05:51:39 +00:00
Ermal Luçi
ed55edceef The patch provides the same socket option as Linux IP_ORIGDSTADDR.
Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD.

The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application.

This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets.

Sponsored-by: rsync.net
Differential Revision: D9235
Reviewed-by: adrian
2017-02-10 05:16:14 +00:00
Andrey V. Elsukov
fcf596178b Merge projects/ipsec into head/.
Small summary
 -------------

o Almost all IPsec related code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
  option IPSEC_SUPPORT added. It enables support for loading
  and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
  default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
  support was removed. Added TCP/UDP checksum handling for
  inbound packets that were decapsulated by transport mode SAs.
  setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
  built as part of ipsec.ko module (or with IPSEC kernel).
  It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
  methods. Only one header file, <netipsec/ipsec_support.h>,
  needs to be included to declare everything needed to work
  with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
  Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be closer to the RFC.
o PF_KEY SADB was reworked:
  - now all security associations stored in the single SPI namespace,
    and all SAs MUST have unique SPI.
  - several hash tables added to speed up lookups in SADB.
  - SADB now uses rmlock to protect access, and concurrent threads
    can do SA lookups at the same time.
  - many PF_KEY message handlers were reworked to reflect changes
    in SADB.
  - SADB_UPDATE message was extended to support new PF_KEY headers:
    SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
    can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
  avoid locking protection for ipsecrequest. Now we support
  only limited number (4) of bundled SAs, but they are supported
  for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
  used security policies to avoid SP lookup for each packet.
o For inbound security policies, a mode was added in which the kernel
  checks the full history of applied IPsec transforms.
o Reference counting rules for security policies and security
  associations were changed. The proper SA locking added into xform
  code.
o xform code was also changed. Now it is possible to unregister xforms.
  tdb_xxx structures were changed and renamed to reflect changes in
  SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by:	gnn, wblock
Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9352
2017-02-06 08:49:57 +00:00
Andriy Voskoboinyk
2bbd06fc33 Garbage collect IFT_IEEE80211 (but leave the define for possible reuse)
This interface type ("a parent interface of wlanX") is not used since
r287197

Reviewed by:	adrian, glebius
Differential Revision:	https://reviews.freebsd.org/D9308
2017-01-28 17:08:40 +00:00
Luiz Otavio O Souza
338e227ac0 After the in_control() changes in r257692, an existing address is
(intentionally) deleted first and then completely added again (so all the
events, announces and hooks are given a chance to run).

This causes an issue with CARP where the existing CARP data structure is
removed together with the last address for a given VHID, which will cause
a subsequent fail when the address is later re-added.

This change fixes this issue by adding a new flag to keep the CARP data
structure when an address is not being removed.

There was an additional issue with IPv6 CARP addresses, where the CARP data
structure would never be removed after a change and lead to VHIDs which
cannot be destroyed.

Reviewed by:	glebius
Obtained from:	pfSense
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-01-25 19:04:08 +00:00
Hans Petter Selasky
f3e7afe2d7 Implement kernel support for hardware rate limited sockets.
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API supports rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. On the next ip_output() call, allocation of a new "snd_tag" is attempted.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.

Reviewed by:		wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision:	https://reviews.freebsd.org/D3687
Sponsored by:		Mellanox Technologies
MFC after:		3 months
2017-01-18 13:31:17 +00:00
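The send tag life cycle described in steps 1) to 4) of commit f3e7afe2d7 can be modelled in a few lines. This is a toy user-space model only: the classes Ifnet, SndTag and Pcb are hypothetical stand-ins that borrow names from the commit message, and real drivers do considerably more work.

```python
import errno


class SndTag:
    """Stand-in for struct m_snd_tag: remembers its interface and rate."""

    def __init__(self, ifp, rate):
        self.ifp = ifp
        self.rate = rate


class Ifnet:
    """Stand-in for a network interface that supports rate limiting."""

    def __init__(self, name):
        self.name = name
        self.tags = set()

    def snd_tag_alloc(self, rate):
        tag = SndTag(self, rate)
        self.tags.add(tag)
        return tag

    def snd_tag_free(self, tag):
        self.tags.discard(tag)

    def transmit(self, snd_tag):
        # Step 3: a tag that belongs to another interface signals a route change.
        if snd_tag is not None and snd_tag.ifp is not self:
            return errno.EAGAIN
        return 0


class Pcb:
    """Stand-in for the per-socket state holding the pacing rate and tag."""

    def __init__(self, max_pacing_rate=0):
        self.max_pacing_rate = max_pacing_rate
        self.snd_tag = None

    def output(self, ifp):
        # Step 1: allocate a tag on the outgoing interface when a rate is set.
        if self.snd_tag is None and self.max_pacing_rate:
            self.snd_tag = ifp.snd_tag_alloc(self.max_pacing_rate)
        error = ifp.transmit(self.snd_tag)
        if error == errno.EAGAIN:
            # Step 3, continued: release and clear the stale tag.
            self.detach()
        return error

    def detach(self):
        # Step 4: release the tag on the interface it is bound to.
        if self.snd_tag is not None:
            self.snd_tag.ifp.snd_tag_free(self.snd_tag)
            self.snd_tag = None
```

After a route change from one Ifnet to another, the first output() call returns EAGAIN and drops the old tag, and the following call allocates a new tag on the new interface.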
Mark Johnston
762d16d9e4 Improve some of the sysctl descriptions added in r299827.
Submitted by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
		(original version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D5336
2017-01-16 19:35:19 +00:00
Maxim Sobolev
339efd75a4 Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:

o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).

In addition to this, this option provides unified interface to get bintime
(equivalent of using SO_BINTIME), except it is also supported with IPv6 where
SO_BINTIME has never been supported. The long term plan is to deprecate
SO_BINTIME and move everything to using SO_TS_CLOCK.

Idea for this enhancement has been briefly discussed on the Net session
during dev summit in Ottawa last June and the general input was positive.

This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.

There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implements a totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.

Reviewed by:    adrian, gnn
Differential Revision:  https://reviews.freebsd.org/D9171
2017-01-16 17:46:38 +00:00
Mark Johnston
8cd3b2042c Release the ND6 list lock before making a prefix off-link in nd6_timer().
Reported by:	Jim <BM-2cWfdfG5CJsquqkJyry7hZT9LypbSEWEkQ@bitmessage.ch>
X-MFC With:	r306829
2017-01-08 18:46:00 +00:00
Michael Tuexen
b7b84c0e02 Whitespace changes.
The toolchain for processing the sources has been updated. No functional
change.

MFC after:	3 days
2016-12-26 11:06:41 +00:00
Mark Johnston
62280740d6 Remove a bogus KASSERT from nd6_prefix_unlink().
The caller may unlink a prefix before purging referencing addresses. An
identical assertion in nd6_prefix_del() verifies that the addresses are
purged before the prefix is freed.

PR:		215372
X-MFC With:	r306829
2016-12-19 19:21:28 +00:00
Andrey V. Elsukov
ad9f4d6ab6 ip[6]_tryforward does inbound and outbound packet firewall processing.
This can lead to change of mbuf pointer (packet filter could do m_pullup(),
NAT, etc). Also in case of change of destination address, tryforward can
decide that packet should be handled by local system. In this case modified
mbuf can be returned to the ip[6]_input(). To handle this correctly, check
M_FASTFWD_OURS flag after return from ip[6]_tryforward. And if it is present,
update variables that depend on the mbuf pointer and skip another inbound
firewall processing.

No objection from:	#network
MFC after:	3 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D8764
2016-12-19 11:02:49 +00:00
Andrey V. Elsukov
8a030e9c6e Modify IPv6 statistic accounting in ip6_input().
Add rcvif local variable to keep inbound interface pointer. Count
ifs6_in_discard errors in all "goto bad" cases. Now it will count
errors even if mbuf was freed. Modify all places where m->m_pkthdr.rcvif
is used to use local rcvif variable.

Obtained from:	Yandex LLC
MFC after:	1 month
2016-12-12 11:26:59 +00:00
Andrey V. Elsukov
5a1842a24a Add ip6_tryforward() - a run to completion forwarding implementation
for IPv6.

It gets performance benefits from reduced number of checks. It doesn't
copy mbuf to be able to send ICMPv6 error message, because it keeps mbuf
unchanged until the moment, when the route decision has been made.
It doesn't do IPsec checks, and when some IPsec security policies are present,
ip6_input() uses normal slow path.

Reviewed by:	bz, gnn
Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D8527
2016-12-12 10:57:32 +00:00
Michael Tuexen
5b495f17a5 Whitespace changes.
The tool used to generate the sources has been updated and produces
different whitespace. Commit this separately to avoid intermixing
these with real code changes.

MFC after:	3 days
2016-12-06 10:21:25 +00:00
Michael Tuexen
3e1465754f Make ICMPv6 hard error handling for TCP consistent with the ICMPv4
handling. Ensure that:
* Protocol unreachable errors are handled by indicating ECONNREFUSED
  to the TCP user for both IPv4 and IPv6. These were ignored for IPv6.
* Communication prohibited errors are handled by indicating ECONNREFUSED
  to the TCP user for both IPv4 and IPv6. These were ignored for IPv6.
* Hop limit exceeded errors are handled by indicating EHOSTUNREACH
  to the TCP user for both IPv4 and IPv6.
  For IPv6 the TCP connection was dropped but errno wasn't set.

Reviewed by: gallatin, rrs
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: 7904
2016-10-21 10:32:57 +00:00
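The error mapping listed in commit 3e1465754f can be summarized as a lookup table. This is a sketch at the ICMPv6 wire level rather than the kernel's internal notification codes. The type and code numbers follow RFC 4443, and treating Parameter Problem code 1 (unrecognized Next Header) as the IPv6 counterpart of "protocol unreachable" is an assumption of this example.

```python
import errno

# ICMPv6 message types (RFC 4443).
ICMP6_DST_UNREACH = 1
ICMP6_TIME_EXCEEDED = 3
ICMP6_PARAM_PROB = 4

# (type, code) -> errno reported to the TCP user.
TCP_HARD_ERRORS = {
    # Unrecognized Next Header: assumed equivalent of "protocol unreachable".
    (ICMP6_PARAM_PROB, 1): errno.ECONNREFUSED,
    # Communication with destination administratively prohibited.
    (ICMP6_DST_UNREACH, 1): errno.ECONNREFUSED,
    # Hop limit exceeded in transit.
    (ICMP6_TIME_EXCEEDED, 0): errno.EHOSTUNREACH,
}


def tcp_error_for_icmp6(icmp_type, code):
    """Return the errno for a hard error, or None if the message is not one."""
    return TCP_HARD_ERRORS.get((icmp_type, code))
```

Messages that are not in the table, such as Packet Too Big, return None here because they are not covered by this commit's list of hard errors.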
George V. Neville-Neil
aec9c8d5a5 Limit the number of mbufs that can be allocated for IPV6_2292PKTOPTIONS
(and IPV6_PKTOPTIONS).

PR:		100219
Submitted by:	Joseph Kong
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D5157
2016-10-17 23:25:31 +00:00
Gleb Smirnoff
cc94f0c2d7 - Revert r300854, r303657 which tried to fix regression from r297225.
- Fix the regression proper way using RO_RTFREE().

Submitted by:	ae
2016-10-13 20:15:47 +00:00
Mark Johnston
d748f7efcd Lock the ND prefix list and add refcounting for prefixes.
This change extends the nd6 lock to protect the ND prefix list as well
as the list of advertising routers associated with each prefix. To handle
cases where the nd6 lock must be dropped while iterating over either the
prefix or default router lists, a generation counter is used to track
modifications to the lists. Additionally, a new mutex is used to serialize
prefix on-link/off-link transitions. This mutex must be acquired before
the nd6 lock and is held while updating the routing table in
nd6_prefix_onlink() and nd6_prefix_offlink().

Reviewed by:	ae, tuexen (SCTP bits)
Tested by:	Jason Wolfe <jason@llnw.com>,
		Larry Rosenman <ler@lerctr.org>
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D8125
2016-10-07 21:10:53 +00:00
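The generation counter technique mentioned in commit d748f7efcd follows a common pattern: record the counter before dropping the lock, and restart the walk if it changed while the lock was released. The sketch below shows that pattern in user space. The class PrefixList and the function visit_all are hypothetical and are not taken from the nd6 code.

```python
import threading


class PrefixList:
    """A locked list whose generation counter is bumped on every change."""

    def __init__(self):
        self.lock = threading.Lock()
        self.items = []
        self.gen = 0

    def add(self, item):
        with self.lock:
            self.items.append(item)
            self.gen += 1

    def remove(self, item):
        with self.lock:
            self.items.remove(item)
            self.gen += 1


def visit_all(plist, work):
    """Call work(item) for each item, dropping the lock around each call."""
    visited = []
    plist.lock.acquire()
    try:
        restart = True
        while restart:
            restart = False
            for item in plist.items:
                if item in visited:
                    continue
                seen_gen = plist.gen
                plist.lock.release()      # work() may sleep or modify the list
                try:
                    work(item)
                finally:
                    plist.lock.acquire()
                visited.append(item)
                if plist.gen != seen_gen:
                    restart = True        # list changed: iterator is stale
                    break
    finally:
        plist.lock.release()
    return visited
```

Restarting from the head is safe because already visited items are skipped. The cost is a rescan whenever the list changes during an unlocked section.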
Mark Johnston
c26158449e Reduce the number of conditional statements in nd6_prefix_onlink().
MFC after:	1 week
2016-10-07 21:03:18 +00:00
Mark Johnston
7b0e84b7c8 Combine several checks in nd6_prefix_offlink() into one.
MFC after:	1 week
2016-10-07 21:02:30 +00:00
Mark Johnston
7782accf13 Fix whitespace around prototypes in nd6_rtr.c.
MFC after:	1 week
2016-10-07 00:36:18 +00:00
Mark Johnston
07c1f95976 Fix a typo.
MFC after:	1 week
2016-10-07 00:35:28 +00:00
Mark Johnston
a88d6d7e07 Shorten and simplify some of the loops in pfxlist_onlink_check().
No functional change intended.

MFC after:	1 week
2016-10-07 00:34:57 +00:00
Mark Johnston
f7d91d8cdd Use a const reference to prefixes in nd6_is_new_addr_neighbor().
MFC after:	1 week
2016-10-07 00:26:36 +00:00
Mark Johnston
0ed7d74424 nd6_dad_timer(): don't assert that the address is tentative.
It appears that this assertion can be tripped in some cases when
multiple interfaces are on the same link. Until this is resolved, revert a
part of r306305 and simply log a message if the DAD timer fires on a
non-tentative address.

Reported by:	jhb
X-MFC With:	r306305
2016-10-01 01:30:34 +00:00
Andrey V. Elsukov
5a03e7819a Fix bug introduced in r274300.
In icmp6_reflect() use original source address of erroneous packet as
destination address for source selection algorithm when original
destination address is not one of our own.

Reported by:	Mark Kamichoff <prox at prolixium com>
Tested by:	Mark Kamichoff <prox at prolixium com>
MFC after:	1 week
2016-09-29 19:57:37 +00:00
Mark Johnston
970fe0938e Convert checks in nd6_dad_start() and nd6_dad_timer() to assertions.
In particular, these functions can assume they are operating on tentative
addresses.

MFC after:	2 weeks
2016-09-24 21:40:24 +00:00
Mark Johnston
0bbf244e9f Rename ndpr_refcnt to ndpr_addrcnt.
This field counts derived addresses and is not a true refcount for prefix
objects, so the previous name was misleading.

MFC after:	1 week
2016-09-24 01:14:25 +00:00
Mark Johnston
2d12d25c6a Reduce code duplication around NDP message handlers in icmp6_input().
No functional change intended.

MFC after:	2 weeks
2016-09-20 18:08:17 +00:00
Kevin Lo
c3bef61e58 Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead.
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D7878
2016-09-15 07:41:48 +00:00
Mike Karels
0f5687f2ae Fix L2 caching for UDP over IPv6
ip6_output() was missing cache invalidation code analogous to
ip_output.c. r304545 disabled L2 caching for UDP/IPv6 as a workaround.
This change adds the missing cache invalidation code and reverts
r304545.

Reviewed by:	gnn
Approved by:	gnn (mentor)
Tested by:	peter@, Mike Andrews
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D7591
2016-08-24 00:52:30 +00:00
Bjoern A. Zeeb
77ecef378a Remove the kernel option for IPSEC_FILTERTUNNEL, which was deprecated
more than 7 years ago in favour of a sysctl in r192648.
2016-08-21 18:55:30 +00:00
Mike Karels
db727c1bd7 Disable L2 caching for UDP over IPv6
The ip6_output routine is missing L2 cache invalidation as done
in ip_output.  Even with that code, some problems with UDP over
IPv6 have been reported.  Disabling L2 caching for that case works
around the problem for now.

PR:		211872 211926
Reviewed by:	gnn
Approved by:	gnn (mentor)
MFC after:	immediate
2016-08-20 20:46:53 +00:00
Andrey V. Elsukov
d8caf56e9e Add ipfw_nat64 module that implements stateless and stateful NAT64.
The module works together with ipfw(4) and is implemented as its external
action module.

Stateless NAT64 registers external action with name nat64stl. This
keyword should be used to create NAT64 instance and to address this
instance in rules. Stateless NAT64 uses two lookup tables with mapped
IPv4->IPv6 and IPv6->IPv4 addresses to perform translation.

An instance configuration should look like this:
 1. Create lookup tables:
 # ipfw table T46 create type addr valtype ipv6
 # ipfw table T64 create type addr valtype ipv4
 2. Fill T46 and T64 tables.
 3. Add rule to allow neighbor solicitation and advertisement:
 # ipfw add allow icmp6 from any to any icmp6types 135,136
 4. Create NAT64 instance:
 # ipfw nat64stl NAT create table4 T46 table6 T64
 5. Add rules that match the traffic:
 # ipfw add nat64stl NAT ip from any to table(T46)
 # ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96
 6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
    via NAT64 host.

Stateful NAT64 registers external action with name nat64lsn. The only
option required to create a nat64lsn instance is prefix4. It defines
the pool of IPv4 addresses used for translation.

An instance configuration should look like this:
 1. Add rule to allow neighbor solicitation and advertisement:
 # ipfw add allow icmp6 from any to any icmp6types 135,136
 2. Create NAT64 instance:
 # ipfw nat64lsn NAT create prefix4 A.B.C.D/28
 3. Add rules that match the traffic:
 # ipfw add nat64lsn NAT ip from any to A.B.C.D/28
 # ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96
 4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
    via NAT64 host.

Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D6434
2016-08-13 16:09:49 +00:00
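Both configurations in commit d8caf56e9e route 64:ff9b::/96 to the NAT64 host. That prefix is the well-known prefix of RFC 6052, in which the IPv4 address occupies the low 32 bits of the IPv6 address. The sketch below shows the mapping that DNS64 and the translator rely on. The function names are hypothetical and the code is independent of the ipfw implementation.

```python
import ipaddress

WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embed_ipv4(ipv4):
    """Build the IPv6 address that represents an IPv4 host behind NAT64."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(WELL_KNOWN_PREFIX.network_address) | int(v4))


def extract_ipv4(ipv6):
    """Recover the IPv4 destination from an address inside 64:ff9b::/96."""
    v6 = ipaddress.IPv6Address(ipv6)
    if v6 not in WELL_KNOWN_PREFIX:
        raise ValueError(f"{ipv6} is not inside {WELL_KNOWN_PREFIX}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)
```

For example, an IPv6-only client that wants to reach 192.0.2.33 sends to embed_ipv4("192.0.2.33"), and the translator applies extract_ipv4() to the destination to find the IPv4 target.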
Stephen J. Kiernan
0ce1624d0e Move IPv4-specific jail functions to new file netinet/in_jail.c
_prison_check_ip4 renamed to prison_check_ip4_locked

Move IPv6-specific jail functions to new file netinet6/in6_jail.c
_prison_check_ip6 renamed to prison_check_ip6_locked

Add appropriate prototypes to sys/sys/jail.h

Adjust kern_jail.c to call prison_check_ip4_locked and
prison_check_ip6_locked accordingly.

Add netinet/in_jail.c and netinet6/in6_jail.c to the list of files that
need to be built when INET and INET6, respectively, are configured in the
kernel configuration file.

Reviewed by:	jtl
Approved by:	sjg (mentor)
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D6799
2016-08-09 02:16:21 +00:00
Andrey V. Elsukov
723758b7ce Fix NULL pointer dereference.
ro pointer can be NULL when IPSec consumes mbuf.

PR:		211486
MFC after:	3 days
2016-08-02 12:18:06 +00:00
Andrew Gallatin
d4c22202e6 Rework IPV6 TCP path MTU discovery to match IPv4
- Re-write tcp_ctlinput6() to closely mimic the IPv4 tcp_ctlinput()

- Now that tcp_ctlinput6() updates t_maxseg, we can allow ip6_output()
  to send TCP packets without looking at the tcp host cache for every
  single transmit.

- Make the icmp6 code mimic the IPv4 code & avoid returning
  PRC_HOSTDEAD because it is so expensive.

Without these changes in place, every TCP6 pmtu discovery or host
unreachable ICMP resulted in a call to in6_pcbnotify() which walks the
tcbinfo table with the write lock held.  Because the tcbinfo table is
shared between IPv4 and IPv6, this causes huge scalability issues on
servers with lots of (~100K) TCP connections, to the point where even
a small percent of IPv6 traffic had a disproportionate impact on
overall throughput.

Reviewed by:	bz, rrs, ae (all earlier versions), lstewart (in Netflix's tree)
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D7272
2016-08-01 17:02:21 +00:00
Stephen J. Kiernan
4ac21b4f09 Prepare for network stack as a module
- Move cr_canseeinpcb to sys/netinet/in_prot.c in order to separate the
   INET and INET6-specific code from the rest of the prot code (It is only
   used by the network stack, so it makes sense for it to live with the
   other network stack code.)
 - Move cr_canseeinpcb prototype from sys/systm.h to netinet/in_systm.h
 - Rename cr_seeotheruids to cr_canseeotheruids and cr_seeothergids to
   cr_canseeothergids, make them non-static, and add prototypes (so they
   can be seen/called by in_prot.c functions.)
 - Remove sw_csum variable from ip6_forward in ip6_forward.c, as it is an
   unused variable.

Reviewed by:	gnn, jtl
Approved by:	sjg (mentor)
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D2901
2016-07-27 20:34:09 +00:00
Mike Karels
ea17754c5a Fix per-connection L2 caching in fast path
r301217 re-added per-connection L2 caching from a previous change,
but it omitted caching in the fast path.  Add it.

Reviewed By: gallatin
Approved by: gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D7239
2016-07-22 02:11:49 +00:00
Andrey V. Elsukov
b867e84e95 Add ipfw_nptv6 module that implements Network Prefix Translation for IPv6
as defined in RFC 6296. The module works together with ipfw(4) and
is implemented as its external action module. When it is loaded, it registers
as eaction and can be used in rules. The usage pattern is similar to
ipfw_nat(4). All traffic matched by the rule goes to the NPT module.

Reviewed by:	hrs
Obtained from:	Yandex LLC
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D6420
2016-07-18 19:46:31 +00:00
Andrey V. Elsukov
7aeccebc0d Add net.inet6.ip6.intr_queue_maxlen sysctl. It can be used to
change netisr queue limit for IPv6 at runtime.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2016-07-15 17:09:30 +00:00
Dimitry Andric
be39349169 Fix a page fault in ip6_setpktopt(), occurring when the pflog module is
loaded, and syncthing is started, which uses setsockopt(IPV6_PKTINFO).

This is because pflog interfaces do not normally have an IPv6 address,
causing the ND_IFINFO() macro to dereference a NULL pointer.

Reviewed by:	ae
PR:		210943
MFC after:	3 days
2016-07-13 19:41:19 +00:00
Michael Tuexen
55b8cd93ef Don't consider the socket when processing an incoming ICMP/ICMP6 packet,
which was triggered by an SCTP packet. Whether a socket exists, is just
not relevant.

Approved by: re (kib)
MFC after: 1 week
2016-06-23 09:13:15 +00:00
Andrey V. Elsukov
a844d49cc6 Fix the NULL pointer dereference for unresolved link layer entries in
the netinet6 code. Copy link layer address only when corresponding entry
has LLE_VALID flag.

PR:		210379
Approved by:	re (kib)
2016-06-22 11:29:21 +00:00
Bjoern A. Zeeb
89856f7e2d Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather large
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along with (or before) the higher layer protocol when it is shut down.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
clean up their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enhance helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
Pedro F. Giffuni
268aab1cfc Remove the SIOCSIFALIFETIME_IN6 ioctl.
The SIOCSIFALIFETIME_IN6 provided by the kame project is unused,
it can't really be used safely and has been completely removed from
NetBSD and OpenBSD.

Obtained from:	NetBSD (kern/35897)
PR:		210148 (exp-run)
Reviewed by:	ae, hrs
Relnotes:	yes
Approved by:	re (glebius)
Differential Revision:	https://reviews.freebsd.org/D5491
2016-06-13 22:31:16 +00:00
Andrey V. Elsukov
4c10540274 Clean up unneeded include "opt_ipfw.h".
It was used to conditionally build IPFIREWALL_FORWARD support,
but the IPFIREWALL_FORWARD option was removed a long time ago.
2016-06-09 05:48:34 +00:00
Bjoern A. Zeeb
74b44794c6 Make KASSERT message more useful by printing the variables on which
we assert.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 22:34:12 +00:00
Bjoern A. Zeeb
99a0c4062d Move the callout_reset() to the end of the work rather than having it
sit before we do anything.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 14:01:09 +00:00
Bjoern A. Zeeb
484149def8 Introduce a per-VNET flag to enable/disable netisr processing on that VNET.
Add accessor functions to toggle the state per VNET.
The base system (vnet0) will always enable itself with the normal
registration. We will share the registered protocol handlers in all
VNETs minimising duplication and management.
Upon disabling netisr processing for a VNET, drain the netisr queue of
packets for that VNET.

Update netisr consumers to (de)register on a per-VNET start/teardown using
VNET_SYS(UN)INIT functionality.

The change should be transparent for non-VIMAGE kernels.

Reviewed by:	gnn (, hiren)
Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6691
2016-06-03 13:57:10 +00:00
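
The enable/disable behaviour described in 484149def8 can be modelled in a few lines. This is a toy sketch under stated assumptions, not the netisr code: the fixed-size queue, the `fake_` names, and the choice to reject packets for a disabled VNET at queue time are all illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_MAXVNET	4
#define FAKE_QLEN	8

/* Hypothetical packet carrying only the id of the VNET it belongs to. */
struct fake_pkt {
	int vnet_id;
};

/* Hypothetical shared queue with one enable flag per VNET. */
struct fake_netisr {
	bool enabled[FAKE_MAXVNET];
	struct fake_pkt queue[FAKE_QLEN];
	size_t len;
};

/* Queue a packet only if its VNET has processing enabled. */
bool
fake_netisr_queue(struct fake_netisr *n, int vnet_id)
{
	if (vnet_id < 0 || vnet_id >= FAKE_MAXVNET)
		return (false);
	if (!n->enabled[vnet_id] || n->len == FAKE_QLEN)
		return (false);
	n->queue[n->len++].vnet_id = vnet_id;
	return (true);
}

/*
 * Disable a VNET and drain its packets from the shared queue, keeping
 * the remaining packets in order. Returns the number of packets dropped.
 */
size_t
fake_netisr_disable(struct fake_netisr *n, int vnet_id)
{
	size_t i, kept = 0, dropped;

	if (vnet_id < 0 || vnet_id >= FAKE_MAXVNET)
		return (0);
	n->enabled[vnet_id] = false;
	for (i = 0; i < n->len; i++)
		if (n->queue[i].vnet_id != vnet_id)
			n->queue[kept++] = n->queue[i];
	dropped = n->len - kept;
	n->len = kept;
	return (dropped);
}
```

Sharing one queue and one set of handlers while keeping only a flag per VNET mirrors the stated goal of minimising duplication.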
George V. Neville-Neil
6d76822688 This change re-adds L2 caching for TCP and UDP, as originally added in D4306
but removed due to other changes in the system. Restore the llentry pointer
to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as
appropriate.

Submitted by:	Mike Karels
Differential Revision:	https://reviews.freebsd.org/D6262
2016-06-02 17:51:29 +00:00
Mark Johnston
1b28988b44 Exploit r301213 to fix in6 ifaddr locking in pfxlist_onlink_check().
Reviewed by:	ae, hrs
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6639
2016-06-02 17:21:57 +00:00
Mark Johnston
0973ca723c Always start IPv6 DAD asynchronously.
Otherwise we transmit the first neighbour solicitation in the context of the
caller of nd6_dad_start(), which can easily result in lock recursion. When
DAD is to be started after some delay, we send the first NS from the DAD
callout handler, so just change the implementation to do this in the
non-delayed case as well.

Reviewed by:	ae, hrs
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6639
2016-06-02 17:17:15 +00:00
Bjoern A. Zeeb
3f58662dd9 The pr_destroy field does not allow us to run the teardown code in a
specific order.  VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.

Slightly reshuffle the SI_SUB_* fields in kernel.h and add new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.

Reviewed by:		gnn, tuexen (sctp), jhb (kernel.h)
Obtained from:		projects/vnet
MFC after:		2 weeks
X-MFC:			do not remove pr_destroy
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6652
2016-06-01 10:14:04 +00:00
Michael Tuexen
3d48d25be7 Add PR_CONNREQUIRED for SOCK_STREAM sockets using SCTP.
This is required to signal connection setup on non-blocking sockets
via becoming writable. This still allows for implicit connection
setup.

MFC after:	1 week
2016-05-30 18:24:23 +00:00
Gleb Smirnoff
6351b3857b Plug route reference underleak that happens with FLOWTABLE after r297225.
Submitted by:	Mike Karels <mike karels.net>
2016-05-27 17:31:02 +00:00
Mark Johnston
65fdf52123 Mark the prefix and default router list sysctl handlers MPSAFE.
MFC after:	2 weeks
2016-05-23 20:18:11 +00:00
Mark Johnston
cc51be7b81 Acquire the nd6 lock in the prefix list sysctl handler.
The nd6 lock will be used to synchronize access to the NDP prefix list.

MFC after:	2 weeks
Tested by:	Jason Wolfe (as part of a larger change)
2016-05-23 20:15:08 +00:00
Andrey V. Elsukov
1d4d43c0e7 Remove the ip6 pointer adjustment from a place where the pointer cannot
have changed, and add a comment after the PFIL hooks call, where it can change.
2016-05-20 12:17:40 +00:00
Andrey V. Elsukov
9fdab4c052 Remove ip6 pointer initialization and strange check from the beginning
of ip6_output(). The pointer isn't used until it is first adjusted.
Remove the comment about adjusting where it is actually initialized.
2016-05-20 12:09:10 +00:00
Mark Johnston
5e0a6f31e5 Move IPv6 malloc tag definitions into the IPv6 code. 2016-05-20 04:45:08 +00:00
Andrey V. Elsukov
f0937b2cf5 Since PFIL can change the destination address, always use its current value
from the mbuf when calculating the path MTU. Remove the now unused finaldst variable.
Also constify dst argument in ip6_getpmtu() and ip6_getpmtu_ctl().

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-19 12:45:20 +00:00
Andrey V. Elsukov
4ee7e5a63d Call RO_RTFREE() when we have detected the change of destination
address, otherwise the old route will be used with new destination.

MFC after:	1 week
2016-05-17 14:06:55 +00:00
Mark Johnston
c81c183dae Use Node Information flag names instead of hard-coding their values.
MFC after:	1 week
2016-05-15 03:22:13 +00:00
Mark Johnston
82366c228b Add sysctl descriptions for net.inet6.ip6 and net.inet6.icmp6.
icmp6.redirtimeout, icmp6.nd6_maxnudhint and ip6.rr_prune are left
undocumented as they appear to have no effect. Some existing sysctl
descriptions were modified for consistency and style, and the
ip6.tempvltime and ip6.temppltime handlers were rewritten to be a bit
simpler and to avoid setting the sysctl value before validating it.

MFC after:	3 weeks
2016-05-15 03:18:03 +00:00
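
The "validate before setting" point in 82366c228b is a general pattern that can be sketched as follows. The variable, the bound, and the function name are hypothetical; the real ip6.tempvltime and ip6.temppltime handlers apply their own checks.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lower bound, chosen only for illustration. */
#define FAKE_MIN_LIFETIME	60

/* Hypothetical tunable standing in for a sysctl-backed variable. */
int fake_temp_lifetime = 86400;

/*
 * Validate the requested value first and store it only afterwards, so
 * that a rejected request leaves the current setting untouched.
 */
int
fake_set_lifetime(int requested)
{
	if (requested < FAKE_MIN_LIFETIME)
		return (EINVAL);
	fake_temp_lifetime = requested;
	return (0);
}
```

Writing the new value first and rolling it back on failure would briefly expose an invalid setting to concurrent readers, which this ordering avoids.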
Mark Johnston
415f6c24dd Remove an always-false error check in the AIFADDR_IN6 handler.
CID:		1250792
MFC after:	1 week
2016-05-15 03:01:40 +00:00
Mark Johnston
df890b8e73 Remove obsolescent comments from nd6_purge().
MFC after:	1 week
2016-05-09 23:43:12 +00:00
Mark Johnston
20b5f02214 Clean up callers of nd6_prelist_add().
nd6_prelist_add() sets *newp if and only if it is successful, so there's no
need for code that handles the case where the return value is 0 and
*newp == NULL. Fix some style bugs in nd6_prelist_add() while here.

MFC after:	1 week
2016-05-07 03:41:29 +00:00
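
The contract described in 20b5f02214 (the output pointer is set if and only if the call succeeds) lets callers get by with a single error check. The sketch below uses hypothetical `fake_` names and a heap allocation purely for illustration; it is not the nd6_prelist_add() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical prefix record. */
struct fake_prefix {
	int plen;
};

/* Contract: *newp is written if and only if the function returns 0. */
int
fake_prelist_add(int plen, struct fake_prefix **newp)
{
	struct fake_prefix *pr;

	if (plen < 0 || plen > 128)
		return (EINVAL);
	pr = malloc(sizeof(*pr));
	if (pr == NULL)
		return (ENOMEM);
	pr->plen = plen;
	*newp = pr;
	return (0);
}

/* Caller: one error check suffices; no separate NULL test is needed. */
int
fake_add_and_get_plen(int plen)
{
	struct fake_prefix *pr;
	int error, result;

	error = fake_prelist_add(plen, &pr);
	if (error != 0)
		return (-1);
	result = pr->plen;
	free(pr);
	return (result);
}
```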
Mark Johnston
83631b16a7 Remove two useless local variables from prelist_update().
MFC after:	1 week
2016-05-07 03:32:29 +00:00
Pedro F. Giffuni
a4641f4eaa sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
Michael Tuexen
ec70917ffa When a client uses UDP encapsulation and lists IP addresses in the INIT
chunk, enable UDP encapsulation for all those addresses.
This helps clients using a userland stack to support multihoming if
they are not behind a NAT.

MFC after: 1 week
2016-05-01 21:48:55 +00:00
Michael Tuexen
7ae2ff0dba Use correct order of source and destination address and port. 2016-04-29 20:13:35 +00:00
Randall Stewart
abb901c5d7 Complete the UDP tunneling of ICMP msgs to those protocols
interested in having tunneled UDP and finding out about the
ICMP (tested by Michael Tuexen with SCTP.. soon to be using
this feature).

Differential Revision:	http://reviews.freebsd.org/D5875
2016-04-28 15:53:10 +00:00
Conrad Meyer
2769d06203 in_lltable_alloc and in6 copy: Don't leak LLE in error path
Fix a memory leak in error conditions introduced in r292978.

Reported by:	Coverity
CIDs:		1347009, 1347010
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 23:13:48 +00:00
Luiz Otavio O Souza
b0ab3725db Fixes the comment to reflect the code.
Sponsored by:	Rubicon Communications (Netgate)
2016-04-25 23:12:39 +00:00
Pedro F. Giffuni
63b6b7a74a Indentation issues.
Contract some lines leftover from r298310.

Mea culpa.
2016-04-20 16:19:44 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep.

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
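
For readers unfamiliar with nitems(), the kind of replacement made in 02abd40029 can be shown with a small example. The macro body below is the usual element-count idiom; the port table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Element-count idiom; guarded in case a system header already defines it. */
#ifndef nitems
#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical table used only for this illustration. */
const int fake_ports[] = { 22, 80, 443 };

/* Count entries without repeating the sizeof arithmetic at each call site. */
size_t
fake_port_count(void)
{
	return (nitems(fake_ports));
}

/* Sum the table, using nitems() as the loop bound. */
int
fake_port_sum(void)
{
	size_t i;
	int sum = 0;

	for (i = 0; i < nitems(fake_ports); i++)
		sum += fake_ports[i];
	return (sum);
}
```

The macro only works on true arrays; applied to a pointer it silently yields the wrong count, which is presumably why only trivial cases were converted.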
Michael Tuexen
b1deed45e6 Address issues found by the XCode code analyzer. 2016-04-18 20:16:41 +00:00
Michael Tuexen
b9dd6a90b6 Fix the ICMP6 handling for SCTP.
Keep the IPv4 code in sync.

MFC after:	1 week
2016-04-16 21:34:49 +00:00
Pedro F. Giffuni
155d72c498 sys/net* : for pointers replace 0 with NULL.
Mostly cosmetic, no functional change.

Found with devel/coccinelle.
2016-04-15 17:30:33 +00:00
Andrey V. Elsukov
0b519fe44f Fix regression introduced in r296986.
Currently we don't keep zoneid in the in6_ifaddr structure, because there
is still some code that doesn't properly initialize sin6_scope_id,
but some functions use sa_equal() for address comparison. sa_equal()
compares full sockaddr_in6 structures, and such a comparison will fail.
For now use a zero zoneid in in6ifa_ifwithaddr(). It is safe, because
the address used is in embedded form. In the future we will use zoneid, so mark it
with an XXX comment.

Reported by:	kp
Tested by:	kp
2016-04-08 11:13:24 +00:00
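
The comparison problem described in 0b519fe44f can be reproduced in userland. In this sketch `fake_sa_equal()` is a memcmp-based stand-in for a whole-structure comparison in the spirit of sa_equal(), and `fake_make_sin6()` is a hypothetical helper; neither is the kernel code.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Whole-structure comparison: every field, including the zone id, counts. */
int
fake_sa_equal(const struct sockaddr_in6 *a, const struct sockaddr_in6 *b)
{
	return (memcmp(a, b, sizeof(*a)) == 0);
}

/* Fill in a zeroed link-local sockaddr_in6 with the given zone id. */
void
fake_make_sin6(struct sockaddr_in6 *sin6, unsigned char last_byte,
    unsigned int zoneid)
{
	memset(sin6, 0, sizeof(*sin6));
	sin6->sin6_family = AF_INET6;
	sin6->sin6_addr.s6_addr[0] = 0xfe;
	sin6->sin6_addr.s6_addr[1] = 0x80;
	sin6->sin6_addr.s6_addr[15] = last_byte;
	sin6->sin6_scope_id = zoneid;
}
```

Two structures holding the same address compare unequal as soon as one has a non-zero sin6_scope_id and the other does not, which is why the lookup uses a zero zoneid for now.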
George V. Neville-Neil
ce223fb715 Unbreak the RSS/PCBGROUP build. 2016-03-31 00:53:23 +00:00
Mark Johnston
47d2d39111 Fix the lladdr copy in in6_lltable_dump_entry() after r292978.
This bug caused "ndp -a" to show the wrong link layer address for neighbour
cache entries.

PR:	208067
2016-03-30 00:03:59 +00:00
Mark Johnston
435bece4c5 Modify nd6_llinfo_timer() to acquire the nd6 lock before the LLE lock.
When expiring a neighbour cache entry we may need to look up the associated
default router, which requires the nd6 read lock. To avoid an LOR, the nd6
lock should be acquired first.

X-MFC-With:	r296063
Tested by:	Larry Rosenman <ler@lerctr.org> (previous revision)
2016-03-29 19:23:00 +00:00
George V. Neville-Neil
84cc0778d0 FreeBSD previously provided route caching for TCP (and UDP). Re-add
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.

Submitted by:	Mike Karels
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D4306
2016-03-24 07:54:56 +00:00
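
One common way to invalidate cached routes when the table changes is a generation counter, sketched below. The commit message for 84cc0778d0 does not say how invalidation is implemented, so the counter, the structures, and all `fake_` names here are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route, reduced to an outgoing interface index. */
struct fake_route {
	int ifindex;
};

/* Hypothetical routing table with a generation bumped on every change. */
struct fake_rtable {
	unsigned int gen;
	struct fake_route best;
};

/* Per-connection cache remembering the generation it was filled at. */
struct fake_rcache {
	unsigned int gen;
	int valid;
	struct fake_route rt;
};

/* Adding a route bumps the generation, implicitly invalidating caches. */
void
fake_route_add(struct fake_rtable *t, int ifindex)
{
	t->best.ifindex = ifindex;
	t->gen++;
}

/* Return the cached route if still current; otherwise look it up again. */
const struct fake_route *
fake_route_get(const struct fake_rtable *t, struct fake_rcache *c, int *hit)
{
	*hit = c->valid && c->gen == t->gen;
	if (!*hit) {
		c->rt = t->best;	/* stands in for a full lookup */
		c->gen = t->gen;
		c->valid = 1;
	}
	return (&c->rt);
}
```

With this scheme a newly added route that might be a better match forces every cache to refresh on its next use, without the table having to track who holds a cached entry.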
Bjoern A. Zeeb
9901091eba Mfp4 @180378:
Factor out nd6 and in6_attach initialization to their own files.
  Also move destruction into those files though still called from
  the central initialization.

  Sponsored by:	CK Software GmbH
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D5033
2016-03-22 15:43:47 +00:00
Mark Johnston
ff63037da1 Modify defrouter_remove() to perform the router lookup before removal.
This allows some simplification of its callers. No functional change
intended.

Tested by:	Larry Rosenman (as part of a larger change)
MFC after:	1 month
2016-03-17 19:01:44 +00:00
Andrey V. Elsukov
3e16fab37b Reduce the number of local variables. Remove the redundant check that the inp
pointer isn't NULL. This is safe because we are handling the IPV6_PKTINFO
socket option in this block of code. Also, use in6ifa_ifwithaddr() instead
of ifa_ifwithaddr().
2016-03-17 11:10:44 +00:00