Commit Graph

2055 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
5f3e375ed8 in6_fib: return nh_ia in the ext interface as we do for IPv4
Like for IPv4 add nh_ia to the ext interface and return rt_ifa
in order to be used for, e.g., packet/octets accounting purposes.

Reviewed by:	melifaro
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D23873
2020-03-03 09:50:33 +00:00
Bjoern A. Zeeb
f6428cdb1f fib6_rte_to_nh_*: return a link-local gw address with scope embedded
In fib6_rte_to_nh_* when returning a link-local gateway address
currently we do clear the scope. That could be recovered using
the ifp returned as well, but the code in general seems to
expect a link-local address with scope embeedded as otherwise
the "dst" (gw) passed to the output routines will not include
scope and not send the packet out (the right interface).

Do not clear the scope when returning a link-local address and
allow packets to go out (the right interface).

Remove the (now) extra scope recovery in the IPv6 fast-fwd code.

Sponsored by:	Netflix
Reviewed by:	melifaro, ae
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23872
2020-03-03 09:45:16 +00:00
Bjoern A. Zeeb
f1db666a61 mld6: initialize oifp to avoid bogus results/panics in edge cases
In certain cases (probably not during normal operation but observed in
the lab during development) ip6_ouput() could return without error
and ifpp (&oifp) not updated.
Given oifp was never initialized we would take the later branch
as oifp was not NULL, and when calling icmp6_ifstat_inc() we would
panic dereferencing a garbage pointer.
For code stability initialize oifp to NULL before first use to always
have a deterministic value and not rely on a called function to behave
and always and for ever do the work for us as we hope for.

MFC after:	3 days
Sponsored by:	Netflix
2020-02-28 11:16:41 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Bjoern A. Zeeb
3db6053160 ip6_output: fix regression introduced in r358167 for ipv6 fragmentation
When moving the calculations for the optlen into the if (opt) block
which deals with possible extension headers I failed to initialise
unfragpartlen to the ipv6 header length if there were no extension
headers present.  Correct that mistake to make IPv6 fragment length
calculcations work again.

Reported by:	hselasky, kp
OKed by:	hselasky, kp
MFC after:	3 days
X-MFC with:	r358167
PR:		244393
2020-02-25 15:03:41 +00:00
Bjoern A. Zeeb
3459050c9a Fix IPv6 checksums when exthdrs are present.
In two places in ip6_output we are doing (delayed) checksum calculations.
The initial logic came from SCTP in r205075,205104 and later I copied
and adjusted it for the TCP|UDP case in r235958.
The problem was that the original SCTP offsets were already wrong for any
case with extension headers present given IPv6 extension headers are not
part of the pseudo checksum calculations.
The later changes do not help in case there is checksum offloading as for
extension headers (incl. fragments) we do currrently never offload as we
have no infrastructure to know whether the NIC can handle these cases.

Correct the offsets for delayed checksum calculations and properly handle
mbuf flags.  In addition harmonize the almost identical duplicate code.

While here eliminate the now unneeded variable hlen and add an always
missing mtod() call in the 1-b and 3 cases after the introduction of
the mb_unmapped_to_ext() calls.

Reported by:	Francis Dupont (fdupont isc.org)
PR:		243675
MFC after:	6 days
Reviewed by:	markj (earlier version), gallatin
Differential Revision:	https://reviews.freebsd.org/D23760
2020-02-24 19:12:20 +00:00
Pawel Biernacki
295a18d184 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (14 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23639
2020-02-24 10:47:18 +00:00
Bjoern A. Zeeb
a1a6c01e41 ip6_output: improve extension header handling
Move IPv6 source address checks from after extension header heandling
to the top of the function. If we do not pass these checks there is
no reason to do a lot of work upfront.

Fold extension header preparations and length calculations together into
a single branch and macro rather than doing them sequentially.
Likewise move extension header concatination into a single branch block
only doing it if we recorded any extension header length length.

Reviewed by:	melifaro (earlier version), markj, gallatin
Sponsored by:	Netflix (partially, originally)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23740
2020-02-20 10:56:12 +00:00
Michael Tuexen
868b51f234 Epochify SCTP. 2020-02-18 21:25:17 +00:00
Bjoern A. Zeeb
7c1daefe2c ip6_output: update comments.
Clear up some comments and improve to panic messages.

No functional changes.

MFC after:	3 days
2020-02-18 11:28:00 +00:00
Hans Petter Selasky
bacb11c9ed Fix kernel panic while trying to read multicast stream.
When VIMAGE is enabled make sure the "m_pkthdr.rcvif" pointer is set
for all mbufs being input by the IGMP/MLD6 code. Else there will be a
NULL-pointer dereference in the netisr code when trying to set the
VNET based on the incoming mbuf. Add an assert to catch this when
queueing mbufs on a netisr to make debugging of similar cases easier.

Found by:	Vladislav V. Prodan
PR:		244002
Reviewed by:	bz@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-02-17 09:46:32 +00:00
Navdeep Parhar
c53c867eb3 Fix NOINET builds. 2020-01-31 02:23:48 +00:00
Gleb Smirnoff
e617b21d2f Enter network epoch when calling in_pcbconnect() for IPv6 mapped to IPv4
UDP sockets.  This is miss from r356983.

Reported by:	https://syzkaller.appspot.com/bug?id=73c7a2e3f0783f9947459065e5c2f25fe8f82f54
2020-01-22 17:06:55 +00:00
Alexander V. Chernikov
34a5582c47 Bring back redirect route expiration.
Redirect (and temporal) route expiration was broken a while ago.
This change brings route expiration back, with unified IPv4/IPv6 handling code.

It introduces net.inet.icmp.redirtimeout sysctl, allowing to set
 an expiration time for redirected routes. It defaults to 10 minutes,
 analogues with net.inet6.icmp6.redirtimeout.

Implementation uses separate file, route_temporal.c, as route.c is already
 bloated with tons of different functions.
Internally, expiration is implemented as an per-rnh callout scheduled when
 route with non-zero rt_expire time is added or rt_expire is changed.
 It does not add any overhead when no temporal routes are present.

Callout traverses entire routing tree under wlock, scheduling expired routes
 for deletion and calculating the next time it needs to be run. The rationale
 for such implemention is the following: typically workloads requiring large
 amount of routes have redirects turned off already, while the systems with
 small amount of routes will not inhibit large overhead during tree traversal.

This changes also fixes netstat -rn display of route expiration time, which
 has been broken since the conversion from kread() to sysctl.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23075
2020-01-22 13:53:18 +00:00
Gleb Smirnoff
b955545386 Make ip6_output() and ip_output() require network epoch.
All callers that before may called into these functions
without network epoch now must enter it.
2020-01-22 05:51:22 +00:00
Gleb Smirnoff
bab98355f9 Add some documenting NET_EPOCH_ASSERTs. 2020-01-22 02:37:47 +00:00
Gleb Smirnoff
f6a2a6b163 Unroll macro that is used just once. Not a functional change. 2020-01-22 02:35:39 +00:00
Alexander V. Chernikov
16c2f24169 Document requirements for the 'struct route' variations.
MFC after:	2 weeks
2020-01-21 12:00:34 +00:00
Gleb Smirnoff
2a4bd982d0 Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch.   The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC
2020-01-15 06:05:20 +00:00
Gleb Smirnoff
97168be809 Mechanically substitute assertion of in_epoch(net_epoch_preempt) to
NET_EPOCH_ASSERT(). NFC
2020-01-15 05:45:27 +00:00
Michael Tuexen
fe1274ee39 Fix race when accepting TCP connections.
When expanding a SYN-cache entry to a socket/inp a two step approach was
taken:
1) The local address was filled in, then the inp was added to the hash
   table.
2) The remote address was filled in and the inp was relocated in the
   hash table.
Before the epoch changes, a write lock was held when this happens and
the code looking up entries was holding a corresponding read lock.
Since the read lock is gone away after the introduction of the
epochs, the half populated inp was found during lookup.
This resulted in processing TCP segments in the context of the wrong
TCP connection.
This patch changes the above procedure in a way that the inp is fully
populated before inserted into the hash table.

Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@
mailing list and for testing the patch!

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D22971
2020-01-12 17:52:32 +00:00
Bjoern A. Zeeb
c6feea3b89 nd6_rtr: constantly use __func__ for nd6log()
Over time one or two hard coded function names did not match the
actual function anymore.  Consistently use __func__ for nd6log() calls
and re-wrap/re-format some messages for consitency.

MFC after:	2 weeks
2020-01-12 17:41:09 +00:00
Bjoern A. Zeeb
25ebfe3350 nd6_rtr: make nd6_prefix_onlink() static
nd6_prefix_onlink() is not used anywhere outside nd6_rtr.c.  Stop
exporting it and make it file local static.
2020-01-12 16:58:21 +00:00
Bjoern A. Zeeb
e1891232fc in6_mcast: make in6_joingroup_locked() static
in6_joingroup_locked() is only used file-local. No need to export it
hance make it static.
2020-01-11 18:55:12 +00:00
Alexander V. Chernikov
ead85fe415 Add fibnum, family and vnet pointer to each rib head.
Having metadata such as fibnum or vnet in the struct rib_head
 is handy as it eases building functionality in the routing space.
This change is required to properly bring back route redirect support.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23047
2020-01-09 17:21:00 +00:00
Bjoern A. Zeeb
334fc5822b vnet: virtualise more network stack sysctls.
Virtualise tcp_always_keepalive, TCP and UDP log_in_vain.  All three are
set in the netoptions startup script, which we would love to run for VNETs
as well [1].

While virtualising the log_in_vain sysctls seems pointles at first for as
long as the kernel message buffer is not virtualised, it at least allows
an administrator to debug the base system or an individual jail if needed
without turning the logging on for all jails running on a system.

PR:		243193 [1]
MFC after:	2 weeks
2020-01-08 23:30:26 +00:00
Alexander V. Chernikov
e02d3fe70c Fix rtsock route message generation for interface addresses.
Reviewed by:	olivier
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D22974
2020-01-07 21:16:30 +00:00
Gleb Smirnoff
e00ee1a9f4 In r343631 error code for a packet blocked by a firewall was
changed from EACCES to EPERM.  This change was not intentional,
so fix that.  Return EACCESS if a firewall forbids sending.

Noticed by:	ae
2020-01-01 17:32:20 +00:00
Alexander V. Chernikov
bdb214a4a4 Remove useless code from in6_rmx.c
The code in questions walks IPv6 tree every 60 seconds and looks into
 the routes with non-zero expiration time (typically, redirected routes).
For each such route it sets RTF_PROBEMTU flag at the expiration time.
No other part of the kernel checks for RTF_PROBEMTU flag.

RTF_PROBEMTU was defined 21 years ago, 30 Jun 1999, as RTF_PROTO1.
RTF_PROTO1 is a de-facto standard indication of a route installed
 by a routing daemon for a last decade.

Reviewed by:	bz, ae
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22865
2019-12-18 22:10:56 +00:00
Hans Petter Selasky
a4c5668d12 Leave multicast group before reaping and committing state for both
IPv4 and IPv6.

This fixes a regression issue after r349369. When trying to exit a
multicast group before closing the socket, a multicast leave packet
should be sent.

Differential Revision:	https://reviews.freebsd.org/D22848
PR: 242677
Reviewed by:	bz (network)
Tested by:	Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-12-18 12:06:34 +00:00
Bjoern A. Zeeb
74ff87cd16 Update comment.
Update the comment related to SIIT and v4mapped addresses being rejected
by us when coming from the wire given we have supported IPv6-only kernels
for a few years now.
See also draft-itojun-v6ops-v4mapped-harmful.

Suggested by:	melifaro
MFC after:	2 weeks
2019-12-06 16:53:42 +00:00
Bjoern A. Zeeb
b745e7623c ip6_input: remove redundant v4mapped check
In ip6_input() we apply the same v4mapped address check twice. The only
case which skipps the first one is M_FASTFWD_OURS which should have passed
the check on the firstinput pass and passed the firewall.
Remove the 2nd redundant check.

Reviewed by:	kp, melifaro
MFC after:	2 weeks
Sponsored by:	Netflix (originally)
Differential Revision:	https://reviews.freebsd.org/D22462
2019-12-06 16:42:58 +00:00
Kristof Provost
200424235e Remove useless NULL check
Coverity points out that we've already dereferenced m by the time we check, so
there's no reason to keep the check. Moreover, it's safe to pass NULL to
m_freem() anyway.

CID:		1019092
2019-12-05 16:50:54 +00:00
Bjoern A. Zeeb
0700d2c3f0 Make icmp6_reflect() static.
icmp6_reflect() is not used anywhere outside icmp6.c, no reason to
export it.

Sponsored by:	Netflix
2019-12-03 14:46:38 +00:00
Hans Petter Selasky
5b64b824b9 Use refcount from "in_joingroup_locked()" when joining multicast
groups. Do not acquire additional references. This makes the IPv4 IGMP
code in line with the IPv6 MLD code.

Background:
The IPv4 multicast code puts an extra reference on the in_multi struct
when joining groups.  This becomes visible when using daemons like
igmpproxy from ports, that multicast entries do not disappear from the
output of ifmcstat(8) when multicast streams are disconnected.

This fixes a regression issue after r349762.

While at it factor the ip_mfilter_insert() and ip6_mfilter_insert() calls
to avoid repeated "is_new" check.

Differential Revision:	https://reviews.freebsd.org/D22595
Tested by:	Guido van Rooij <guido@gvr.org>
Reviewed by:	rgrimes (network)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-12-03 08:46:59 +00:00
Michael Tuexen
e25b0dab9a Update the hostcache also for PTB messages received for SCTP/IPv6.
The corresponding code for SCTP/IPv4 was introduced in
https://svnweb.freebsd.org/base?view=revision&revision=317597

Submitted by:		Julius Flohr
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D22605
2019-12-01 16:14:44 +00:00
Bjoern A. Zeeb
a4adf6cc65 Fix m_pullup() problem after removing PULLDOWN_TESTs and KAME EXT_*macros.
r354748-354750 replaced the KAME macros with m_pulldown() calls.
Contrary to the rest of the network stack m_len checks before m_pulldown()
were not put in placed (see r354748).
Put these m_len checks in place for now (to go along with the style of the
network stack since the initial commits).  These are not put in for
performance but to avoid an error scenario (even though it also will help
performance at the moment as it avoid allocating an extra mbuf; not because
of the unconditional function call).

The observed error case went like this:
(1) an mbuf with M_EXT arrives and we call m_pullup() unconditionally on it.
(2) m_pullup() will call m_get() unless the requested length is larger than
MHLEN (in which case it'll m_freem() the perfectly fine mbuf) and migrate the
requested length of data and pkthdr into the new mbuf.
(3) If m_get() succeeds, a further m_pullup() call going over MHLEN will fail.
This was observed with failing auto-configuration as an RA packet of
200 bytes exceeded MHLEN and the m_pullup() called from nd6_ra_input()
dropped the mbuf.
(Re-)adding the m_len checks before m_pullup() calls avoids this problems
with mbufs using external storage for now.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-12-01 00:22:04 +00:00
Ryan Libby
6afe56f9c3 in6_joingroup_locked: need if_addr_lock around in6m_disconnect_locked
It looks like the call that requires the lock was introduced in r337866.

Reviewed by:	hselasky
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20739
2019-11-25 22:25:10 +00:00
Bjoern A. Zeeb
f8d4f9bce9 in6: move include
Move the include for sysctl.h out of the middle of the file to the
includes at the beginning.  This is will make it easier to add new
sysctls.

No functional changes.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-19 21:14:15 +00:00
Bjoern A. Zeeb
3c5018ca10 nd6: sysctl
Move the SYSCTL_DECL to the top of the file.  Move the sysctl function
before SYSCTL_PROC so that we don't need an extra function declaration in
the middle of the file.

No functional changes.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-19 21:08:18 +00:00
Bjoern A. Zeeb
6db6527385 nd6: make nd6_timer_ch static
nd6_timer_ch is only used in file local context.  There is no need to
export it, so make it static.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-19 20:54:17 +00:00
Bjoern A. Zeeb
f77a6dbd1e nd6_rtr: re-sort functions
Resort functions within file in a way that they depend on each other as
that makes it easier to rework various things.
Also allows us to remove file local function declarations.

No functional changes.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-19 20:34:33 +00:00
Bjoern A. Zeeb
b2b7a4b2ca mld: fix epoch assertion
in6ifa_ifpforlinklocal() asserts the net epoch.  The test case from r354832
revealed code paths where we call into the function without having
acquired the net epoch first and consequently we hit the assert.
This happens in certain MLD states during VNET shutdown and most people
normaly not notice this.

For correctness acquire the net epoch around calls to
mld_v1_transmit_report() in all cases to avoid the assertion firing.

MFC after:	2 weeks
Sponsored by:	Netflix
2019-11-19 14:53:13 +00:00
Bjoern A. Zeeb
32af08ecad icmpv6: Fix mbuf change in mld
After r354748 mld_input() can change the mbuf.  The new pointer
is never returned to icmp6_input() and when passed to
icmp6_rip6_input() the mbuf may no longer valid leading to
a panic.
Pass a pointer to the mbuf to mld_input() so we can return an
updated version in the non-error case.

Add a test sending an MLD packet case which will trigger this bug.

Pointyhat to:	bz
Reported by:	gallatin, thj
MFC After:	2 weeks
X-MFC with:	r354748
Sponsored by:	Netflix
2019-11-18 21:59:47 +00:00
Bjoern A. Zeeb
808c432f62 nd6: retire defrouter_select(), use _fib() variant.
Burn bridges and replace the last two calls of defrouter_select() with
defrouter_select_fib().  That allows us to retire defrouter_select()
and make it more clear in the calling code that it applies to all FIBs.

Sponsored by:	Netflix
2019-11-16 00:17:35 +00:00
Bjoern A. Zeeb
f592d0c377 nd6_rtr:
Pull in the TAILQ_HEAD() as it is not needed outside nd6_rtr.c.
Rename the TAILQ_HEAD() struct and the nd_defrouter variable from
"nd_" to "nd6_" as they are not part of the RFC 3542 API which uses "ND_".

Ideally I'd like to also rename the struct nd_defrouter {} to "nd6_*"
but given that is used externally there is more work to do.

No functional changes.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-16 00:02:36 +00:00
Bjoern A. Zeeb
63abacc204 netinet*: replace IP6_EXTHDR_GET()
In a few places we have IP6_EXTHDR_GET() left in upper layer protocols.
The IP6_EXTHDR_GET() macro might perform an m_pulldown() in case the data
fragment is not contiguous.

Convert these last remaining instances into m_pullup()s instead.
In CARP, for example, we will a few lines later call m_pullup() anyway,
the IPsec code coming from OpenBSD would otherwise have done the m_pullup()
and are copying the data a bit later anyway, so pulling it in seems no
better or worse.

Note: this leaves very few m_pulldown() cases behind in the tree and we
might want to consider removing them as well to make mbuf management
easier again on a path to variable size mbufs, especially given
m_pulldown() still has an issue not re-checking M_WRITEABLE().

Reviewed by:	gallatin
MFC after:	8 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22335
2019-11-15 21:44:17 +00:00
Bjoern A. Zeeb
a61b5cfbbf netinet6: Remove PULLDOWN_TESTs.
Remove the KAME introduced PULLDOWN_TESTs which did not even
have a compile-time option in sys/conf to turn them on for a
custom kernel build. They made the code a lot harder to read
or more complicated in a few cases.

Convert the IP6_EXTHDR_CHECK() calls into FreeBSD looking code.
Rather than throwing the packet away if it would not fit the
KAME mbuf expectations, convert the macros to m_pullup() calls.
Do not do any extra manual conditional checks upfront as to
whether the m_len would suffice (*), simply let m_pullup() do
its work (incl. an early check).

Remove extra m_pullup() calls where earlier in the function or
the only caller has already done the pullup.

Discussed with:	rwatson (*)
Reviewed by:	ae
MFC after:	8 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22334
2019-11-15 21:40:40 +00:00
Bjoern A. Zeeb
e20b5bc485 nd6: simplify code
We are taking the same actions in both cases of the branch inside the block.
Simplify that code as the extra branch is not needed.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-15 13:45:38 +00:00
Bjoern A. Zeeb
b3a25d2993 nd6: remove unused structs and defines
Remove a collections of unused structs and #defines to make it easier
to understand what is actually in use.

Sponsored by:	Netflix
2019-11-13 14:28:07 +00:00
Bjoern A. Zeeb
d64df9a2b2 nd6: make nd6_alloc() file static
nd6_alloc() is a function used only locally.  Make it static and no
longer export it.  Keeps the KPI smaller.

Sponsored by:	Netflix
2019-11-13 13:53:17 +00:00
Bjoern A. Zeeb
ad675b3279 nd6 defrouter: consolidate nd_defrouter manipulations in nd6_rtr.c
Move the nd_defrouter along with the sysctl handler from nd6.c to
nd6_rtr.c and make the variable file static.  Provide (temporary)
new accessor functions for code manipulating nd_defrouter from nd6.c,
and stop exporting functions no longer needed outside nd6_rtr.c.
This also shuffles a few functions around in nd6_rtr.c without
functional changes.

Given all nd_defrouter logic is now in one place we can tidy up the
code, locking and, and other open items.

MFC after:	3 weeks
X-MFC:		keep exporting the functions
Sponsored by:	Netflix
2019-11-13 12:05:48 +00:00
Bjoern A. Zeeb
a8fe77d877 netinet*: update *mp to pass the proper value back
In ip6_[direct_]input() we are looping over the extension headers
to deal with the next header.  We pass a pointer to an mbuf pointer
to the handling functions.  In certain cases the mbuf can be updated
there and we need to pass the new one back.  That missing in
dest6_input() and route6_input().  In tcp6_input() we should also
update it before we call tcp_input().

In addition to that mark the mbuf NULL all the times when we return
that we are done with handling the packet and no next header should
be checked (IPPROTO_DONE).  This will eventually allow us to assert
proper behaviour and catch the above kind of errors more easily,
expecting *mp to always be set.

This change is extracted from a larger patch and not an exhaustive
change across the entire stack yet.

PR:			240135
Reported by:		prabhakar.lakhera gmail.com
MFC after:		3 weeks
Sponsored by:		Netflix
2019-11-12 15:46:28 +00:00
Gleb Smirnoff
c17cd08f53 It is unclear why in6_pcblookup_local() would require write access
to the PCB hash.  The function doesn't modify the hash. It always
asserted write lock historically, but with epoch conversion this
fails in some special cases.

Reviewed by:	rwatson, bz
Reported-by:	syzbot+0b0488ca537e20cb2429@syzkaller.appspotmail.com
2019-11-11 06:28:25 +00:00
Bjoern A. Zeeb
c1131de6f1 frag6: properly handle atomic fragments according to RFCs.
RFC 8200 says:
	"If the fragment is a whole datagram (that is, both the Fragment
         Offset field and the M flag are zero), then it does not need
         any further reassembly and should be processed as a fully
         reassembled packet (i.e., updating Next Header, adjust Payload
         Length, removing the Fragment header, etc.).  .."

That means we should remove the fragment header and make all the adjustments
rather than just skipping over the fragment header.  The difference should
be noticeable in that a properly handled atomic fragment triggering an ICMPv6
message at an upper layer (e.g. dest unreach, unreachable port) will not
include the fragment header.

Update the test cases to also test for an unfragmentable part.  That is
needed so that the next header is properly updated (not just lengths).

MFC after:	3 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22155
2019-11-08 14:36:44 +00:00
Gleb Smirnoff
2435e507de Now with epoch synchronized PCB lookup tables we can greatly simplify
locking in udp_output() and udp6_output().

First, we select if we need read or write lock in PCB itself, we take
the lock and enter network epoch.  Then, we proceed for the rest of
the function.  In case if we need to modify PCB hash, we would take
write lock on it for a short piece of code.

We could exit the epoch before allocating an mbuf, but with this
patch we are keeping it all the way into ip_output()/ip6_output().
Today this creates an epoch recursion, since ip_output() enters epoch
itself.  However, once all protocols are reviewed, ip_output() and
ip6_output() would require epoch instead of entering it.

Note: I'm not 100% sure that in udp6_output() the epoch is required.
We don't do PCB hash lookup for a bound socket.  And all branches of
in6_select_src() don't require epoch, at least they lack assertions.
Today inet6 address list is protected by rmlock, although it is CKLIST.
AFAIU, the future plan is to protect it by network epoch.  That would
require epoch in in6_select_src().  Anyway, in future ip6_output()
would require epoch, udp6_output() would need to enter it.
2019-11-07 21:01:36 +00:00
Gleb Smirnoff
d797164a86 Since r353292 on input path we are always in network epoch, when
we lookup PCBs.  Thus, do not enter epoch recursively in
in_pcblookup_hash() and in6_pcblookup_hash().  Same applies to
tcp_ctlinput() and tcp6_ctlinput().

This leaves several sysctl(9) handlers that return PCB credentials
unprotected.  Add epoch enter/exit to all of them.

Differential Revision:	https://reviews.freebsd.org/D22197
2019-11-07 20:49:56 +00:00
Gleb Smirnoff
cf377af6e2 Remove unnecessary recursive epoch enter via INP_INFO_RLOCK
macro in icmp6_rip6_input().  It shall always run in the
network epoch.
2019-11-07 20:43:12 +00:00
Gleb Smirnoff
f42347c39a Remove unnecessary recursive epoch enter via INP_INFO_RLOCK
macro in raw input functions for IPv4 and IPv6.  They shall
always run in the network epoch.
2019-11-07 20:40:44 +00:00
Gleb Smirnoff
8d28524a90 Remove unnecessary recursive epoch enter via INP_INFO_RLOCK
macro in udp6_input().  It shall always run in the network epoch.
2019-11-07 20:38:53 +00:00
Bjoern A. Zeeb
503f4e4736 netinet*: variable cleanup
In preparation for another change factor out various variable cleanups.
These mainly include:
(1) do not assign values to variables during declaration:  this makes
    the code more readable and does allow for better grouping of
    variable declarations,
(2) do not assign values to variables before need; e.g., if a variable
    is only used in the 2nd half of a function and we have multiple
    return paths before that, then do not set it before it is needed, and
(3) try to avoid assigning the same value multiple times.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-07 18:29:51 +00:00
Gleb Smirnoff
751d8d156a Widen network epoch coverage in nd6_prefix_onlink() as
in6ifa_ifpforlinklocal() requires the epoch.

Reported by:	bz
Reviewed by:	bz
2019-11-07 17:00:20 +00:00
Gleb Smirnoff
d6dbfed81e In nd6_timer() enter the network epoch earlier. The defrouter_del() may
call into leaf functions that require epoch.  Since the function is already
run in non-sleepable context, it should be safe to cover it whole with epoch.

Reported by:	syzcaller
2019-11-04 17:35:37 +00:00
Bjoern A. Zeeb
6e6b5143f5 Properly set VNET when nuking recvif from fragment queues.
In theory the eventhandler invoke should be in the same VNET as
the the current interface. We however cannot guarantee that for
all cases in the future.

So before checking if the fragmentation handling for this VNET
is active, switch the VNET to the VNET of the interface to always
get the one we want.

Reviewed by:	hselasky
MFC after:	3 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22153
2019-10-25 18:54:06 +00:00
Bjoern A. Zeeb
702828f643 frag6: do not leak counter in error cases
When allocating the IPv6 fragement packet queue entry we do checks
against counters and if we pass we increment one of the counters
to claim the spot.  Right after that we have two cases (malloc and MAC)
which can both fail in which case we free the entry but never released
our claim on the counter.  In theory this can lead to not accepting new
fragments after a long time, especially if it would be MAC "refusing"
them.
Rather than immediately subtracting the value in the error case, only
increment it after these two cases so we can no longer leak it.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-25 16:29:09 +00:00
Bjoern A. Zeeb
619456bb59 frag6: prevent overwriting initial fragoff=0 packet meta-data.
When we receive the packet with the first fragmented part (fragoff=0)
we remember the length of the unfragmentable part and the next header
(and should probably also remember ECN) as meta-data on the reassembly
queue.
Someone replying this packet so far could change these 2 (3) values.
While changing the next header seems more severe, for a full size
fragmented UDP packet, for example, adding an extension header to the
unfragmentable part would go unnoticed (as the framented part would be
considered an exact duplicate) but make reassembly fail.
So do not allow updating the meta-data after we have seen the first
fragmented part anymore.

The frag6_20 test case is added which failed before triggering an
ICMPv6 "param prob" due to the check for each queued fragment for
a max-size violation if a fragoff=0 packet was received.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 22:07:45 +00:00
Bjoern A. Zeeb
cd188da20f frag6: handling of overlapping fragments to conform to RFC 8200
While the comment was updated in r350746, the code was not.
RFC8200 says that unless fragment overlaps are exact (same fragment
twice) not only the current fragment but the entire reassembly queue
for this packet must be silently discarded, which we now do if
fragment offset and fragment length do not match.

Obtained from:	jtl
MFC after:	3 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16850
2019-10-24 20:22:52 +00:00
Michael Tuexen
4a91aa8fc9 Ensure that the flags indicating IPv4/IPv6 are not changed by failing
bind() calls. This would lead to inconsistent state resulting in a panic.
A fix for stable/11 was committed in
https://svnweb.freebsd.org/base?view=revision&revision=338986
An accelerated MFC is planned as discussed with emaste@.

Reported by:		syzbot+2609a378d89264ff5a42@syzkaller.appspotmail.com
Obtained from:		jtl@
MFC after:		1 day
Sponsored by:		Netflix, Inc.
2019-10-24 20:05:10 +00:00
Bjoern A. Zeeb
53707abd41 frag6: export another counter read-only by sysctl
Similar to the system global counter also export the per-VNET counter
"frag6_nfragpackets" detailing the current number of fragment packets
in this VNET's reassembly queues.
The read-only counter is helpful for in-VNET statistical monitoring and
for test-cases.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 20:00:37 +00:00
Bjoern A. Zeeb
dda02192f9 frag6: fix counter leak in error case and optimise code
In case the first fragmented part (off=0) arrives we check for the
maximum packet size for each fragmented part we already queued with the
addition of the unfragmentable part from the first one.

For one we do not have to enter the loop at all if this is the first
fragmented part to arrive, and we can skip the check.

Should we encounter an error case we send an ICMPv6 message for any
fragment exceeding the maximum length limit.  While dequeueing the
original packet and freeing it, statistics were not updated and leaked
both the reassembly queue count for the fragment and the global
fragment count.  Found by code inspection and confirmed by tightening
test cases checking more statistical and system counters.

While here properly wrap a line.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 19:57:18 +00:00
Bjoern A. Zeeb
e5fffe9a69 frag6.c: do not leak packet queue entry in error case
When we are checking for the maximum reassembled packet size of the
fragmentable part and run into the error case (packet too big),
we are leaking the packet queue enntry if this was a first fragment
to arrive.
Properly cleanup, removing the queue entry from the bucket, decrementing
counters, and freeing the memory.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 19:47:32 +00:00
Bjoern A. Zeeb
30809ba9e3 frag6: leave a note about upper layer header checks TBD
Per sepcification the upper layer header needs to be within the first
fragment.  The check was not done so far and there is an open review for
related work, so just leave a note as to where to put it.
Move the extraction of frag offset up to this as it is needed to determine
whether this is a first fragment or not.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 12:16:15 +00:00
Bjoern A. Zeeb
7715d794ef frag6: check global limits before hash and lock
Check whether we are accepting more fragments (based on global limits)
before doing expensive operations of calculating the hash and taking the
bucket lock.   This slightly increases a "race" between check time and
incrementing counters (which is already there) possibly allowing a few
more fragments than the maximum limits.  However, when under attack,
we rather save this CPU time for other packets/work.

MFC after:		3 weeks
Sponsored by:		Netflix
2019-10-24 11:58:24 +00:00
Bjoern A. Zeeb
efdfee93c0 frag6: small improvements
Rather than walking the mbuf chain manually use m_last() which doing
exactly that for us.
Defer initializing srcifp for longer as there are multiple exit paths
out of the function which do not need it set.  Initialize before taking
the lock though.
Rename the mtx lock to match the type better.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 08:15:40 +00:00
Bjoern A. Zeeb
da89a0fe94 frag6: remove IP6_REASS_MBUF macro
The IP6_REASS_MBUF() macro did some pointer gynmastics to end up with the
same type as it gets in [*(cast **)&].  Spelling it out instead saves all
this and makes the code more readable and less obfuscated directly using
the structure field.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 07:53:10 +00:00
Bjoern A. Zeeb
f1664f3258 frag6: add "big picture"
Add some ASCII relation of how the bits plug together.  The terminology
difference of "fragmented packets" and "fragment packets" is subtle.
While here clear up more whitespace and comments.

No functional change.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-23 23:10:12 +00:00
Bjoern A. Zeeb
21f08a074d frag6: replace KAME hand-rolled queues with queue(9) TAILQs
Remove the KAME custom circular queue for fragments and fragmented packets
and replace them with a standard TAILQ.
This make the code a lot more understandable and maintainable and removes
further hand-rolled code from the the tree using a standard interface instead.

Hide the still public structures under #ifdef _KERNEL as there is no
use for them in user space.
The naming is a bit confusing now as struct ip6q and the ip6q[] buckets
array are not the same anymore;  sadly struct ip6q is also used by the
MAC framework and we cannot rename it.

Submitted by:	jtl (initally)
MFC after:	3 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16847 (jtl's original)
2019-10-23 23:01:18 +00:00
Bjoern A. Zeeb
3c7165b35e frag6: whitespace changes
Remove trailing white space, add a blank line, and compress a comment.
No functional changes.

MFC after:	10 days
Sponsored by:	Netflix
2019-10-23 20:37:15 +00:00
Gleb Smirnoff
be0c32e2ff Execute nd6_dad_timer() in the network epoch, since nd6_dad_duplicated()
requires it.
Make nd6_dad_starttimer() require network epoch.  Two calls out of three
happen from nd6_dad_timer().  Enter epoch in the remaining one.
2019-10-22 16:06:33 +00:00
Bjoern A. Zeeb
67a10c4644 frag6: fix vnet teardown leak
When shutting down a VNET we did not cleanup the fragmentation hashes.
This has multiple problems: (1) leak memory but also (2) leak on the
global counters, which might eventually lead to a problem on a system
starting and stopping a lot of vnets and dealing with a lot of IPv6
fragments that the counters/limits would be exhausted and processing
would no longer take place.

Unfortunately we do not have a useable variable to indicate when
per-VNET initialization of frag6 has happened (or when destroy happened)
so introduce a boolean to flag this. This is needed here as well as
it was in r353635 for ip_reass.c in order to avoid tripping over the
already destroyed locks if interfaces go away after the frag6 destroy.

While splitting things up convert the TRY_LOCK to a LOCK operation in
now frag6_drain_one().  The try-lock was derived from a manual hand-rolled
implementation and carried forward all the time.  We no longer can afford
not to get the lock as that would mean we would continue to leak memory.

Assert that all the buckets are empty before destroying to lock to
ensure long-term stability of a clean shutdown.

Reported by:	hselasky
Reviewed by:	hselasky
MFC after:	3 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22054
2019-10-21 08:48:47 +00:00
Bjoern A. Zeeb
65456706c0 frag6: add read-only sysctl for nfrags.
Add a read-only sysctl exporting the global number of fragments
(base system and all vnets).  This is helpful to (a) know how many
fragments are currently being processed, (b) if there are possible
leaks, (c) if vnet teardown is not working correctly, and lastly
(d) it can be used as part of test-suits to ensure (a) to (c).

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-21 08:36:15 +00:00
Hans Petter Selasky
a55383e720 Fix panic in network stack due to use after free when receiving
partial fragmented packets before a network interface is detached.

When sending IPv4 or IPv6 fragmented packets and a fragment is lost
before the network device is freed, the mbuf making up the fragment
will remain in the temporary hashed fragment list and cause a panic
when it times out due to accessing a freed network interface
structure.


1) Make sure the m_pkthdr.rcvif always points to a valid network
interface. Else the rcvif field should be set to NULL.

2) Use the rcvif of the last received fragment as m_pkthdr.rcvif for
the fully defragged packet, instead of the first received fragment.

Panic backtrace for IPv6:

panic()
icmp6_reflect() # tries to access rcvif->if_afdata[AF_INET6]->xxx
icmp6_error()
frag6_freef()
frag6_slowtimo()
pfslowtimo()
softclock_call_cc()
softclock()
ithread_loop()

Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D19622
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-10-16 09:11:49 +00:00
Gleb Smirnoff
5f5ec65aaf in6ifa_llaonifp() is never called from fast path, so do not require
epoch being entered.
2019-10-14 15:33:53 +00:00
Michael Tuexen
583b625ba8 Remove line not needed.
Submitted by:		markj@
MFC after:		3 days
2019-10-13 09:35:03 +00:00
Gleb Smirnoff
ef2e580e56 Don't cover in6_ifattach() with network epoch, as it may call into
network drivers ioctls, that may sleep.

PR:		241223
2019-10-13 04:25:16 +00:00
Mark Johnston
49c5659e1c Add a missing include of opt_sctp.h.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-10-12 22:58:33 +00:00
Gleb Smirnoff
1e4f4e56b9 ip6_output() has a complex set of gotos, and some can jump out of
the epoch section towards return statement. Since entering epoch
is cheap, it is easier to cover the whole function with epoch,
rather than try to properly maintain its state.
2019-10-09 17:02:28 +00:00
Gleb Smirnoff
3af7f97c4e Revert changes to rip6_bind() from r353292. This function is always
called in syscall context, so it must enter epoch itself.  This
changeset originates from early version of the patch, and somehow
slipped to the final version.

Reported by:	pho
2019-10-09 05:52:07 +00:00
Mark Johnston
cb49ec5431 Improve locking in the IPV6_V6ONLY socket option handler.
Acquire the inp lock before checking whether the socket is already bound,
and around updates to the inp_vflag field.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21867
2019-10-07 23:35:23 +00:00
Gleb Smirnoff
b8a6e03fac Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
Michael Tuexen
e7a541b0b9 When processing an incoming IPv6 packet over the loopback interface which
contains Hop-by-Hop options, the mbuf chain is potentially changed in
ip6_hopopts_input(), called by ip6_input_hbh().
This can happen, because of the the use of IP6_EXTHDR_CHECK, which might
call m_pullup().
So provide the updated pointer back to the called of ip6_input_hbh() to
avoid using a freed mbuf chain in`ip6_input()`.

Reviewed by:		markj@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D21664
2019-09-19 10:22:29 +00:00
John Baldwin
b2e60773c6 Add kernel-side support for in-kernel TLS.
KTLS adds support for in-kernel framing and encryption of Transport
Layer Security (1.0-1.2) data on TCP sockets.  KTLS only supports
offload of TLS for transmitted data.  Key negotation must still be
performed in userland.  Once completed, transmit session keys for a
connection are provided to the kernel via a new TCP_TXTLS_ENABLE
socket option.  All subsequent data transmitted on the socket is
placed into TLS frames and encrypted using the supplied keys.

Any data written to a KTLS-enabled socket via write(2), aio_write(2),
or sendfile(2) is assumed to be application data and is encoded in TLS
frames with an application data type.  Individual records can be sent
with a custom type (e.g. handshake messages) via sendmsg(2) with a new
control message (TLS_SET_RECORD_TYPE) specifying the record type.

At present, rekeying is not supported though the in-kernel framework
should support rekeying.

KTLS makes use of the recently added unmapped mbufs to store TLS
frames in the socket buffer.  Each TLS frame is described by a single
ext_pgs mbuf.  The ext_pgs structure contains the header of the TLS
record (and trailer for encrypted records) as well as references to
the associated TLS session.

KTLS supports two primary methods of encrypting TLS frames: software
TLS and ifnet TLS.

Software TLS marks mbufs holding socket data as not ready via
M_NOTREADY similar to sendfile(2) when TLS framing information is
added to an unmapped mbuf in ktls_frame().  ktls_enqueue() is then
called to schedule TLS frames for encryption.  In the case of
sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving
the mbufs marked M_NOTREADY until encryption is completed.  For other
writes (vn_sendfile when pages are available, write(2), etc.), the
PRUS_NOTREADY is set when invoking pru_send() along with invoking
ktls_enqueue().

A pool of worker threads (the "KTLS" kernel process) encrypts TLS
frames queued via ktls_enqueue().  Each TLS frame is temporarily
mapped using the direct map and passed to a software encryption
backend to perform the actual encryption.

(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if
someone wished to make this work on architectures without a direct
map.)

KTLS supports pluggable software encryption backends.  Internally,
Netflix uses proprietary pure-software backends.  This commit includes
a simple backend in a new ktls_ocf.ko module that uses the kernel's
OpenCrypto framework to provide AES-GCM encryption of TLS frames.  As
a result, software TLS is now a bit of a misnomer as it can make use
of hardware crypto accelerators.

Once software encryption has finished, the TLS frame mbufs are marked
ready via pru_ready().  At this point, the encrypted data appears as
regular payload to the TCP stack stored in unmapped mbufs.

ifnet TLS permits a NIC to offload the TLS encryption and TCP
segmentation.  In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS)
is allocated on the interface a socket is routed over and associated
with a TLS session.  TLS records for a TLS session using ifnet TLS are
not marked M_NOTREADY but are passed down the stack unencrypted.  The
ip_output_send() and ip6_output_send() helper functions that apply
send tags to outbound IP packets verify that the send tag of the TLS
record matches the outbound interface.  If so, the packet is tagged
with the TLS send tag and sent to the interface.  The NIC device
driver must recognize packets with the TLS send tag and schedule them
for TLS encryption and TCP segmentation.  If the the outbound
interface does not match the interface in the TLS send tag, the packet
is dropped.  In addition, a task is scheduled to refresh the TLS send
tag for the TLS session.  If a new TLS send tag cannot be allocated,
the connection is dropped.  If a new TLS send tag is allocated,
however, subsequent packets will be tagged with the correct TLS send
tag.  (This latter case has been tested by configuring both ports of a
Chelsio T6 in a lagg and failing over from one port to another.  As
the connections migrated to the new port, new TLS send tags were
allocated for the new port and connections resumed without being
dropped.)

ifnet TLS can be enabled and disabled on supported network interfaces
via new '[-]txtls[46]' options to ifconfig(8).  ifnet TLS is supported
across both vlan devices and lagg interfaces using failover, lacp with
flowid enabled, or lacp with flowid enabled.

Applications may request the current KTLS mode of a connection via a
new TCP_TXTLS_MODE socket option.  They can also use this socket
option to toggle between software and ifnet TLS modes.

In addition, a testing tool is available in tools/tools/switch_tls.
This is modeled on tcpdrop and uses similar syntax.  However, instead
of dropping connections, -s is used to force KTLS connections to
switch to software TLS and -i is used to switch to ifnet TLS.

Various sysctls and counters are available under the kern.ipc.tls
sysctl node.  The kern.ipc.tls.enable node must be set to true to
enable KTLS (it is off by default).  The use of unmapped mbufs must
also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.

KTLS is enabled via the KERN_TLS kernel option.

This patch is the culmination of years of work by several folks
including Scott Long and Randall Stewart for the original design and
implementation; Drew Gallatin for several optimizations including the
use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records
awaiting software encryption, and pluggable software crypto backends;
and John Baldwin for modifications to support hardware TLS offload.

Reviewed by:	gallatin, hselasky, rrs
Obtained from:	Netflix
Sponsored by:	Netflix, Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21277
2019-08-27 00:01:56 +00:00
Bjoern A. Zeeb
1540a98e36 frag6: move public structure into file local space.
Move ip6asfrag and the accompanying IP6_REASS_MBUF macro from
ip6_var.h into frag6.c as they are not used outside frag6.c.
Sadly struct ip6q is all over the mac framework so we have to
leave it public.

This reduces the public KPI space.

MFC after:		3 months
X-MFC:			possibly MFC the #define only to stable branches
Sponsored by:		Netflix
2019-08-08 10:59:54 +00:00
Bjoern A. Zeeb
5778b399f1 frag6.c: cleanup varaibles and return statements.
Consitently put () around return values.
Do not assign variables at the time of variable declaration.
Sort variables.  Rename ia to ia6, remove/reuse some variables used only
once or twice for temporary calculations.

No functional changes intended.

MFC after:		3 months
Sponsored by:		Netflix
2019-08-08 10:15:47 +00:00
Bjoern A. Zeeb
23d374aa14 frag6.c: initial comment and whitespace cleanup.
Cleanup some comments (start with upper case, ends in punctuation,
use width and do not consume vertical space).  Update comments to
RFC8200.  Some whitespace changes.

No functional changes.

MFC after:		3 months
Sponsored by:		Netflix
2019-08-08 09:42:57 +00:00
Ed Maste
7f8c266da5 Correct ICMPv6/MLDv2 out-of-bounds memory access
Previously the ICMPv6 input path incorrectly handled cases where an
MLDv2 listener query packet was internally fragmented across multiple
mbufs.

admbugs:	921
Submitted by:	jtl
Reported by:	CJD of Apple
Approved by:	so
MFC after:	0 minutes
Security:	CVE-2019-5608
2019-08-06 17:11:30 +00:00
Michael Tuexen
94962f6ba0 Improve consistency. No functional change.
MFC after:		3 days
2019-08-05 13:22:15 +00:00
Bjoern A. Zeeb
9cb1a47af2 frag6.c: rename ip6q[] to ipq6b[] and consistently use "bucket"
The hash buckets array is called ip6q.  The data structure ip6q is a
description of different object, the one the array holds these days
(since r337776).  To clear some of this confusion, rename the array
to ip6qb.

When iterating over all buckets or addressing them directly, we
use at least the variables i, hash, and bucket.  To keep the
terminology consistent use the variable name "bucket" and always
make it an uint32_t and not sometimes an int.

No functional behaviour changes intended.

MFC after:		3 months
Sponsored by:		Netflix
2019-08-05 11:01:12 +00:00
Bjoern A. Zeeb
c00464a245 frag6.c: re-order functions within file
Re-order functions within the file in preparation for an upcoming
code simplification.

No functional changes.

MFC after:		3 months
Sponsored by:		Netflix
2019-08-05 09:49:24 +00:00
Bjoern A. Zeeb
f349c821f5 frag6.c: fix includes
Bring back systm.h after r350532 and banish errno.h, time.h, and
machine/atomic.h.

Reported by:	bde (Thank you!)
Pointyhat to:	bz
MFC after:	12 weeks
X-MFC:		with r350532
Sponsored by:	Netflix
2019-08-03 16:56:44 +00:00