Commit Graph

6099 Commits

Author SHA1 Message Date
Michael Tuexen
93899d10b4 The handling of RST segments in the SYN-RCVD state exists in the
code paths. Both are not consistent and the one on the syn cache code
does not conform to the relevant specifications (Page 69 of RFC 793
and Section 4.2 of RFC 5961).

This patch fixes this:
* The sequence numbers checks are fixed as specified on
  page Page 69 RFC 793.
* The sysctl variable net.inet.tcp.insecure_rst is now honoured
  and the behaviour as specified in Section 4.2 of RFC 5961.

Approved by:		re (gjb@)
Reviewed by:		bz@, glebius@, rrs@,
Differential Revision:	https://reviews.freebsd.org/D17595
Sponsored by:		Netflix, Inc.
2018-10-18 19:21:18 +00:00
Jonathan T. Looney
ac75e35d85 In r338102, the TCP reassembly code was substantially restructured. Prior
to this change, the code sometimes used a temporary stack variable to hold
details of a TCP segment. r338102 stopped using the variable to hold
segments, but did not actually remove the variable.

Because the variable is no longer used, we can safely remove it.

Approved by:	re (gjb)
2018-10-16 14:41:09 +00:00
Bjoern A. Zeeb
4ba16a92c7 In udp_input() when walking the pcblist we can come across
an inp marked FREED after the epoch(9) changes.
Check once we hold the lock and skip the inp if it is the case.

Contrary to IPv6 the locking of the inp is outside the multicast
section and hence a single check seems to suffice.

PR:		232192
Reviewed by:	mmacy, markj
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17540
2018-10-12 22:51:45 +00:00
Bjoern A. Zeeb
3afdfcaf33 r217592 moved the check for imo in udp_input() into the conditional block
but leaving the variable assignment outside the block, where it is no longer
used. Move both the variable and the assignment one block further in.

This should result in no functional changes. It will however make upcoming
changes slightly easier to apply.

Reviewed by:		markj, jtl, tuexen
Approved by:		re (kib)
Differential Revision:	https://reviews.freebsd.org/D17525
2018-10-12 11:30:46 +00:00
Jonathan T. Looney
13c6ba6d94 There are three places where we return from a function which entered an
epoch section without exiting that epoch section. This is bad for two
reasons: the epoch section won't exit, and we will leave the epoch tracker
from the stack on the epoch list.

Fix the epoch leak by making sure we exit epoch sections before returning.

Reviewed by:	ae, gallatin, mmacy
Approved by:	re (gjb, kib)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D17450
2018-10-09 13:26:06 +00:00
Michael Tuexen
3535cdc43e Avoid truncating unrecognised parameters when reporting them.
This resulted in sending malformed packets.

Approved by:		re (kib@)
MFC after:		1 week
2018-10-07 15:13:47 +00:00
Michael Tuexen
3924dfa721 Ensure that the ips_localout counter is incremented for
locally generated SCTP packets sent over IPv4. This make
the behaviour consistent with IPv6.

Reviewed by:		ae@, bz@, jtl@
Approved by:		re (kib@)
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D17406
2018-10-07 11:26:15 +00:00
Tom Jones
b6e870116f Convert UDP length to host byte order
When getting the number of bytes to checksum make sure to convert the UDP
length to host byte order when the entire header is not in the first mbuf.

Reviewed by: jtl, tuexen, ae
Approved by: re (gjb), jtl (mentor)
Differential Revision:  https://reviews.freebsd.org/D17357
2018-10-05 12:51:30 +00:00
Ryan Stone
083a010c62 Hold a write lock across udp_notify()
With the new route cache feature udp_notify() will modify the inp when it
needs to invalidate the route cache.  Ensure that we hold a write lock on
the inp before calling the function to ensure that multiple threads don't
race while trying to invalidate the cache (which previously lead to a page
fault).

Differential Revision: https://reviews.freebsd.org/D17246
Reviewed by: sbruno, bz, karels
Sponsored by: Dell EMC Isilon
Approved by:	re (gjb)
2018-10-04 22:03:58 +00:00
Michael Tuexen
15a087e551 Mitigate providing a timing signal if the COOKIE or AUTH
validation fails.
Thanks to jmg@ for reporting the issue, which was discussed in
https://admbugs.freebsd.org/show_bug.cgi?id=878

Approved by:            re (TBD@)
MFC after:              1 week
2018-10-01 14:05:31 +00:00
Michael Tuexen
9d2e3f14c4 After allocating chunks set the fields in a consistent way.
This removes two assignments for the flags field being done
twice and adds one, which was missing.
Thanks to Felix Weinrank for reporting the issue he found
by using fuzz testing of the userland stack.

Approved by:            re (kib@)
MFC after:              1 week
2018-10-01 13:09:18 +00:00
Andrey V. Elsukov
384a5c3c28 Add INP_INFO_WUNLOCK_ASSERT() macro and use it instead of
INP_INFO_UNLOCK_ASSERT() in TCP-related code. For encapsulated traffic
it is possible, that the code is running in net_epoch_preempt section,
and INP_INFO_UNLOCK_ASSERT() is very strict assertion for such case.

PR:		231428
Reviewed by:	mmacy, tuexen
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17335
2018-10-01 10:46:00 +00:00
Michael Tuexen
1b084a5e5e Plug mbuf leak in the SCTP input path in an error case.
Approved by:            re (kib@)
MFC after:              1 week
CID:			749312
2018-09-30 21:54:02 +00:00
Michael Tuexen
66bcf0b333 Plug mbuf leaks in the SCTP output path in error cases.
Approved by:            re (kib@)
MFC after:              1 week
CID:			1395307
2018-09-30 21:31:33 +00:00
Michael Tuexen
8184648425 Fix the handling of ancillary data for SCTP socket. Implement
sctp_process_cmsgs_for_init() and sctp_findassociation_cmsgs()
similar to sctp_find_cmsg() to improve consistency and avoid
the signed/unsigned issues in sctp_process_cmsgs_for_init()
and sctp_findassociation_cmsgs().

Thanks to andrew@ for reporting the problem he found using
syzcaller.

Approved by:            re (kib@)
MFC after:              1 week
2018-09-30 16:21:31 +00:00
Michael Tuexen
ae0a9a8850 Increment the corresponding UDP stats counter (udps_opackets) when
sending UDP encapsulated SCTP packets.
This is consistent with the behaviour that when such packets are received,
the corresponding UDP stats counter (udps_ipackets) is incremented.
Thanks to Peter Lei for making me aware of this inconsistency.

Approved by:            re (kib@)
MFC after:              1 week
2018-09-30 12:16:06 +00:00
Michael Tuexen
3552f16d82 Fix typo in comment.
Reported by:		@danfe
Approved by:		re (kib@)
MFC after:		1 week
X-MFC:			r338941
2018-09-28 19:47:32 +00:00
Michael Tuexen
0277ec9c43 Whitespace changes and fixing a typo. No functional change.
Approved by:	re (kib@)
MFC after:	1 week
2018-09-26 10:24:50 +00:00
Michael Tuexen
078a49a077 Remove the unused parameter 'locked' from the function
syncache_respond(). There is no functional change. The
parameter became unused in r313330, but wasn't removed.

Approved by:		re (kib@)
MFC after:		1 month
Sponsored by:		Netflix, Inc.
2018-09-23 16:37:32 +00:00
Andrey V. Elsukov
76b09d1823 Add new field max_hdrsize to struct encap_config.
It is currently unused and reserved for future use to keep KBI/KPI.
Also add several spare pointers to be able extend structure if it
will be needed.

Approved by:	re (gjb)
2018-09-20 19:45:27 +00:00
Michael Tuexen
ba4704a278 Remove unused code.
Approved by:	re (kib@)
MFC after:	1 week
2018-09-18 10:53:07 +00:00
Michael Tuexen
a8a8a8a808 Fix TCP Fast Open for the TCP RACK stack.
* Fix a bug where the SYN handling during established state was
  applied to a front state.
* Move a check for retransmission after the timer handling.
  This was suppressing timer based retransmissions.
* Fix an off-by one byte in the sequence number of retransmissions.
* Apply fixes corresponding to
  https://svnweb.freebsd.org/changeset/base/336934

Reviewed by:		rrs@
Approved by:		re (kib@)
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16912
2018-09-12 10:27:58 +00:00
Mark Johnston
54af3d0dac Fix synchronization of LB group access.
Lookups are protected by an epoch section, so the LB group linkage must
be a CK_LIST rather than a plain LIST.  Furthermore, we were not
deferring LB group frees, so in_pcbremlbgrouphash() could race with
readers and cause a use-after-free.

Reviewed by:	sbruno, Johannes Lundberg <johalun0@gmail.com>
Tested by:	gallatin
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17031
2018-09-10 19:00:29 +00:00
Mark Johnston
a7026c7fd9 Use ratecheck(9) in in_pcbinslbgrouphash().
Reviewed by:	bz, Johannes Lundberg <johalun0@gmail.com>
Approved by:	re (kib)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17065
2018-09-07 21:11:41 +00:00
Bjoern A. Zeeb
113c4fad55 The inp_lle field to struct inpcb, along with two "valid" flags
for the rt and lle cache were added in r191129 (2009).
To my best knowledge they have never been used and route caching
has converted the inp_rt field from that commit to inp_route
rendering this field and these flags obsolete.

Convert the pointer into a spare pointer to not change the size of
the structure anymore (and to have a spare pointer) and mark the
two fields as unused.

Reviewed by:	markj, karels
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17062
2018-09-06 19:55:40 +00:00
Bjoern A. Zeeb
6d2b0c0166 Make tcp_hpts.c compile a LINT kernel with options RSS and PCBGROUPS added by
adding the missing include files and changing a the type of cpuid which
would otherwise cause a false comparison with NETISR_CPUID_NONE.

Reviewed by:	rrs
Approved by:	re (marius)
Differential Revision:	https://reviews.freebsd.org/D16891
2018-09-06 16:11:24 +00:00
Mark Johnston
49365eb433 Define sctp probes only when SCTP is configured.
Otherwise the "depends_on provider" guard in sctp.d does not work as
intended.

Reported by:	mjg
Reviewed by:	tuexen
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17057
2018-09-06 14:15:03 +00:00
Mark Johnston
8be02ee4da Fix style bugs in in_pcblookup_lbgroup().
No functional change intended.

Reviewed by:	bz, Johannes Lundberg <johalun0@gmail.com>
Approved by:	re (rgrimes)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17030
2018-09-05 15:04:11 +00:00
Eugene Grosbein
d5d21ad932 Fix "ipfw fwd" to work for incoming IPv4 packets when ip_tryforward() chooses
fast forwarding path, as it already works for IPv6 and for both of them
on old slow path.

PR:			231143
Reviewed by:		ae
Approved by:		re (gjb)
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D17039
2018-09-05 13:59:36 +00:00
Mark Johnston
73ad0b6abf Use the correct malloc type in in_pcblbgroup_free().
Approved by:	re (kib)
Sponsored by:	The FreeBSD Foundation
2018-09-03 17:39:09 +00:00
Michael Tuexen
c6c0be2765 Fix a shadowed variable warning.
Thanks to Peter Lei for reporting the issue.

Approved by:		re(kib@)
MFH:			1 month
Sponsored by:		Netflix, Inc.
2018-08-24 10:50:19 +00:00
Michael Tuexen
90ab3571d8 Use arc4rand() instead of read_random() in the SCTP and TCP code.
This was suggested by jmg@.

Reviewed by:		delphij@, jmg@, jtl@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16860
2018-08-23 19:10:45 +00:00
Michael Tuexen
4ba1513d1a Don't use the explicit number 32 for the length of the secrets,
use sizeof() or explicit #definesi instead. No functional change.
This was suggested by jmg@.

MFC after:		1 month
XMFC with:		r338053
Sponsored by:		Netflix, Inc.
2018-08-23 06:03:59 +00:00
Michael Tuexen
1e88cc8b59 Add support for send, receive and state-change DTrace providers for
SCTP. They are based on what is specified in the Solaris DTrace manual
for Solaris 11.4.

Reviewed by:		0mp, dteske, markj
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D16839
2018-08-22 21:23:32 +00:00
Matt Macy
d3878608d7 in_mcast: fix copy paste error when clearing flag 2018-08-22 04:09:55 +00:00
Michael Tuexen
5dff1c3845 Enabling the IPPROTO_IPV6 level socket option IPV6_USE_MIN_MTU on a TCP
socket resulted in sending fragmented IPV6 packets.

This is fixes by reducing the MSS to the appropriate value. In addtion,
if the socket option is set before the handshake happens, announce this
MSS to the peer. This is not stricly required, but done since TCP
is conservative.

PR:			173444
Reviewed by:		bz@, rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16796
2018-08-21 14:12:30 +00:00
Michael Tuexen
7d4dcc36a8 Fix the inheritance of IPv6 level socket options on TCP sockets.
This was broken for IPv6 listening socket, which are not IPV6_ONLY,
and the accepted TCP connection was using IPv4.

Reviewed by:		bz@, rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16792
2018-08-21 14:07:36 +00:00
Michael Tuexen
6ef849e601 Whitespace change. 2018-08-21 13:37:06 +00:00
Michael Tuexen
1a0b021677 Refactor the SHUTDOWN_PENDING state handling.
This is not a functional change but a preperation for the upcoming
DTrace support. It is necessary to change the state in one
logical operation, even if it involves clearing the sub state
SHUTDOWN_PENDING.

MFC after:		1 month
2018-08-21 13:25:32 +00:00
Bjoern A. Zeeb
10b070c166 GC inc_isipv6; it was added for "temp" compatibility in 2001, r86764
and does not seem to be used.
2018-08-20 20:06:36 +00:00
Randall Stewart
c28440db29 This change represents a substantial restructure of the way we
reassembly inbound tcp segments. The old algorithm just blindly
dropped in segments without coalescing. This meant that every
segment could take up greater and greater room on the linked list
of segments. This of course is now subject to a tighter limit (100)
of segments which in a high BDP situation will cause us to be a
lot more in-efficent as we drop segments beyond 100 entries that
we receive. What this restructure does is cause the reassembly
buffer to coalesce segments putting an emphasis on the two
common cases (which avoid walking the list of segments) i.e.
where we add to the back of the queue of segments and where we
add to the front. We also have the reassembly buffer supporting
a couple of debug options (black box logging as well as counters
for code coverage). These are compiled out by default but can
be added by uncommenting the defines.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D16626
2018-08-20 12:43:18 +00:00
Michael Tuexen
8e02b4e00c Don't expose the uptime via the TCP timestamps.
The TCP client side or the TCP server side when not using SYN-cookies
used the uptime as the TCP timestamp value. This patch uses in all
cases an offset, which is the result of a keyed hash function taking
the source and destination addresses and port numbers into account.
The keyed hash function is the same a used for the initial TSN.

Reviewed by:		rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16636
2018-08-19 14:56:10 +00:00
Navdeep Parhar
32d2623ae2 Add the ability to look up the 3b PCP of a VLAN interface. Use it in
toe_l2_resolve to fill up the complete vtag and not just the vid.

Reviewed by:	kib@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D16752
2018-08-16 23:46:38 +00:00
Matt Macy
f9be038601 Fix in6_multi double free
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
  - add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
  goes to zero OR when the interface goes away
  - decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
  - add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the incpb cleanup code simpler but opens the
  door wider still to races
  - call inp_gcmoptions synchronously after dropping the the inpcb lock

Fundamentally multicast needs a rewrite - but keep applying band-aids for now.

Tested by: kp
Reported by: novel, kp, lwhsu
2018-08-15 20:23:08 +00:00
Luiz Otavio O Souza
59b2022f94 Late style follow up on r312770.
Submitted by:	glebius
X-MFC with:	r312770
MFC after:	3 days
2018-08-15 15:44:30 +00:00
Jonathan T. Looney
a967df1c8f Lower the default limits on the IPv4 reassembly queue.
In particular, try to ensure that no bucket will have a reassembly
queue larger than approximately 100 items. This limits the cost to
find the correct reassembly queue when processing an incoming
fragment.

Due to the low limits on each bucket's length, increase the size of
the hash table from 64 to 1024.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:30:46 +00:00
Jonathan T. Looney
ff790bbad0 Implement a limit on on the number of IPv4 reassembly queues per bucket.
There is a hashing algorithm which should distribute IPv4 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv4 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.

Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet.ip.maxfragbucketsize).

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:23:05 +00:00
Jonathan T. Looney
7b9c5eb0a5 Add a global limit on the number of IPv4 fragments.
The IP reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
of enough VNETs), it is possible that the sum of all the VNET limits
will exceed the number of mbuf clusters available in the system.

Given the fact that the fragment limit is intended (at least in part) to
regulate access to a global resource, the fragment limit should
be applied on a global basis.

VNET-specific limits can be adjusted by modifying the
net.inet.ip.maxfragpackets and net.inet.ip.maxfragsperpacket
sysctls.

To disable fragment reassembly globally, set net.inet.ip.maxfrags to 0.
To disable fragment reassembly for a particular VNET, set
net.inet.ip.maxfragpackets to 0.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:19:49 +00:00
Jonathan T. Looney
5d9bd45518 Improve hashing of IPv4 fragments.
Currently, IPv4 fragments are hashed into buckets based on a 32-bit
key which is calculated by (src_ip ^ ip_id) and combined with a random
seed. However, because an attacker can control the values of src_ip
and ip_id, it is possible to construct an attack which causes very
deep chains to form in a given bucket.

To ensure more uniform distribution (and lower predictability for
an attacker), calculate the hash based on a key which includes all
the fields we use to identify a reassembly queue (dst_ip, src_ip,
ip_id, and the ip protocol) as well as a random seed.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:15:47 +00:00
Michael Tuexen
0f1346f7f4 Remove a set but not used warning showing up in usrsctp. 2018-08-14 08:32:33 +00:00