code paths. The two are not consistent with each other, and the one in the
syn cache code does not conform to the relevant specifications (page 69 of
RFC 793 and Section 4.2 of RFC 5961).
This patch fixes this:
* The sequence number checks are fixed as specified on
page 69 of RFC 793.
* The sysctl variable net.inet.tcp.insecure_rst is now honoured
and the behaviour is as specified in Section 4.2 of RFC 5961.
A sketch of the resulting checks is given below.
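This is only an illustration with made-up names, not the actual syncache or
tcp_input code (real code uses the modular SEQ_* comparisons):

    #include <stdint.h>

    enum rst_action { RST_DROP, RST_CHALLENGE_ACK, RST_ACCEPT };

    static enum rst_action
    check_rst(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd,
        int insecure_rst)
    {
        if (seq == rcv_nxt)
            return (RST_ACCEPT);        /* exactly the expected sequence number */
        if ((seq - rcv_nxt) >= rcv_wnd) /* unsigned diff == modular compare */
            return (RST_DROP);          /* outside the window: ignore */
        return (insecure_rst ?
            RST_ACCEPT :                /* RFC 793, page 69 behaviour */
            RST_CHALLENGE_ACK);         /* RFC 5961, Section 4.2 */
    }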
Approved by: re (gjb@)
Reviewed by:	bz@, glebius@, rrs@
Differential Revision: https://reviews.freebsd.org/D17595
Sponsored by: Netflix, Inc.
to this change, the code sometimes used a temporary stack variable to hold
details of a TCP segment. r338102 stopped using the variable to hold
segments, but did not actually remove the variable.
Because the variable is no longer used, we can safely remove it.
Approved by: re (gjb)
an inp marked FREED after the epoch(9) changes.
Check once we hold the lock and skip the inp if it is the case.
Contrary to IPv6, the locking of the inp is outside the multicast
section, and hence a single check seems to suffice.
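A minimal sketch of the pattern inside the loop over the pcbs; the flag and
lock spellings are my assumption, not quoted from the change:

    INP_WLOCK(inp);
    if (inp->inp_flags2 & INP_FREED) {  /* freed while we waited for the lock */
        INP_WUNLOCK(inp);
        continue;                       /* skip this inp */
    }
    /* ... the inp is safe to use here ... */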
PR: 232192
Reviewed by: mmacy, markj
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17540
but leaving the variable assignment outside the block, where it is no longer
used. Move both the variable and the assignment one block further in.
This should result in no functional changes. It will however make upcoming
changes slightly easier to apply.
Reviewed by: markj, jtl, tuexen
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17525
epoch section without exiting that epoch section. This is bad for two
reasons: the epoch section never exits, and the stack-allocated epoch
tracker is left on the epoch list.
Fix the epoch leak by making sure we exit epoch sections before returning.
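A sketch of the corrected pattern, using the current NET_EPOCH_* macro
spellings and a hypothetical helper, so not the exact code touched here:

    struct epoch_tracker et;
    int error;

    NET_EPOCH_ENTER(et);
    error = handle_packet(m);   /* hypothetical helper */
    if (error != 0) {
        NET_EPOCH_EXIT(et);     /* exit on the early return path too, or the */
        return (error);         /* stack tracker stays linked on the epoch list */
    }
    NET_EPOCH_EXIT(et);
    return (0);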
Reviewed by: ae, gallatin, mmacy
Approved by: re (gjb, kib)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D17450
locally generated SCTP packets sent over IPv4. This makes
the behaviour consistent with IPv6.
Reviewed by: ae@, bz@, jtl@
Approved by: re (kib@)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D17406
When computing the number of bytes to checksum, make sure to convert the
UDP length to host byte order when the entire header is not contained in
the first mbuf.
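Roughly, as a sketch rather than the in-tree diff: when the header has to
be copied out of the mbuf chain, the length field is still in network byte
order and must be converted before it is used as a byte count:

    struct udphdr uh;
    int len;

    /* Header not contained in the first mbuf: copy it out piecewise. */
    m_copydata(m, off, sizeof(uh), (caddr_t)&uh);
    len = ntohs(uh.uh_ulen);    /* bytes to checksum, in host byte order */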
Reviewed by: jtl, tuexen, ae
Approved by: re (gjb), jtl (mentor)
Differential Revision: https://reviews.freebsd.org/D17357
With the new route cache feature, udp_notify() will modify the inp when it
needs to invalidate the route cache. Ensure that we hold a write lock on
the inp before calling the function, so that multiple threads don't race
while trying to invalidate the cache (which previously led to a page
fault).
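A sketch of the resulting calling convention (the error value and the
surrounding code are illustrative):

    INP_WLOCK(inp);             /* write lock: udp_notify() may replace or */
    udp_notify(inp, error);     /* free the cached route                   */
    INP_WUNLOCK(inp);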
Differential Revision: https://reviews.freebsd.org/D17246
Reviewed by: sbruno, bz, karels
Sponsored by: Dell EMC Isilon
Approved by: re (gjb)
This removes two duplicate assignments of the flags field and adds one
assignment which was missing.
Thanks to Felix Weinrank for reporting the issue, which he found by
fuzz testing the userland stack.
Approved by: re (kib@)
MFC after: 1 week
INP_INFO_UNLOCK_ASSERT() in TCP-related code. For encapsulated traffic
it is possible that the code is running in a net_epoch_preempt section,
and INP_INFO_UNLOCK_ASSERT() is too strict an assertion for that case.
PR: 231428
Reviewed by: mmacy, tuexen
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17335
sctp_process_cmsgs_for_init() and sctp_findassociation_cmsgs()
similar to sctp_find_cmsg() to improve consistency and avoid
the signed/unsigned issues in sctp_process_cmsgs_for_init()
and sctp_findassociation_cmsgs().
Thanks to andrew@ for reporting the problem he found using
syzkaller.
Approved by: re (kib@)
MFC after: 1 week
sending UDP-encapsulated SCTP packets.
This is consistent with the receive side, where the corresponding UDP
stats counter (udps_ipackets) is incremented when such packets arrive.
Thanks to Peter Lei for making me aware of this inconsistency.
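The output-side counter is not named above, so the following one-liner is
an assumption about which statistic is meant:

    UDPSTAT_INC(udps_opackets); /* assumed counter; mirrors udps_ipackets
                                   being bumped on the receive path */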
Approved by: re (kib@)
MFC after: 1 week
syncache_respond(). There is no functional change. The
parameter became unused in r313330, but wasn't removed.
Approved by: re (kib@)
MFC after: 1 month
Sponsored by: Netflix, Inc.
It is currently unused and reserved for future use, to keep the KBI/KPI
stable. Also add several spare pointers to be able to extend the
structure later if needed.
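Purely illustrative (not the structure touched by this commit): spare
members reserve room so the structure can grow later without changing its
size or layout:

    struct example_pcb {
        void        *ep_data;           /* fields in use today */
        uint32_t     ep_flags;
        /* Reserved so future fields do not break the KBI/KPI. */
        void        *ep_spare_ptr[4];
        uint64_t     ep_spare64[2];
    };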
Approved by: re (gjb)
* Fix a bug where the SYN handling during established state was
applied to a front state.
* Move a check for retransmission after the timer handling.
This was suppressing timer based retransmissions.
* Fix an off-by-one-byte error in the sequence number of retransmissions.
* Apply fixes corresponding to
https://svnweb.freebsd.org/changeset/base/336934
Reviewed by: rrs@
Approved by: re (kib@)
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16912
Lookups are protected by an epoch section, so the LB group linkage must
be a CK_LIST rather than a plain LIST. Furthermore, we were not
deferring LB group frees, so in_pcbremlbgrouphash() could race with
readers and cause a use-after-free.
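A simplified reader-side sketch with made-up names (the real structures
live in the pcb code); the point is that epoch-protected traversal needs
CK_LIST linkage, and removals must defer the free past concurrent readers:

    struct lbgroup {
        CK_LIST_ENTRY(lbgroup)  lb_link;    /* safe to walk under epoch */
        uint16_t                lb_port;
    };
    CK_LIST_HEAD(lbgrouphead, lbgroup);

    static struct lbgroup *
    lbgroup_lookup(struct lbgrouphead *head, uint16_t port)
    {
        struct lbgroup *grp;

        /* Caller is inside an epoch section. */
        CK_LIST_FOREACH(grp, head, lb_link) {
            if (grp->lb_port == port)
                return (grp);
        }
        return (NULL);
    }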
Reviewed by: sbruno, Johannes Lundberg <johalun0@gmail.com>
Tested by: gallatin
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17031
Reviewed by: bz, Johannes Lundberg <johalun0@gmail.com>
Approved by: re (kib)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17065
for the rt and lle cache were added in r191129 (2009).
To the best of my knowledge they have never been used, and route caching
has converted the inp_rt field from that commit to inp_route,
rendering this field and these flags obsolete.
Convert the pointer into a spare pointer, so the size of the structure
does not change (and we gain a spare pointer), and mark the
two fields as unused.
Reviewed by: markj, karels
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D17062
adding the missing include files and changing the type of cpuid, which
would otherwise cause a false comparison with NETISR_CPUID_NONE.
Reviewed by: rrs
Approved by: re (marius)
Differential Revision: https://reviews.freebsd.org/D16891
Otherwise the "depends_on provider" guard in sctp.d does not work as
intended.
Reported by: mjg
Reviewed by: tuexen
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D17057
No functional change intended.
Reviewed by: bz, Johannes Lundberg <johalun0@gmail.com>
Approved by: re (rgrimes)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17030
fast forwarding path, as it already works for IPv6 and for both of them
on the old slow path.
PR: 231143
Reviewed by: ae
Approved by: re (gjb)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D17039
use sizeof() or explicit #defines instead. No functional change.
This was suggested by jmg@.
MFC after: 1 month
XMFC with: r338053
Sponsored by: Netflix, Inc.
SCTP. They are based on what is specified in the Solaris DTrace manual
for Solaris 11.4.
Reviewed by: 0mp, dteske, markj
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16839
socket resulted in sending fragmented IPv6 packets.
This is fixed by reducing the MSS to the appropriate value. In addition,
if the socket option is set before the handshake happens, announce this
MSS to the peer. This is not strictly required, but done since TCP
is conservative.
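The option is not named above, so the user-space sketch below assumes it
is IPV6_USE_MIN_MTU; with it set, segments must fit into the IPv6 minimum
MTU of 1280 bytes, so the stack reduces the MSS instead of letting the IP
layer fragment the packets:

    #include <netinet/in.h>
    #include <sys/socket.h>

    /* 's' is a connected or to-be-connected AF_INET6 TCP socket. */
    int on = 1;
    (void)setsockopt(s, IPPROTO_IPV6, IPV6_USE_MIN_MTU, &on, sizeof(on));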
PR: 173444
Reviewed by: bz@, rrs@
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16796
This was broken for IPv6 listening sockets which are not IPV6_ONLY,
when the accepted TCP connection was using IPv4.
Reviewed by: bz@, rrs@
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16792
This is not a functional change but a preparation for the upcoming
DTrace support. It is necessary to change the state in one
logical operation, even if it involves clearing the sub state
SHUTDOWN_PENDING.
MFC after: 1 month
reassembly inbound tcp segments. The old algorithm just blindly
dropped segments in without coalescing. This meant that every
segment could take up more and more room on the linked list
of segments. That list is now subject to a tighter limit (100
segments), which in a high BDP situation will make us a
lot less efficient, since we drop the segments we receive beyond
the first 100 entries. What this restructure does is make the
reassembly buffer coalesce segments, putting an emphasis on the two
common cases (which avoid walking the list of segments), i.e.
where we add to the back of the queue of segments and where we
add to the front. The reassembly buffer also supports
a couple of debug options (black box logging as well as counters
for code coverage). These are compiled out by default but can
be enabled by uncommenting the defines.
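A simplified user-space sketch of the two fast paths (not the kernel code;
real TCP code uses the modular SEQ_* comparisons, and the caller frees the
entry when it was coalesced):

    #include <sys/queue.h>
    #include <stdbool.h>
    #include <stdint.h>

    struct seg {
        TAILQ_ENTRY(seg) link;
        uint32_t start;             /* first byte held */
        uint32_t end;               /* one past the last byte held */
    };
    TAILQ_HEAD(segq, seg);

    /* Returns true if the segment was placed without walking the queue. */
    static bool
    reass_fastpath(struct segq *q, struct seg *n)
    {
        struct seg *last = TAILQ_LAST(q, segq);
        struct seg *first = TAILQ_FIRST(q);

        if (last != NULL && n->start >= last->end) {
            if (n->start == last->end)
                last->end = n->end;         /* coalesce at the back */
            else
                TAILQ_INSERT_TAIL(q, n, link);
            return (true);
        }
        if (first != NULL && n->end <= first->start) {
            if (n->end == first->start)
                first->start = n->start;    /* coalesce at the front */
            else
                TAILQ_INSERT_HEAD(q, n, link);
            return (true);
        }
        return (false);                     /* rare case: walk the queue */
    }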
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D16626
The TCP client side, or the TCP server side when not using SYN-cookies,
used the uptime as the TCP timestamp value. This patch uses an offset
in all cases, which is the result of a keyed hash function taking
the source and destination addresses and port numbers into account.
The keyed hash function is the same as used for the initial TSN.
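A toy sketch of the idea; the FNV-style mix below is just a stand-in for
the real keyed hash and is not cryptographically adequate:

    #include <stddef.h>
    #include <stdint.h>

    /* Stand-in for the real keyed hash; illustration only. */
    static uint32_t
    keyed_hash(const void *buf, size_t len, uint32_t key)
    {
        const uint8_t *p = buf;
        uint32_t h = key ^ 2166136261u;

        while (len-- > 0)
            h = (h ^ *p++) * 16777619u;
        return (h);
    }

    /* Per-connection timestamp offset derived from the 4-tuple and a key
     * generated at boot; the value on the wire is then ticks + offset
     * rather than the raw uptime. */
    static uint32_t
    tcp_ts_offset(uint32_t laddr, uint16_t lport, uint32_t faddr,
        uint16_t fport, uint32_t key)
    {
        struct {
            uint32_t laddr, faddr;
            uint16_t lport, fport;
        } tuple = { laddr, faddr, lport, fport };

        return (keyed_hash(&tuple, sizeof(tuple), key));
    }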
Reviewed by: rrs@
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16636
toe_l2_resolve to fill in the complete vtag and not just the vid.
Reviewed by: kib@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D16752
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
- add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
goes to zero OR when the interface goes away
- decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
- add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the inpcb cleanup code simpler but opens the
  door wider still to races
- call inp_gcmoptions synchronously after dropping the inpcb lock
Fundamentally multicast needs a rewrite - but keep applying band-aids for now.
Tested by: kp
Reported by: novel, kp, lwhsu
In particular, try to ensure that no bucket will have a reassembly
queue larger than approximately 100 items. This limits the cost to
find the correct reassembly queue when processing an incoming
fragment.
Due to the low limits on each bucket's length, increase the size of
the hash table from 64 to 1024.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
There is a hashing algorithm which should distribute IPv4 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv4 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.
Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet.ip.maxfragbucketsize).
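In other words, as a sketch with illustrative names, recomputed whenever
the maximum changes:

    static int maxfragbucketsize;   /* overridable via the sysctl above */

    static void
    update_bucket_limit(int maxfragpackets, int nbuckets)
    {
        /* Default: twice the max number of queues, spread over buckets. */
        maxfragbucketsize = 2 * maxfragpackets / nbuckets;
        if (maxfragbucketsize < 1)  /* a floor of one is an assumption */
            maxfragbucketsize = 1;
    }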
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
The IP reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
of enough VNETs), it is possible that the sum of all the VNET limits
will exceed the number of mbuf clusters available in the system.
Given the fact that the fragment limit is intended (at least in part) to
regulate access to a global resource, the fragment limit should
be applied on a global basis.
VNET-specific limits can be adjusted by modifying the
net.inet.ip.maxfragpackets and net.inet.ip.maxfragsperpacket
sysctls.
To disable fragment reassembly globally, set net.inet.ip.maxfrags to 0.
To disable fragment reassembly for a particular VNET, set
net.inet.ip.maxfragpackets to 0.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
Currently, IPv4 fragments are hashed into buckets based on a 32-bit
key which is calculated by (src_ip ^ ip_id) and combined with a random
seed. However, because an attacker can control the values of src_ip
and ip_id, it is possible to construct an attack which causes very
deep chains to form in a given bucket.
To ensure more uniform distribution (and lower predictability for
an attacker), calculate the hash based on a key which includes all
the fields we use to identify a reassembly queue (dst_ip, src_ip,
ip_id, and the ip protocol) as well as a random seed.
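A sketch of such a computation; jenkins_hash32() from sys/hash.h is
assumed to be available here, and the bucket count is assumed to be a
power of two:

    #include <sys/types.h>
    #include <sys/hash.h>

    static uint32_t
    frag_bucket(uint32_t dst, uint32_t src, uint16_t ip_id, uint8_t proto,
        uint32_t seed, uint32_t nbuckets)
    {
        uint32_t key[3];

        key[0] = dst;
        key[1] = src;
        key[2] = ((uint32_t)ip_id << 16) | proto;   /* id and protocol */
        return (jenkins_hash32(key, 3, seed) & (nbuckets - 1));
    }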
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923