Commit Graph

7010 Commits

Author SHA1 Message Date
Michael Tuexen
3808ab732e sctp: remove some set, but unused variables
Thanks to pkasting for submitting the patch for the userland stack.

MFC after:	3 days
2021-08-09 15:58:46 +02:00
Wojciech Macek
d9d59bb1af ipsec: Handle ICMP NEEDFRAG message.
It will be needed for upcoming PMTU implementation in ipsec.
    For now simply create/update an entry in tcp hostcache when needed.
    The code is based on https://people.freebsd.org/~ae/ipsec_transport_mode_ctlinput.diff

Authored by: 		Kornel Duleba <mindal@semihalf.com>
Differential revision: 	https://reviews.freebsd.org/D30992
Reviewed by: 		tuxen
Sponsored by: 		Stormshield
Obtained from:	 	Semihalf
2021-08-09 12:01:46 +02:00
Alexander V. Chernikov
9748eb7427 Simplify nhop operations in ip_output().
Consistently use `nh` instead of always dereferencing
 ro->ro_nh inside the if block.
Always use nexthop mtu, as it provides guarantee that mtu is accurate.
Pass `nh` pointer to rt_update_ro_flags() to allow upcoming uses
 of updating ro flags based on different nexthop.

Differential Revision: https://reviews.freebsd.org/D31451
Reviewed by:	kp
MFC after: 2 weeks
2021-08-08 09:19:27 +00:00
Gordon Bergling
04389c855e Fix some common typos in comments
- s/configuraiton/configuration/
- s/specifed/specified/
- s/compatiblity/compatibility/

MFC after:	5 days
2021-08-08 10:16:06 +02:00
Michael Tuexen
112899c6af sctp: improve input validation of mapped addresses in sctp_connectx()
MFC after:	3 days
2021-08-07 15:12:09 +02:00
Michael Tuexen
b732091a76 sctp: improve input validation of mapped addresses in send()
Reported by:	syzbot+35528f275f2eea6317cc@syzkaller.appspotmail.com
Reported by:	syzbot+ac29916d5f16d241553d@syzkaller.appspotmail.com
MFC after:	3 days
2021-08-07 14:50:40 +02:00
Hans Petter Selasky
bb5cd80e8b Update the TCP LRO code to handle both encrypted and un-encrypted traffic.
Encrypted and un-encrypted traffic needs to be coalesced separately.
Split the 16-bit lro_type field in the address information into two
8-bit fields, and then use the last 8-bit field for flags, which among
other indicate if the received mbuf is encrypted or un-encrypted.

Differential Revision:	https://reviews.freebsd.org/D31377
Reviewed by:	gallatin
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-08-06 11:28:44 +02:00
Andrew Gallatin
739de953ec ktls: Move KERN_TLS ifdef to tcp_var.h
This allows us to remove stubs in ktls.h and allows us
to sort the function prototypes.

Reviewed by: jhb
Sponsored by: Netflix
2021-08-05 19:17:35 -04:00
Alexander V. Chernikov
8482aa7748 Use lltable calculated header when sending lle holdchain after successful lle resolution.
Subscribers: imp, ae, bz

Differential Revision: https://reviews.freebsd.org/D31391
2021-08-05 20:44:36 +00:00
Michael Tuexen
3f1f6b6ef7 tcp, udp: improve input validation in handling bind()
Reported by:		syzbot+24fcfd8057e9bc339295@syzkaller.appspotmail.com
Reported by:		syzbot+6e90ceb5c89285b2655b@syzkaller.appspotmail.com
Reviewed by:		markj, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D31422
2021-08-05 13:48:44 +02:00
Alexander V. Chernikov
f3a3b06121 [lltable] Unify datapath feedback mechamism.
Use newly-create llentry_request_feedback(),
 llentry_mark_used() and llentry_get_hittime() to
 request datapatch usage check and fetch the results
 in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
 by eliminating 1 condition check.

MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D31390
2021-08-04 22:52:43 +00:00
Konstantin Kukushkin
a61c24ddb7 udp: Fix soroverflow SOCKBUF unlocking
We hold the SOCKBUF_LOCK so use soroverflow_locked here.
This bug may manifest as a non-killable process stuck in [*so_rcv].

Approved by:	scottl
Reviewed by:	Roy Marples <roy@marples.name>
Fixes:	7045b1603b
MFC after:  10 days
Differential Revision:	https://reviews.freebsd.org/D31374
2021-08-01 08:07:33 -07:00
Roy Marples
7045b1603b socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by:	philip (network), kbowling (transport), gbe (manpages)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26652
2021-07-28 09:35:09 -07:00
Bryan Drewery
a573243370 netdump: Fix leaking debugnet state on errors.
Reviewed by:	cem, markj
Sponsored by:	Dell EMC
Differential Revision: https://reviews.freebsd.org/D31319
2021-07-27 09:06:23 -07:00
Mark Johnston
ba21825202 rip: Add missing minimum length validation in rip_output()
If the socket is configured such that the sender is expected to supply
the IP header, then we need to verify that it actually did so.

Reported by:	syzkaller+KMSAN
Reviewed by:	donner
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31302
2021-07-26 16:39:37 -04:00
Kristof Provost
8e1864ed07 pf: syncookie support
Import OpenBSD's syncookie support for pf. This feature help pf resist
TCP SYN floods by only creating states once the remote host completes
the TCP handshake rather than when the initial SYN packet is received.

This is accomplished by using the initial sequence numbers to encode a
cookie (hence the name) in the SYN+ACK response and verifying this on
receipt of the client ACK.

Reviewed by:	kbowling
Obtained from:	OpenBSD
MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31138
2021-07-20 10:36:13 +02:00
Michael Tuexen
a730d82378 tcp: fix RACK and BBR when using VIMAGE enabled kernel
Fix a bug in VNET handling, which occurs when using specific NICs.
PR:			257195
Reviewed by:		rrs
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D31212
2021-07-20 00:29:18 +02:00
Randall Stewart
db4d2d7222 tcp: When rack or bbr get a pullup failure in the common code, don't free the NULL mbuf.
There is a bug in the error path where rack_bbr_common does a m_pullup() and the pullup fails.
There is a stray mfree(m) after m is set to NULL. This is not a good idea :-)

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31194
2021-07-16 13:59:57 -04:00
Randall Stewart
1d171e5ab9 tcp: Lro needs to validate that it does not go beyond the end of the mbuf as it parses.
Currently the LRO parser, if given a packet that say has ETH+IP header but the TCP header
is in the next mbuf (split), would walk garbage. Lets make sure we keep track as we
parse of the length and return NULL anytime we exceed the length of the mbuf.

Reviewed by: tuexen, hselasky
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31195
2021-07-16 06:07:13 -04:00
Randall Stewart
ca1a7e1021 tcp: TCP_LRO getting bad checksums and sending it in to TCP incorrectly.
In reviewing tcp_lro.c we have a possibility that some drives may send a mbuf into
LRO without making sure that the checksum passes. Some drivers actually are
aware of this and do not call lro when the csum failed, others do not do this and
thus could end up sending data up that we think has a checksum passing when
it does not.

This change will fix that situation by properly verifying that the mbuf
has the correct markings (CSUM VALID bits as well as csum in mbuf header
is set to 0xffff).

Reviewed by: tuexen, hselasky, gallatin
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31155
2021-07-13 12:45:15 -04:00
Stefan Eßer
58080fbca0 libalias: fix divide by zero causing panic
The packet_limit can fall to 0, leading to a divide by zero abort in
the "packets % packet_limit".

An possible solution would be to apply a lower limit of 1 after the
calculation of packet_limit, but since any number modulo 1 gives 0,
the more efficient solution is to skip the modulo operation for
packet_limit <= 1.

Since this is a fix for a panic observed in stable/12, merging this
fix to stable/12 and stable/13 before expiry of the 3 day waiting
period might be justified, if it works for the reporter of the issue.

Reported by:	Karl Denninger <karl@denninger.net>
MFC after:	3 days
2021-07-10 13:08:18 +02:00
Michael Tuexen
105b68b42d sctp: Fix errno in case of association setup failures
Do not report always ETIMEDOUT, but only when appropriate. In
other cases report ECONNABORTED.

MFC after:	3 days
2021-07-09 23:19:25 +02:00
Michael Tuexen
ce64352a70 sctp: provide consistent stream information in case of early errors
While there, make sure the function is called correctly.

MFC after:	3 days
2021-07-09 14:16:59 +02:00
Michael Tuexen
84992a3251 sctp: provide sac_error also for ABORT chunk being sent
Thanks to Florent Castelli for bringing this issue up for the
userland stack and providing an initial patch.

MFC:		3 days
2021-07-09 13:46:27 +02:00
Randall Stewart
7312e4e5cf tcp: Fix 32 bit platform breakage
This fixes the incorrect use of a sysctl add to u64. It
was for a useconds time, but on 32 bit platforms its
not a u64. Instead use the long directive.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31107
2021-07-08 08:16:45 -04:00
Andrew Gallatin
b1e806c0ed tcp: fix alternate stack build with LINT-NO{INET,INET6,IP}
When fixing another bug, I noticed that the alternate
TCP stacks do not build when various combinations of
ipv4 and ipv6 are disabled.

Reviewed by:	rrs, tuexen
Differential Revision:	https://reviews.freebsd.org/D31094
Sponsored by: Netflix
2021-07-07 13:02:08 -04:00
Randall Stewart
d7955cc0ff tcp: HPTS performance enhancements
HPTS drives both rack and bbr, and yet there have been many complaints
about performance. This bit of work restructures hpts to help reduce CPU
overhead. It does this by now instead of relying on the timer/callout to
drive it instead use user return from a system call as well as lro flushes
to drive hpts. The timer becomes a backstop that dynamically adjusts
based on how "late" we are.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31083
2021-07-07 07:22:35 -04:00
Randall Stewart
e834f9a44a tcp: Address goodput and TLP edge cases.
There are several cases where we make a goodput measurement and we are running
out of data when we decide to make the measurement. In reality we should not make
such a measurement if there is no chance we can have "enough" data. There is also
some corner case TLP's that end up not registering as a TLP like they should, we
fix this by pushing the doing_tlp setup to the actual timeout that knows it did
a TLP. This makes it so we always have the appropriate flag on the sendmap
indicating a TLP being done as well as count correctly so we make no more
that two TLP's.

In addressing the goodput lets also add a "quality" metric that can be viewed via
blackbox logs so that a casual observer does not have to figure out how good
of a measurement it is. This is needed due to the fact that we may still make
a measurement that is of a poorer quality as we run out of data but still have
a minimal amount of data to make a measurement.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31076
2021-07-06 15:26:37 -04:00
Andrew Gallatin
28d0a740dd ktls: auto-disable ifnet (inline hw) kTLS
Ifnet (inline) hw kTLS NICs typically keep state within
a TLS record, so that when transmitting in-order,
they can continue encryption on each segment sent without
DMA'ing extra state from the host.

This breaks down when transmits are out of order (eg,
TCP retransmits).  In this case, the NIC must re-DMA
the entire TLS record up to and including the segment
being retransmitted.  This means that when re-transmitting
the last 1448 byte segment of a TLS record, the NIC will
have to re-DMA the entire 16KB TLS record. This can lead
to the NIC running out of PCIe bus bandwidth well before
it saturates the network link if a lot of TCP connections have
a high retransmoit rate.

This change introduces a new sysctl (kern.ipc.tls.ifnet_max_rexmit_pct),
where TCP connections with higher retransmit rate will be
switched to SW kTLS so as to conserve PCIe bandwidth.

Reviewed by:	hselasky, markj, rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30908
2021-07-06 10:28:32 -04:00
Lutz Donnerhacke
4060e77f49 libalias: Remove a stray directive
Removal of a preprocessor line was missed during development.
Do it now and MFC it together with the other patches.

MFC after:	2 days
2021-07-04 17:54:45 +02:00
Lutz Donnerhacke
2f4d91f9cb libalias: Rewrite HISTORY
Fix the history entry (wrong year) and add the missing recent work.
MFC together with the other patches.

MFC after:	2 days
2021-07-04 17:46:47 +02:00
Lutz Donnerhacke
f284553444 libalias: Fix API bug on initialization
The kernel part of ipfw(8) does initialize LibAlias uncondistionally
with an zeroized port range (allowed ports from 0 to 0).  During
restucturing of libalias, port ranges are used everytime and are
therefor initialized with different values than zero.  The secondary
initialization from ipfw (and probably others) overrides the new
default values and leave the instance in an unfunctional state.  The
obvious solution is to detect such reinitializations and use the new
default value instead.

MFC after:	3 days
2021-07-03 23:03:07 +02:00
Lutz Donnerhacke
b50a4dce18 libalias: Avoid uninitialized expiration
The expiration time of direct address mappings is explicitly
uninitialized.  Expire times are always compared during housekeeping.
Despite the uninitialized value does not harm, it's simpler to just
set it to a reasonable default.  This was detected during valgrinding
the test suite.

MFC after:	3 days
2021-07-03 01:09:18 +02:00
Lutz Donnerhacke
25392fac94 libalias: Fix splay comparsion bug
Comparing elements in a tree requires transitiviy.  If a < b and b < c
then a must be smaller than c.  This way the tree elements are always
pairwise comparable.

Tristate comparsion functions returning values lower, equal, or
greater than zero, are usually implemented by a simple subtraction of
the operands.  If the size of the operands are equal to the size of
the result, integer modular arithmetics kick in and violates the
transitivity.

Example:
Working on byte with 0, 120, and 240. Now computing the differences:
  120 -   0 = 120
  240 - 120 = 120
  240 -   0 = -16

MFC after:	3 days
2021-07-03 00:31:53 +02:00
Michael Tuexen
c7f048ab35 sctp: initialize sequence numbers for ECN correctly
MFC after:	3 days
Reported by:	Junseok Yang (for the userland stack)
2021-06-27 20:14:48 +02:00
Michael Tuexen
6587a2bd1e sctp: Fix length check for ECNE chunks
MFC after:	3 days
2021-06-27 16:10:39 +02:00
Michael Tuexen
870af3f4dc tcp: tolerate missing timestamps
Some TCP stacks negotiate TS support, but do not send TS at all
or not for keep-alive segments. Since this includes modern widely
deployed stacks, tolerate the violation of RFC 7323 per default.

Reviewed by:		rgrimes, rrs, rscheff
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D30740
Sponsored by:		Netflix, Inc.
2021-06-27 16:03:57 +02:00
Randall Stewart
9e4d9e4c4d tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software.
Hardware TLS is now supported in some interface cards and it works well. Except that
when we have connections that retransmit a lot we get into trouble with all the retransmits.
This prep step makes way for change that Drew will be making so that we can "kick out" a
session from hardware TLS.

Reviewed by: mtuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30895
2021-06-25 09:30:54 -04:00
Randall Stewart
66aec14a53 tcp: Rack not being very friendly with V6:4 socket and having a connection from V4
There were two bugs that prevented V4 sockets from connecting to
a rack server running a V4/V6 socket. As well as a bug that stops the
mapped v4 in V6 address from working.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30885
2021-06-24 14:42:21 -04:00
Wojciech Macek
17ac6d94db ip_mroute: initialize vif ifnet properly
Use if_alloc to ensure all fields of ifnet are allocated
properly

Reported by:   Damien Deville
Sponsored by:  Stormshield
Obtained from: Semihalf
Reviewed by:   mw
Differential revision: https://reviews.freebsd.org/D30608
2021-06-23 10:13:52 +02:00
Lutz Donnerhacke
f70c98a2f5 libalias: Fix compile time warning about unused functions
Compiling libalias results in warnings about unused functions.
Those warnings are caused by clang's heuristic to consider an inline
function as in use, iff the declaration is in a *.c file.
Declarations in *.h files do not emit those warnings.

Hence the declarations must be moved to an extra *.h file.

MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D30844
2021-06-23 10:06:04 +02:00
John Baldwin
a7f6c6fd94 toe: Read-lock the inp in toe_4tuple_check().
tcp_twcheck now expects a read lock on the inp for the SYN case
instead of a write lock.

Reviewed by:	np
Fixes:		1db08fbe3f tcp_input: always request read-locking of PCB for any pure SYN segment.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30782
2021-06-22 16:31:01 -07:00
Gleb Smirnoff
c4804b6b0b Unbreak TFO, that was broken with 8d5719aa74. These two assignments
are unneccessary and used to be there before TFO as an invariant.  With
TFO and after 8d5719aa74 the "so" value is still needed.

Reported & tested by:	tuexen
Fixes:	8d5719aa74
2021-06-22 16:03:44 -07:00
Lutz Donnerhacke
d261e57dea libalias: Switch to efficient data structure for incoming traffic
Current data structure is using a hash of unordered lists.  Those
unordered lists are quite efficient, because the least recently
inserted entries are most likely to be used again.  In order to avoid
long search times in other cases, the lists are hashed into many
buckets.  Unfortunatly a search for a miss needs an exhaustive
inspection and a careful definition of the hash.

Splay trees offer a similar feature: Almost O(1) for access of the
least recently used entries, and amortized O(ln(n)) for almost all
other cases.  Get rid of the hash.

Now the data structure should able to quickly react to external
packets without eating CPU cycles for breakfast, preventing a DoS.

PR:		192888
Discussed with:	Dimitry Luhtionov
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30536
2021-06-19 22:12:28 +02:00
Lutz Donnerhacke
935fc93af1 libalias: Switch to efficient data structure for outgoing traffic
Current data structure is using a hash of unordered lists.  Those
unordered lists are quite efficient, because the least recently
inserted entries are most likely to be used again.  In order to avoid
long search times in other cases, the lists are hashed into many
buckets.  Unfortunatly a search for a miss needs an exhaustive
inspection and a careful definition of the hash.

Splay trees offer a similar feature - almost O(1) for access of the
least recently used entries), and amortized O(ln(n) - for almost all
other cases.  Get rid of the hash.

Discussed with:	Dimitry Luhtionov
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30516
2021-06-19 22:09:44 +02:00
Lutz Donnerhacke
d989935b5b libalias: Restructure - Finalize
Note, that the restructuring is done.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30582
2021-06-19 21:58:56 +02:00
Lutz Donnerhacke
fe83900f9f libalias: Restructure - Remove temporary state deleteAllLinks from global struct
The entry deleteAllLinks in the struct libalias is only used to signal
a state between internal calls.  It's not used between API calls.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30604
2021-06-19 21:55:11 +02:00
Lutz Donnerhacke
9efcad61d8 libalias: Restructure - Use AliasRange instead of PORT_BASE
Get rid of PORT_BASE, replace by AliasRange. Simplify code.
Factor out the search for a new port. Improves the perfomance a bit.

Discussed with:	Dimitry Luhtionov
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30581
2021-06-19 21:40:09 +02:00
Lutz Donnerhacke
1178dda53d libalias: Restructure - Table for PPTP
Let PPTP use its own data structure.
Regroup and rename other lists, which are not PPTP.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30580
2021-06-19 21:26:31 +02:00
Lutz Donnerhacke
7b44ff4c52 libalias: Restructure - Group expire handling entries
Reorder the internal structure semantically.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30575
2021-06-19 21:12:27 +02:00