6992 Commits

Author SHA1 Message Date
Randall Stewart
1d171e5ab9 tcp: Lro needs to validate that it does not go beyond the end of the mbuf as it parses.
Currently the LRO parser, if given a packet that has, say, the ETH+IP headers but whose TCP header
is in the next mbuf (split), would walk garbage. Let's make sure we keep track of the length as we
parse and return NULL any time we exceed the length of the mbuf.

Reviewed by: tuexen, hselasky
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31195
2021-07-16 06:07:13 -04:00
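
The length tracking described in 1d171e5ab9 can be pictured with a minimal user-space sketch. It is not the tcp_lro.c code: the function name, the fixed header sizes (no VLAN tag, no IP or TCP options), and the flat buffer standing in for one mbuf's contiguous data are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed header sizes for this sketch only. */
#define SK_ETH_HDR_LEN 14
#define SK_IP4_HDR_LEN 20
#define SK_TCP_HDR_LEN 20

/*
 * Return a pointer to the TCP header inside buf, or NULL as soon as a
 * header would extend past the len bytes that are actually contiguous.
 */
static const uint8_t *
sk_find_tcp_header(const uint8_t *buf, size_t len)
{
    size_t off = 0;

    if (len - off < SK_ETH_HDR_LEN)
        return (NULL);
    off += SK_ETH_HDR_LEN;

    if (len - off < SK_IP4_HDR_LEN)
        return (NULL);
    off += SK_IP4_HDR_LEN;

    /* The TCP header may live in the next buffer; refuse to parse it. */
    if (len - off < SK_TCP_HDR_LEN)
        return (NULL);
    return (buf + off);
}
```

With a buffer that holds only the 34 bytes of ETH+IP headers, the sketch returns NULL instead of reading past the end.
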
Randall Stewart
ca1a7e1021 tcp: TCP_LRO getting bad checksums and sending it in to TCP incorrectly.
In reviewing tcp_lro.c we found a possibility that some drivers may send an mbuf into
LRO without making sure that the checksum passes. Some drivers are
aware of this and do not call LRO when the csum failed; others do not do this and
thus could end up sending data up that we think has a passing checksum when
it does not.

This change will fix that situation by properly verifying that the mbuf
has the correct markings (CSUM VALID bits as well as csum in mbuf header
is set to 0xffff).

Reviewed by: tuexen, hselasky, gallatin
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31155
2021-07-13 12:45:15 -04:00
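
A minimal sketch of the marking check described in ca1a7e1021, under stated assumptions: the struct, the function name, and the flag values below are stand-ins invented for this example, and which exact "CSUM VALID" bits the kernel tests is not spelled out in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for this sketch; the kernel defines its own. */
#define SK_CSUM_DATA_VALID 0x1u
#define SK_CSUM_PSEUDO_HDR 0x2u

/* Stand-in for the checksum fields of an mbuf packet header. */
struct sk_pkthdr {
    uint32_t csum_flags;
    uint32_t csum_data;
};

/*
 * Accept a packet for LRO only when the driver marked the checksum as
 * verified and the checksum value in the header is 0xffff.
 */
static bool
sk_csum_ok(const struct sk_pkthdr *ph)
{
    const uint32_t need = SK_CSUM_DATA_VALID | SK_CSUM_PSEUDO_HDR;

    return ((ph->csum_flags & need) == need && ph->csum_data == 0xffff);
}
```

A packet that fails this test would bypass LRO rather than be passed up as if its checksum had been verified.
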
Stefan Eßer
58080fbca0 libalias: fix divide by zero causing panic
The packet_limit can fall to 0, leading to a divide by zero abort in
the "packets % packet_limit".

A possible solution would be to apply a lower limit of 1 after the
calculation of packet_limit, but since any number modulo 1 gives 0,
the more efficient solution is to skip the modulo operation for
packet_limit <= 1.

Since this is a fix for a panic observed in stable/12, merging this
fix to stable/12 and stable/13 before expiry of the 3 day waiting
period might be justified, if it works for the reporter of the issue.

Reported by:	Karl Denninger <karl@denninger.net>
MFC after:	3 days
2021-07-10 13:08:18 +02:00
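
The guard described in 58080fbca0 can be sketched as follows. The function name and the idea that the result gates a periodic housekeeping step are assumptions for illustration; only the "skip the modulo for packet_limit <= 1" rule comes from the commit message.

```c
#include <stdbool.h>

/*
 * Decide whether the periodic action is due for this packet count.
 * Any number modulo 1 is 0, so limits of 0 and 1 both mean "every
 * packet", and the division is never reached with a zero divisor.
 */
static bool
sk_action_due(unsigned int packets, unsigned int packet_limit)
{
    if (packet_limit <= 1)
        return (true);
    return (packets % packet_limit == 0);
}
```

Short-circuiting before the modulo avoids both the divide by zero and a needless division when the limit is 1.
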
Michael Tuexen
105b68b42d sctp: Fix errno in case of association setup failures
Do not report always ETIMEDOUT, but only when appropriate. In
other cases report ECONNABORTED.

MFC after:	3 days
2021-07-09 23:19:25 +02:00
Michael Tuexen
ce64352a70 sctp: provide consistent stream information in case of early errors
While there, make sure the function is called correctly.

MFC after:	3 days
2021-07-09 14:16:59 +02:00
Michael Tuexen
84992a3251 sctp: provide sac_error also for ABORT chunk being sent
Thanks to Florent Castelli for bringing this issue up for the
userland stack and providing an initial patch.

MFC after:	3 days
2021-07-09 13:46:27 +02:00
Randall Stewart
7312e4e5cf tcp: Fix 32 bit platform breakage
This fixes the incorrect use of a sysctl add to u64. It
was for a useconds time, but on 32 bit platforms it's
not a u64. Instead use the long directive.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31107
2021-07-08 08:16:45 -04:00
Andrew Gallatin
b1e806c0ed tcp: fix alternate stack build with LINT-NO{INET,INET6,IP}
When fixing another bug, I noticed that the alternate
TCP stacks do not build when various combinations of
ipv4 and ipv6 are disabled.

Reviewed by:	rrs, tuexen
Differential Revision:	https://reviews.freebsd.org/D31094
Sponsored by: Netflix
2021-07-07 13:02:08 -04:00
Randall Stewart
d7955cc0ff tcp: HPTS performance enhancements
HPTS drives both rack and bbr, and yet there have been many complaints
about performance. This bit of work restructures hpts to help reduce CPU
overhead. Instead of relying on the timer/callout to drive it, hpts is now
driven by the user return from a system call as well as by LRO flushes.
The timer becomes a backstop that dynamically adjusts
based on how "late" we are.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31083
2021-07-07 07:22:35 -04:00
Randall Stewart
e834f9a44a tcp: Address goodput and TLP edge cases.
There are several cases where we make a goodput measurement and we are running
out of data when we decide to make the measurement. In reality we should not make
such a measurement if there is no chance we can have "enough" data. There are also
some corner-case TLPs that end up not registering as a TLP like they should; we
fix this by pushing the doing_tlp setup to the actual timeout that knows it did
a TLP. This makes it so we always have the appropriate flag on the sendmap
indicating a TLP being done, and we count correctly so we make no more
than two TLPs.

In addressing the goodput, let's also add a "quality" metric that can be viewed via
blackbox logs so that a casual observer does not have to figure out how good
of a measurement it is. This is needed due to the fact that we may still make
a measurement that is of a poorer quality as we run out of data but still have
a minimal amount of data to make a measurement.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31076
2021-07-06 15:26:37 -04:00
Andrew Gallatin
28d0a740dd ktls: auto-disable ifnet (inline hw) kTLS
Ifnet (inline) hw kTLS NICs typically keep state within
a TLS record, so that when transmitting in-order,
they can continue encryption on each segment sent without
DMA'ing extra state from the host.

This breaks down when transmits are out of order (eg,
TCP retransmits).  In this case, the NIC must re-DMA
the entire TLS record up to and including the segment
being retransmitted.  This means that when re-transmitting
the last 1448 byte segment of a TLS record, the NIC will
have to re-DMA the entire 16KB TLS record. This can lead
to the NIC running out of PCIe bus bandwidth well before
it saturates the network link if a lot of TCP connections have
a high retransmit rate.

This change introduces a new sysctl (kern.ipc.tls.ifnet_max_rexmit_pct),
where TCP connections with higher retransmit rate will be
switched to SW kTLS so as to conserve PCIe bandwidth.

Reviewed by:	hselasky, markj, rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30908
2021-07-06 10:28:32 -04:00
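
A rough sketch of the kind of threshold test that 28d0a740dd describes. The commit message names the sysctl kern.ipc.tls.ifnet_max_rexmit_pct but does not say how the retransmit rate is measured, so the byte counters, the function name, and the comparison below are assumptions, not the kernel's logic.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the retransmitted share of a connection's traffic
 * exceeds max_rexmit_pct percent, i.e. when this sketch would move the
 * connection from NIC (ifnet) kTLS to software kTLS.
 * Counters are assumed small enough that multiplying by 100 cannot
 * overflow a uint64_t.
 */
static bool
sk_should_use_sw_ktls(uint64_t bytes_sent, uint64_t bytes_rexmit,
    unsigned int max_rexmit_pct)
{
    if (bytes_sent == 0)
        return (false);
    return (bytes_rexmit * 100 > bytes_sent * max_rexmit_pct);
}
```

Comparing cross-multiplied products avoids integer division, so a rate of exactly max_rexmit_pct percent stays on the NIC and anything above it falls back.
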
Lutz Donnerhacke
4060e77f49 libalias: Remove a stray directive
Removal of a preprocessor line was missed during development.
Do it now and MFC it together with the other patches.

MFC after:	2 days
2021-07-04 17:54:45 +02:00
Lutz Donnerhacke
2f4d91f9cb libalias: Rewrite HISTORY
Fix the history entry (wrong year) and add the missing recent work.
MFC together with the other patches.

MFC after:	2 days
2021-07-04 17:46:47 +02:00
Lutz Donnerhacke
f284553444 libalias: Fix API bug on initialization
The kernel part of ipfw(8) does initialize LibAlias unconditionally
with a zeroized port range (allowed ports from 0 to 0).  During
restructuring of libalias, port ranges are used every time and are
therefore initialized with different values than zero.  The secondary
initialization from ipfw (and probably others) overrides the new
default values and leaves the instance in a non-functional state.  The
obvious solution is to detect such reinitializations and use the new
default value instead.

MFC after:	3 days
2021-07-03 23:03:07 +02:00
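
The detection described in f284553444 might look like the sketch below. Everything named here is hypothetical: the struct, the setter, and in particular the default range values, which are placeholders and not the values libalias uses.

```c
#include <stdint.h>

/* Placeholder defaults for this sketch; not the libalias values. */
#define SK_DEFAULT_PORT_LOW  0x8000
#define SK_DEFAULT_PORT_HIGH 0xffff

/* Stand-in for the part of the instance that holds the port range. */
struct sk_alias {
    uint16_t port_low;
    uint16_t port_high;
};

/*
 * Set the allowed port range.  A zeroized range (0 to 0) is treated as
 * a reinitialization by a caller that never meant to pick a range, so
 * the defaults are used instead of leaving the instance unusable.
 */
static void
sk_set_port_range(struct sk_alias *la, uint16_t low, uint16_t high)
{
    if (low == 0 && high == 0) {
        low = SK_DEFAULT_PORT_LOW;
        high = SK_DEFAULT_PORT_HIGH;
    }
    la->port_low = low;
    la->port_high = high;
}
```
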
Lutz Donnerhacke
b50a4dce18 libalias: Avoid uninitialized expiration
The expiration time of direct address mappings is explicitly
uninitialized.  Expire times are always compared during housekeeping.
Although the uninitialized value does no harm, it's simpler to just
set it to a reasonable default.  This was detected while running
the test suite under valgrind.

MFC after:	3 days
2021-07-03 01:09:18 +02:00
Lutz Donnerhacke
25392fac94 libalias: Fix splay comparison bug
Comparing elements in a tree requires transitivity.  If a < b and b < c
then a must be smaller than c.  This way the tree elements are always
pairwise comparable.

Tristate comparison functions returning values lower, equal, or
greater than zero, are usually implemented by a simple subtraction of
the operands.  If the size of the operands is equal to the size of
the result, modular integer arithmetic kicks in and violates the
transitivity.

Example:
Working on bytes with the values 0, 120, and 240, and storing each difference in a signed byte:
  120 -   0 = 120
  240 - 120 = 120
  240 -   0 = -16

MFC after:	3 days
2021-07-03 00:31:53 +02:00
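
The byte example from 25392fac94 can be reproduced in a few lines of C. This is an illustration, not the libalias comparison function: the function names are made up, and the narrowing cast to int8_t is assumed to wrap in two's complement, which is what common compilers such as GCC and Clang do.

```c
#include <stdint.h>

/*
 * Broken tristate compare: the difference is squeezed into a result as
 * narrow as the operands, so large differences change sign.
 */
static int8_t
sk_cmp_by_subtraction(uint8_t a, uint8_t b)
{
    return ((int8_t)(uint8_t)(a - b));
}

/* Safe tristate compare: explicit comparisons yield -1, 0, or 1. */
static int
sk_cmp_by_comparison(uint8_t a, uint8_t b)
{
    return ((a > b) - (a < b));
}
```

With the subtraction variant, 120 compares greater than 0 and 240 compares greater than 120, yet 240 compares less than 0, so a tree ordered by it is no longer consistent. The explicit comparison keeps all three results in agreement.
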
Michael Tuexen
c7f048ab35 sctp: initialize sequence numbers for ECN correctly
MFC after:	3 days
Reported by:	Junseok Yang (for the userland stack)
2021-06-27 20:14:48 +02:00
Michael Tuexen
6587a2bd1e sctp: Fix length check for ECNE chunks
MFC after:	3 days
2021-06-27 16:10:39 +02:00
Michael Tuexen
870af3f4dc tcp: tolerate missing timestamps
Some TCP stacks negotiate TS support, but do not send TS at all
or not for keep-alive segments. Since this includes modern widely
deployed stacks, tolerate the violation of RFC 7323 per default.

Reviewed by:		rgrimes, rrs, rscheff
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D30740
Sponsored by:		Netflix, Inc.
2021-06-27 16:03:57 +02:00
Randall Stewart
9e4d9e4c4d tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software.
Hardware TLS is now supported in some interface cards and it works well. Except that
when we have connections that retransmit a lot we get into trouble with all the retransmits.
This prep step makes way for the change that Drew will be making so that we can "kick out" a
session from hardware TLS.

Reviewed by: mtuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30895
2021-06-25 09:30:54 -04:00
Randall Stewart
66aec14a53 tcp: Rack not being very friendly with V6:4 socket and having a connection from V4
There were two bugs that prevented V4 sockets from connecting to
a rack server running a V4/V6 socket, as well as a bug that stopped the
mapped v4 in V6 address from working.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30885
2021-06-24 14:42:21 -04:00
Wojciech Macek
17ac6d94db ip_mroute: initialize vif ifnet properly
Use if_alloc to ensure all fields of ifnet are allocated
properly

Reported by:   Damien Deville
Sponsored by:  Stormshield
Obtained from: Semihalf
Reviewed by:   mw
Differential revision: https://reviews.freebsd.org/D30608
2021-06-23 10:13:52 +02:00
Lutz Donnerhacke
f70c98a2f5 libalias: Fix compile time warning about unused functions
Compiling libalias results in warnings about unused functions.
Those warnings are caused by clang's heuristic to consider an inline
function as in use, iff the declaration is in a *.c file.
Declarations in *.h files do not emit those warnings.

Hence the declarations must be moved to an extra *.h file.

MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D30844
2021-06-23 10:06:04 +02:00
John Baldwin
a7f6c6fd94 toe: Read-lock the inp in toe_4tuple_check().
tcp_twcheck now expects a read lock on the inp for the SYN case
instead of a write lock.

Reviewed by:	np
Fixes:		1db08fbe3f tcp_input: always request read-locking of PCB for any pure SYN segment.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30782
2021-06-22 16:31:01 -07:00
Gleb Smirnoff
c4804b6b0b Unbreak TFO, that was broken with 8d5719aa74. These two assignments
are unnecessary and used to be there before TFO as an invariant.  With
TFO and after 8d5719aa74 the "so" value is still needed.

Reported & tested by:	tuexen
Fixes:	8d5719aa74
2021-06-22 16:03:44 -07:00
Lutz Donnerhacke
d261e57dea libalias: Switch to efficient data structure for incoming traffic
Current data structure is using a hash of unordered lists.  Those
unordered lists are quite efficient, because the least recently
inserted entries are most likely to be used again.  In order to avoid
long search times in other cases, the lists are hashed into many
buckets.  Unfortunately a search for a miss needs an exhaustive
inspection and a careful definition of the hash.

Splay trees offer a similar feature: Almost O(1) for access of the
least recently used entries, and amortized O(ln(n)) for almost all
other cases.  Get rid of the hash.

Now the data structure should be able to quickly react to external
packets without eating CPU cycles for breakfast, preventing a DoS.

PR:		192888
Discussed with:	Dimitry Luhtionov
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30536
2021-06-19 22:12:28 +02:00
Lutz Donnerhacke
935fc93af1 libalias: Switch to efficient data structure for outgoing traffic
Current data structure is using a hash of unordered lists.  Those
unordered lists are quite efficient, because the least recently
inserted entries are most likely to be used again.  In order to avoid
long search times in other cases, the lists are hashed into many
buckets.  Unfortunately a search for a miss needs an exhaustive
inspection and a careful definition of the hash.

Splay trees offer a similar feature: Almost O(1) for access of the
least recently used entries, and amortized O(ln(n)) for almost all
other cases.  Get rid of the hash.

Discussed with:	Dimitry Luhtionov
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30516
2021-06-19 22:09:44 +02:00
Lutz Donnerhacke
d989935b5b libalias: Restructure - Finalize
Note, that the restructuring is done.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30582
2021-06-19 21:58:56 +02:00
Lutz Donnerhacke
fe83900f9f libalias: Restructure - Remove temporary state deleteAllLinks from global struct
The entry deleteAllLinks in the struct libalias is only used to signal
a state between internal calls.  It's not used between API calls.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30604
2021-06-19 21:55:11 +02:00
Lutz Donnerhacke
9efcad61d8 libalias: Restructure - Use AliasRange instead of PORT_BASE
Get rid of PORT_BASE, replace by AliasRange. Simplify code.
Factor out the search for a new port. Improves the performance a bit.

Discussed with:	Dimitry Luhtionov
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30581
2021-06-19 21:40:09 +02:00
Lutz Donnerhacke
1178dda53d libalias: Restructure - Table for PPTP
Let PPTP use its own data structure.
Regroup and rename other lists, which are not PPTP.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30580
2021-06-19 21:26:31 +02:00
Lutz Donnerhacke
7b44ff4c52 libalias: Restructure - Group expire handling entries
Reorder the internal structure semantically.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30575
2021-06-19 21:12:27 +02:00
Lutz Donnerhacke
492d3b7109 libalias: Restructure - Group incoming links
Reorder incoming links by grouping of common search terms.
Significant performance improvement for incoming (missing) flows.

Remove LSNAT from outgoing search.
Slight speedup due to fewer comparisons in the loop.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30574
2021-06-19 21:03:47 +02:00
Lutz Donnerhacke
d4ab07d2ae libalias: Restructure - Cleanup and Use for links
Factor out a common idiom to return found links.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30573
2021-06-19 20:28:53 +02:00
Lutz Donnerhacke
d541903438 libalias: Restructure - Outgoing search
Factor out the outgoing search function.
Preparation for a new data structure.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30572
2021-06-19 20:25:08 +02:00
Lutz Donnerhacke
19dcc4f225 libalias: Restructure - Cleanup _FindLinkIn
Simplify program flow in function _FindLinkIn.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30571
2021-06-19 20:19:16 +02:00
Lutz Donnerhacke
cac129e603 libalias: Restructure - Table for partially links
Separate the partially specified links into a separate data structure.

This would cause a major performance impact, if there are many of
them.  Use a (smaller) hash table to speed up access to the partially
specified links.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30570
2021-06-19 20:03:08 +02:00
Richard Scheffenegger
74d7fc8753 tcp: Add PRR cwnd reduction for non-SACK loss
This completes PRR cwnd reduction in all circumstances
for the base TCP stack (SACK loss recovery, ECN window reduction,
non-SACK loss recovery), preventing the arriving ACKs from
clocking out new data at the old, too high rate. This
reduces the chance to induce additional losses while
recovering from loss (during congested network conditions).

For non-SACK loss recovery, each ACK is assumed to have
one MSS delivered. In order to prevent ACK-split attacks,
only one window worth of ACKs is considered to actually
have delivered new data.

MFC after: 6 weeks
Reviewed By: rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29441
2021-06-19 19:25:22 +02:00
Lutz Donnerhacke
32f9c2ceb3 libalias: Restructure - Separate fully qualified search
Search fully specified links first.  Some performance loss due to need
to revisit the db twice, if not found.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30569
2021-06-19 19:21:05 +02:00
Lutz Donnerhacke
d41044ddfd libalias: Restructure - Common search terms
Factor out the common Out and In filter
Slightly better performance due to eager skip of search loop

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30568
2021-06-19 18:58:52 +02:00
Lutz Donnerhacke
ef828d39be libalias: Promote per instance global variable timeStamp
Summary:
- Use LibAliasTime as a real global variable for central timekeeping.
- Reduce number of syscalls in user space considerably.
- Dynamically adjust the packet counters to match the second resolution.
- Only check the first few packets after a time increase for expiry.

Discussed with:	hselasky
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30566
2021-06-19 18:25:44 +02:00
Lutz Donnerhacke
3fd20a79e7 libalias: Stats are unsigned
Stats counters are used as unsigned values (i.e. printf("%u")) but are
defined as signed int.  This causes trouble later, so fix it early.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30587
2021-06-19 18:21:17 +02:00
Mark Johnston
a100217489 Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros
This makes it easier to change the socket locking protocols.  No
functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:32 -04:00
Mark Johnston
f4bb1869dd Consistently use the SOLISTENING() macro
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly.  As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING().  No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:27 -04:00
Michael Tuexen
f1536bb538 tcp: remove debug output from RACK
Reported by:		iron.udjin@gmail.com, Marek Zarychta
Reviewed by:		rrs
PR:			256538
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D30723
Sponsored by:		Netflix, Inc.
2021-06-11 20:23:39 +02:00
Randall Stewart
ba1b3e48f5 tcp: Missing mfree in rack and bbr
Recently (Nov) we added logic that protects against a peer negotiating a timestamp and
then not including a timestamp. In the input path, this involved a goto to the done_with_input
label. Now I suspect the code was cribbed from one in Rack that has to do with the SYN.
This had a bug, i.e. it should have an m_freem(m) before going to the label (bbr had this
missing m_freem() but rack did not). This then caused the missing m_freem to show
up in both BBR and Rack. Also, referencing m->m_pkthdr.lro_nsegs
later (after processing) is not a good idea, even though it's only for logging. Best to
copy that off before any frees can take place.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30727
2021-06-11 11:38:08 -04:00
Michael Tuexen
fa3746be42 tcp: fix two bugs in new reno
* Completely initialise the CC module specific data
* Use beta_ecn in case of an ECN event whenever ABE is enabled
  or it is requested by the stack.

Reviewed by:		rscheff, rrs
MFC after:		3 days
Sponsored by:		Netflix, Inc.
2021-06-11 15:40:34 +02:00
Michael Tuexen
224cf7b35b tcp: fix compilation of IPv4-only builds
PR:			256538
Reported by:		iron.udjin@gmail.com
MFC after:		3 days
Sponsored by:		Netflix, Inc.
2021-06-11 09:50:46 +02:00
Lutz Donnerhacke
294799c6b0 libalias: tidy up housekeeping
Replace current expensive, but sparsely called housekeeping
by a single, repetitive action.

This is part of a larger restructure of libalias in order to switch to
more efficient data structures.  The whole restructure process is
split into 15 reviews to ease reviewing.  All those steps will be
squashed into a single commit for MFC in order to hide the
intermediate states from production systems.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30277
2021-06-10 23:30:10 +02:00
Randall Stewart
67e892819b tcp: Mbuf leak while holding a socket buffer lock.
When running the current Rack and BBR changes at NF together with the recent
commits from Richard, which cause the socket buffer lock to be held over
the ip_output() call and finally culminate in a call to tcp_handle_wakeup(),
we get a lot of leaked mbufs. I don't think that this leak is actually caused
by holding the lock or what Richard has done, but is exposing some other
bug that has probably been lying dormant for a long time. I will continue to
look (using his changes) at what is going on to try to root-cause the issue.

In the meantime I can't leave the leaks out for everyone else. So this commit
will revert all of Richard's changes and move both Rack and BBR back to just
doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.

We may want to look at adding back in Richard's changes after I have pinpointed
the root cause of the mbuf leak and fixed it.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30704
2021-06-10 08:33:57 -04:00