Commit Graph

6980 Commits

Author SHA1 Message Date
Lutz Donnerhacke
2f4d91f9cb libalias: Rewrite HISTORY
Fix the history entry (wrong year) and add the missing recent work.
MFC together with the other patches.

MFC after:	2 days
2021-07-04 17:46:47 +02:00
Lutz Donnerhacke
f284553444 libalias: Fix API bug on initialization
The kernel part of ipfw(8) does initialize LibAlias uncondistionally
with an zeroized port range (allowed ports from 0 to 0).  During
restucturing of libalias, port ranges are used everytime and are
therefor initialized with different values than zero.  The secondary
initialization from ipfw (and probably others) overrides the new
default values and leave the instance in an unfunctional state.  The
obvious solution is to detect such reinitializations and use the new
default value instead.

MFC after:	3 days
2021-07-03 23:03:07 +02:00
Lutz Donnerhacke
b50a4dce18 libalias: Avoid uninitialized expiration
The expiration time of direct address mappings is explicitly
uninitialized.  Expire times are always compared during housekeeping.
Despite the uninitialized value does not harm, it's simpler to just
set it to a reasonable default.  This was detected during valgrinding
the test suite.

MFC after:	3 days
2021-07-03 01:09:18 +02:00
Lutz Donnerhacke
25392fac94 libalias: Fix splay comparsion bug
Comparing elements in a tree requires transitiviy.  If a < b and b < c
then a must be smaller than c.  This way the tree elements are always
pairwise comparable.

Tristate comparsion functions returning values lower, equal, or
greater than zero, are usually implemented by a simple subtraction of
the operands.  If the size of the operands are equal to the size of
the result, integer modular arithmetics kick in and violates the
transitivity.

Example:
Working on byte with 0, 120, and 240. Now computing the differences:
  120 -   0 = 120
  240 - 120 = 120
  240 -   0 = -16

MFC after:	3 days
2021-07-03 00:31:53 +02:00
Michael Tuexen
c7f048ab35 sctp: initialize sequence numbers for ECN correctly
MFC after:	3 days
Reported by:	Junseok Yang (for the userland stack)
2021-06-27 20:14:48 +02:00
Michael Tuexen
6587a2bd1e sctp: Fix length check for ECNE chunks
MFC after:	3 days
2021-06-27 16:10:39 +02:00
Michael Tuexen
870af3f4dc tcp: tolerate missing timestamps
Some TCP stacks negotiate TS support, but do not send TS at all
or not for keep-alive segments. Since this includes modern widely
deployed stacks, tolerate the violation of RFC 7323 per default.

Reviewed by:		rgrimes, rrs, rscheff
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D30740
Sponsored by:		Netflix, Inc.
2021-06-27 16:03:57 +02:00
Randall Stewart
9e4d9e4c4d tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software.
Hardware TLS is now supported in some interface cards and it works well. Except that
when we have connections that retransmit a lot we get into trouble with all the retransmits.
This prep step makes way for change that Drew will be making so that we can "kick out" a
session from hardware TLS.

Reviewed by: mtuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30895
2021-06-25 09:30:54 -04:00
Randall Stewart
66aec14a53 tcp: Rack not being very friendly with V6:4 socket and having a connection from V4
There were two bugs that prevented V4 sockets from connecting to
a rack server running a V4/V6 socket. As well as a bug that stops the
mapped v4 in V6 address from working.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30885
2021-06-24 14:42:21 -04:00
Wojciech Macek
17ac6d94db ip_mroute: initialize vif ifnet properly
Use if_alloc to ensure all fields of ifnet are allocated
properly

Reported by:   Damien Deville
Sponsored by:  Stormshield
Obtained from: Semihalf
Reviewed by:   mw
Differential revision: https://reviews.freebsd.org/D30608
2021-06-23 10:13:52 +02:00
Lutz Donnerhacke
f70c98a2f5 libalias: Fix compile time warning about unused functions
Compiling libalias results in warnings about unused functions.
Those warnings are caused by clang's heuristic to consider an inline
function as in use, iff the declaration is in a *.c file.
Declarations in *.h files do not emit those warnings.

Hence the declarations must be moved to an extra *.h file.

MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D30844
2021-06-23 10:06:04 +02:00
John Baldwin
a7f6c6fd94 toe: Read-lock the inp in toe_4tuple_check().
tcp_twcheck now expects a read lock on the inp for the SYN case
instead of a write lock.

Reviewed by:	np
Fixes:		1db08fbe3f tcp_input: always request read-locking of PCB for any pure SYN segment.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30782
2021-06-22 16:31:01 -07:00
Gleb Smirnoff
c4804b6b0b Unbreak TFO, that was broken with 8d5719aa74. These two assignments
are unneccessary and used to be there before TFO as an invariant.  With
TFO and after 8d5719aa74 the "so" value is still needed.

Reported & tested by:	tuexen
Fixes:	8d5719aa74
2021-06-22 16:03:44 -07:00
Lutz Donnerhacke
d261e57dea libalias: Switch to efficient data structure for incoming traffic
Current data structure is using a hash of unordered lists.  Those
unordered lists are quite efficient, because the least recently
inserted entries are most likely to be used again.  In order to avoid
long search times in other cases, the lists are hashed into many
buckets.  Unfortunatly a search for a miss needs an exhaustive
inspection and a careful definition of the hash.

Splay trees offer a similar feature: Almost O(1) for access of the
least recently used entries, and amortized O(ln(n)) for almost all
other cases.  Get rid of the hash.

Now the data structure should able to quickly react to external
packets without eating CPU cycles for breakfast, preventing a DoS.

PR:		192888
Discussed with:	Dimitry Luhtionov
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30536
2021-06-19 22:12:28 +02:00
Lutz Donnerhacke
935fc93af1 libalias: Switch to efficient data structure for outgoing traffic
Current data structure is using a hash of unordered lists.  Those
unordered lists are quite efficient, because the least recently
inserted entries are most likely to be used again.  In order to avoid
long search times in other cases, the lists are hashed into many
buckets.  Unfortunatly a search for a miss needs an exhaustive
inspection and a careful definition of the hash.

Splay trees offer a similar feature - almost O(1) for access of the
least recently used entries), and amortized O(ln(n) - for almost all
other cases.  Get rid of the hash.

Discussed with:	Dimitry Luhtionov
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30516
2021-06-19 22:09:44 +02:00
Lutz Donnerhacke
d989935b5b libalias: Restructure - Finalize
Note, that the restructuring is done.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30582
2021-06-19 21:58:56 +02:00
Lutz Donnerhacke
fe83900f9f libalias: Restructure - Remove temporary state deleteAllLinks from global struct
The entry deleteAllLinks in the struct libalias is only used to signal
a state between internal calls.  It's not used between API calls.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30604
2021-06-19 21:55:11 +02:00
Lutz Donnerhacke
9efcad61d8 libalias: Restructure - Use AliasRange instead of PORT_BASE
Get rid of PORT_BASE, replace by AliasRange. Simplify code.
Factor out the search for a new port. Improves the perfomance a bit.

Discussed with:	Dimitry Luhtionov
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30581
2021-06-19 21:40:09 +02:00
Lutz Donnerhacke
1178dda53d libalias: Restructure - Table for PPTP
Let PPTP use its own data structure.
Regroup and rename other lists, which are not PPTP.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30580
2021-06-19 21:26:31 +02:00
Lutz Donnerhacke
7b44ff4c52 libalias: Restructure - Group expire handling entries
Reorder the internal structure semantically.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30575
2021-06-19 21:12:27 +02:00
Lutz Donnerhacke
492d3b7109 libalias: Restructure - Group incoming links
Reorder incoming links by grouping of common search terms.
Significant performance improvement for incoming (missing) flows.

Remove LSNAT from outgoing search.
Slight speedup due to less comparsions in the loop.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30574
2021-06-19 21:03:47 +02:00
Lutz Donnerhacke
d4ab07d2ae libalias: Restructure - Cleanup and Use for links
Factor out a common idiom to return found links.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30573
2021-06-19 20:28:53 +02:00
Lutz Donnerhacke
d541903438 libalias: Restructure - Outgoing search
Factor out the outgoing search function.
Preparation for a new data structure.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30572
2021-06-19 20:25:08 +02:00
Lutz Donnerhacke
19dcc4f225 libalias: Restructure - Cleanup _FindLinkIn
Simplify program flow in function _FindLinkIn.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30571
2021-06-19 20:19:16 +02:00
Lutz Donnerhacke
cac129e603 libalias: Restructure - Table for partially links
Separate the partially specified links into a separate data structure.

This would causes a major parformance impact, if there are many of
them.  Use a (smaller) hash table to speed up the partially link
access.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30570
2021-06-19 20:03:08 +02:00
Richard Scheffenegger
74d7fc8753 tcp: Add PRR cwnd reduction for non-SACK loss
This completes PRR cwnd reduction in all circumstances
for the base TCP stack (SACK loss recovery, ECN window reduction,
non-SACK loss recovery), preventing the arriving ACKs to
clock out new data at the old, too high rate. This
reduces the chance to induce additional losses while
recovering from loss (during congested network conditions).

For non-SACK loss recovery, each ACK is assumed to have
one MSS delivered. In order to prevent ACK-split attacks,
only one window worth of ACKs is considered to actually
have delivered new data.

MFC after: 6 weeks
Reviewed By: rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29441
2021-06-19 19:25:22 +02:00
Lutz Donnerhacke
32f9c2ceb3 libalias: Restructure - Separate fully qualified search
Search fully specified links first.  Some performance loss due to need
to revisit the db twice, if not found.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30569
2021-06-19 19:21:05 +02:00
Lutz Donnerhacke
d41044ddfd libalias: Restructure - Common search terms
Factor out the common Out and In filter
Slightly better performance due to eager skip of search loop

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30568
2021-06-19 18:58:52 +02:00
Lutz Donnerhacke
ef828d39be libalias: Promote per instance global variable timeStamp
Summary:
- Use LibAliasTime as a real global variable for central timekeeping.
- Reduce number of syscalls in user space considerably.
- Dynamically adjust the packet counters to match the second resolution.
- Only check the first few packets after a time increase for expiry.

Discussed with:	hselasky
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30566
2021-06-19 18:25:44 +02:00
Lutz Donnerhacke
3fd20a79e7 libalias: Stats are unsigned
Stats counters are used as unsigned valued (i.e. printf("%u")) but are
defined as signed int.  This causes trouble later, so fix it early.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30587
2021-06-19 18:21:17 +02:00
Mark Johnston
a100217489 Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros
This makes it easier to change the socket locking protocols.  No
functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:32 -04:00
Mark Johnston
f4bb1869dd Consistently use the SOLISTENING() macro
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly.  As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING().  No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:27 -04:00
Michael Tuexen
f1536bb538 tcp: remove debug output from RACK
Reported by:		iron.udjin@gmail.com, Marek Zarychta
Reviewed by:		rrs
PR:			256538
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D30723
Sponsored by:		Netflix, Inc.
2021-06-11 20:23:39 +02:00
Randall Stewart
ba1b3e48f5 tcp: Missing mfree in rack and bbr
Recently (Nov) we added logic that protects against a peer negotiating a timestamp, and
then not including a timestamp. This involved in the input path doing a goto done_with_input
label. Now I suspect the code was cribbed from one in Rack that has to do with the SYN.
This had a bug, i.e. it should have a m_freem(m) before going to the label (bbr had this
missing m_freem() but rack did not). This then caused the missing m_freem to show
up in both BBR and Rack. Also looking at the code referencing m->m_pkthdr.lro_nsegs
later (after processing) is not a good idea, even though its only for logging. Best to
copy that off before any frees can take place.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30727
2021-06-11 11:38:08 -04:00
Michael Tuexen
fa3746be42 tcp: fix two bugs in new reno
* Completely initialise the CC module specific data
* Use beta_ecn in case of an ECN event whenever ABE is enabled
  or it is requested by the stack.

Reviewed by:		rscheff, rrs
MFC after:		3 days
Sponsored by:		Netflix, Inc.
2021-06-11 15:40:34 +02:00
Michael Tuexen
224cf7b35b tcp: fix compilation of IPv4-only builds
PR:			256538
Reported by:		iron.udjin@gmail.com
MFC after:		3 days
Sponsored by:		Netflix, Inc.
2021-06-11 09:50:46 +02:00
Lutz Donnerhacke
294799c6b0 libalias: tidy up housekeeping
Replace current expensive, but sparsly called housekeeping
by a single, repetive action.

This is part of a larger restructure of libalias in order to switch to
more efficient data structures.  The whole restructure process is
split into 15 reviews to ease reviewing.  All those steps will be
squashed into a single commit for MFC in order to hide the
intermediate states from production systems.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30277
2021-06-10 23:30:10 +02:00
Randall Stewart
67e892819b tcp: Mbuf leak while holding a socket buffer lock.
When running at NF the current Rack and BBR changes with the recent
commits from Richard that cause the socket buffer lock to be held over
the ip_output() call and then finally culminating in a call to tcp_handle_wakeup()
we get a lot of leaked mbufs. I don't think that this leak is actually caused
by holding the lock or what Richard has done, but is exposing some other
bug that has probably been lying dormant for a long time. I will continue to
look (using his changes) at what is going on to try to root cause out the issue.

In the meantime I can't leave the leaks out for everyone else. So this commit
will revert all of Richards changes and move both Rack and BBR back to just
doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.

We may want to look at adding back in Richards changes after I have pinpointed
the root cause of the mbuf leak and fixed it.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30704
2021-06-10 08:33:57 -04:00
Randall Stewart
b45daaea95 tcp: LRO timestamps have lost their previous precision
Recently we had a rewrite to tcp_lro.c that was tested but one subtle change
was the move to a less precise timestamp. This causes all kinds of chaos
in tcp's that do pacing and needs to be fixed to use the more precise
time that was there before.

Reviewed by: mtuexen, gallatin, hselasky
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30695
2021-06-09 13:58:54 -04:00
Randall Stewart
4747500dea tcp: A better fix for the previously attempted fix of the ack-war issue with tcp.
So it turns out that my fix before was not correct. It ended with us failing
some of the "improved" SYN tests, since we are not in the correct states.
With more digging I have figured out the root of the problem is that when
we receive a SYN|FIN the reassembly code made it so we create a segq entry
to hold the FIN. In the established state where we were not in order this
would be correct i.e. a 0 len with a FIN would need to be accepted. But
if you are in a front state we need to strip the FIN so we correctly handle
the ACK but ignore the FIN. This gets us into the proper states
and avoids the previous ack war.

I back out some of the previous changes but then add a new change
here in tcp_reass() that fixes the root cause of the issue. We still
leave the rack panic fixes in place however.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30627
2021-06-04 05:26:43 -04:00
Mark Johnston
f96603b56f tcp, udp: Permit binding with AF_UNSPEC if the address is INADDR_ANY
Prior to commit f161d294b we only checked the sockaddr length, but now
we verify the address family as well.  This breaks at least ttcp.  Relax
the check to avoid breaking compatibility too much: permit AF_UNSPEC if
the address is INADDR_ANY.

Fixes:		f161d294b
Reported by:	Bakul Shah <bakul@iitbombay.org>
Reviewed by:	tuexen
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30539
2021-05-31 18:53:34 -04:00
Lutz Donnerhacke
bec0a5dca7 libalias: Remove LibAliasCheckNewLink
Finally drop the function in 14-CURRENT.

Discussed with: kp
Differential Revision: https://reviews.freebsd.org/D30275
2021-05-31 13:04:11 +02:00
Lutz Donnerhacke
bfd41ba1fe libalias: Remove unused function LibAliasCheckNewLink
The functionality to detect a newly created link after processing a
single packet is decoupled from the packet processing.  Every new
packet is processed asynchronously and will reset the indicator, hence
the function is unusable.  I made a Google search for third party code,
which uses the function, and failed to find one.

That's why the function should be removed: It unusable and unused.
A much simplified API/ABI will remain in anything below 14.

Discussed with:	kp
Reviewed by:	manpages (bcr)
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30275
2021-05-31 12:53:57 +02:00
Wojciech Macek
d40cd26a86 ip_mroute: rework ip_mroute
Approved by:     mw
Obtained from:   Semihalf
Sponsored by:    Stormshield
Differential Revision: https://reviews.freebsd.org/D30354

Changes:
1. add spinlock to bw_meter

If two contexts read and modify bw_meter values
it might happen that these are corrupted.
Guard only code fragments which do read-and-modify.
Context which only do "reads" are not done inside
spinlock block. The only sideffect that can happen is
an 1-p;acket outdated value reported back to userspace.

2. replace all locks with a single RWLOCK

Multiple locks caused a performance issue in routing
hot path, when two of them had to be taken. All locks
were replaced with single RWLOCK which makes the hot
path able to take only shared access to lock most of
the times.
All configuration routines have to take exclusive lock
(as it was done before) but these operation are very rare
compared to packet routing.

3. redesign MFC expire and UPCALL expire

Use generic kthread and cv_wait/cv_signal for deferring
work. Previously, upcalls could be sent from two contexts
which complicated the design. All upcall sending is now
done in a kthread which allows hot path to work more
efficient in some rare cases.

4. replace mutex-guarded linked list with lock free buf_ring

All message and data is now passed over lockless buf_ring.
This allowed to remove some heavy locking when linked
lists were used.
2021-05-31 05:48:15 +02:00
Lutz Donnerhacke
b03a41befe libalias: Fix nameing and initialization of a constant
The commit 189f8eea contains a refactorisation of a constant.  During
later review D30283 the naming of the constant was improved and the
initialization became explicit.  Put this into the tree, in order to
MFC the correct naming.
2021-05-30 15:47:29 +02:00
Randall Stewart
8c69d988a8 tcp: When we have an out-of-order FIN we do want to strip off the FIN bit.
The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr).
However there was a missing case, i.e. where we get an out-of-order FIN by itself.
In such a case we don't want to leave the FIN bit set, otherwise we will do the
wrong thing and ack the FIN incorrectly. Instead we need to go through the
tcp_reasm() code and that way the FIN will be stripped and all will be well.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30497
2021-05-27 10:50:32 -04:00
Richard Scheffenegger
c358f1857f tcp: Use local CC data only in the correct context
Most CC algos do use local data, and when calling
newreno_cong_signal from there, the latter misinterprets
the data as its own struct, leading to incorrect behavior.

Reported by:  chengc_netapp.com
Reviewed By:  chengc_netapp.com, tuexen, #transport
MFC after:    3 days
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30470
2021-05-26 20:15:53 +02:00
Randall Stewart
4f3addd94b tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back
a long time when delayed ack timers were longer and links were slower. A
200ms timer slop allows 1 MSS to be sent over a 60kbps link. Its possible that
lowering this value to something more in line with todays delayed ack values (40ms)
might improve TCP. This bit of code makes it so rack can, via a socket option,
adjust the timer slop.

Reviewed by: mtuexen
Sponsered by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30249
2021-05-26 06:43:30 -04:00
Andrew Gallatin
086a35562f tcp: enter network epoch when calling tfb_tcp_fb_fini
We need to enter the network epoch when calling into
tfb_tcp_fb_fini.  I noticed this when I hit an assert
running the latest rack

Differential Revision: https://reviews.freebsd.org/D30407
Reviewed by: rrs, tuexen
Sponsored by: Netflix
2021-05-25 13:45:37 -04:00
Randall Stewart
13c0e198ca tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michaels testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is the left edge gets transmitted before the adjustments are done
to the send_map, this means that right edge bits must be considered to be added only if
the entire RSM is being retransmitted.

Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns
out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself.
After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically
what happens is we go into the reassembly code and lose the FIN bit. The trick here is we
should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything.
That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called and the FIN's and such have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
timer can speed this up in testing.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30451
2021-05-25 13:23:31 -04:00