7278 Commits

Author SHA1 Message Date
Mike Karels
04cd74b4cd IPv4 multicast: fix netstat -g
The vif structure includes fields at the end which are #ifdef KERNEL,
causing a mismatch between the kernel and user-level structure
sizes.  netstat -g failed with an ENOMEM on the sysctl to fetch
the vif table.  Change the vif sysctl code in ip_mroute to copy out
only the user-level-visible portion of each table entry.
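As an illustration only, the following userspace C sketch models the mismatch with two hypothetical structures: vif_user (the user-level view) and vif_kern (the same entry plus a kernel-only tail). It is not the ip_mroute code. It shows the idea of copying out only the user-visible prefix of each entry, so a buffer sized for user-level entries is large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-level view of a vif table entry. */
struct vif_user {
	uint32_t v_flags;
	uint32_t v_threshold;
	uint64_t v_pkt_in;
	uint64_t v_pkt_out;
};

/* Hypothetical kernel view: same prefix, plus fields that only exist
 * when the header is compiled for the kernel. */
struct vif_kern {
	uint32_t v_flags;
	uint32_t v_threshold;
	uint64_t v_pkt_in;
	uint64_t v_pkt_out;
	void    *v_ifp;		/* kernel-only */
	uint64_t v_spin;	/* kernel-only */
};

/* Number of bytes of each entry that user level knows about. */
#define VIF_USER_SIZE offsetof(struct vif_kern, v_ifp)

/* Copy n entries into a buffer of user-sized entries, copying only the
 * user-visible prefix of each.  Returns bytes written, or 0 if the
 * buffer is too small (where the real sysctl would report ENOMEM). */
static size_t
vif_copyout(const struct vif_kern *tbl, size_t n, void *buf, size_t buflen)
{
	unsigned char *out = buf;

	if (buflen < n * VIF_USER_SIZE)
		return (0);
	for (size_t i = 0; i < n; i++)
		memcpy(out + i * VIF_USER_SIZE, &tbl[i], VIF_USER_SIZE);
	return (n * VIF_USER_SIZE);
}
```

Copying sizeof(struct vif_kern) per entry instead would need 40 bytes per entry here, while user level only allocates 24.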

Reviewed by:	bz, wma
Differential Revision: https://reviews.freebsd.org/D34627
2022-03-22 07:38:01 -05:00
Mike Karels
2cf1e120c6 Enter epoch when adding IPv4 multicast forwarding cache entry
The code path from the IPv4 multicast setsockopt could call ip_output()
without entering an epoch.  Specifically, the MRT_ADD_MFC setsockopt
would call add_mfc(), which in turn called ip_mdq() to send queued
packets.  This resulted in an epoch assert failure in ip_output().
Enter an epoch in add_mfc(), and add some epoch asserts to check
for similar failures.

Reviewed by:	kp, bz, wma, cy
Differential Revision: https://reviews.freebsd.org/D34624
2022-03-22 07:28:57 -05:00
Mark Johnston
9f70c04da4 rip: Fix a -Wunused-but-set-variable warning
Fixes:		81728a538d ("Split rtinit() into multiple functions.")
Reviewed by:	imp, melifaro
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D34395
2022-03-01 09:39:43 -05:00
Richard Scheffenegger
2ff07d9220 tcp: Restore correct ECT marking behavior on SACK retransmissions
While coalescing all ECN-related code into new common source files,
the flag to deal with SACK retransmissions was skipped. This leads
to non-compliant ECT-marking of SACK retransmissions, as well as
the premature sending of other TCP ECN flags (CWR).

Reviewed By: rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34376
2022-02-25 20:05:32 +01:00
Randall Stewart
a43b0aca12 tcp: Push bit failure to set in fastpath
Recent changes to the tcp stack introduced a macro/function to set
the tcp flags. In the process, setting the PUSH bit in the fastpath of
rack was broken. This fixes that and also cleans up a warning that
occurs when INVARIANTS is not enabled (inp is only used in a KASSERT).

The tcp test suite can detect this bug; the test plan shows the script
that fails due to the missing PUSH bit.

Reviewed by: rscheff, tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D34332
2022-02-23 16:25:56 -05:00
Randall Stewart
ea9017fb25 tcp: Congestion control move to using reference counting.
In the transport call on 12/3, Gleb asked to move the CC modules towards
using reference counting to prevent folks from unloading a module in use.
It was also agreed that Michael would write a userspace utility, like tcp_drop,
that could be used to move all connections that are using a specific CC
to some other CC.

This is the half I committed to doing: maintaining a refcount on a cc
module for every pcb that refers to it, and decrementing it whenever a
pcb no longer uses the module. This also simplifies the whole unloading
process by getting rid of tcp_ccunload(), which walked through all the
tcbs. Instead, we mark a module as being removed and prevent further
references to it. We also make sure that a module marked as being
removed cannot be made the default and, conversely, that an attempt to
mark the current default as being removed fails.
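A minimal, single-threaded sketch of the reference-counting rules described above. The struct cc_mod type and the helper names (cc_ref(), cc_set_default(), and so on) are hypothetical, not the kernel's; real code needs locking or atomics, and EBUSY is only an illustrative error choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a CC module (no locking; single-threaded). */
struct cc_mod {
	const char *name;
	unsigned    refcnt;	/* number of pcbs using this module */
	bool        removing;	/* set once unload has been requested */
};

static struct cc_mod *cc_default;

/* A pcb starts using a module; modules being removed are refused. */
static int
cc_ref(struct cc_mod *m)
{
	if (m->removing)
		return (EBUSY);
	m->refcnt++;
	return (0);
}

/* A pcb stops using a module. */
static void
cc_unref(struct cc_mod *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* A module marked as being removed cannot become the default... */
static int
cc_set_default(struct cc_mod *m)
{
	if (m->removing)
		return (EBUSY);
	cc_default = m;
	return (0);
}

/* ...and the current default cannot be marked as being removed. */
static int
cc_mark_removing(struct cc_mod *m)
{
	if (m == cc_default)
		return (EBUSY);
	m->removing = true;
	return (0);
}

/* Unload may proceed once marked and no pcb holds a reference. */
static bool
cc_can_unload(const struct cc_mod *m)
{
	return (m->removing && m->refcnt == 0);
}
```

With this scheme there is no need to walk all tcbs at unload time; the module simply waits for its refcount to drain.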

Reviewed by: Michael Tuexen, Gleb Smirnoff
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33249
2022-02-21 06:30:17 -05:00
Michael Tuexen
bdb99f6f5e sctp: remove KASSERT() which does not always hold
Reported by:	syzbot+c907045aed2043011f3c@syzkaller.appspotmail.com
MFC after:	3 days
2022-02-20 15:59:21 +01:00
Michael Tuexen
e255f0c9fb sctp: make sure new locking requirements are satisfied.
Reported by:	syzbot+cd3c1dd64861b8c200bd@syzkaller.appspotmail.com
MFC after:	3 days
2022-02-20 15:36:26 +01:00
Michael Tuexen
2f0656fb9b sctp: don't hold the assoc create lock longer than needed
Reported by:	syzbot+c738e3df67cf425c49a2@syzkaller.appspotmail.com
MFC after:	3 days
2022-02-20 14:55:41 +01:00
Michael Tuexen
a4a31271cc sctp: cleanup sctp_lower_sosend
This is in preparation for retiring the tcp send lock in the
next step.

MFC after:	3 days
2022-02-20 01:09:30 +01:00
Michael Tuexen
fd0d53f85c sctp: improve robustness
MFC after:	3 days
2022-02-18 14:30:07 +01:00
Michael Tuexen
274a0e4a8d sctp: cleanup, no functional change intended.
MFC after:	3 days
2022-02-18 14:20:01 +01:00
Michael Tuexen
3ca204c97a sctp: remove unused parameter
MFC after:	3 days
2022-02-18 12:20:44 +01:00
Michael Tuexen
11c4d4b966 sctp: fix a signed/unsigned mismatch.
MFC after:	3 days
2022-02-17 22:45:57 +01:00
Michael Tuexen
76e03cc940 sctp: avoid undefined behaviour and cleanup the code.
MFC after:	3 days
2022-02-17 19:23:59 +01:00
Kristof Provost
995cba5a0c netinet: allow UDP tunnels to be removed
udp_set_kernel_tunneling() rejects new callbacks if one is already set.
Allow callbacks to be cleared. The use case for this is OpenVPN DCO,
where the socket is opened by userspace and then adopted by the kernel
to run the tunnel. If the DCO interface is removed but userspace does
not close the socket (something the kernel cannot prevent) the installed
callbacks could be called with an invalidated context.

Allow new functions to be set, but only if they're NULL (i.e. allow the
callback functions to be cleared).
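A sketch of the relaxed check, assuming a hypothetical struct udp_tun_sketch and set_kernel_tunneling() helper rather than the real udp_set_kernel_tunneling() signature: installing a callback over an existing one is rejected, while passing NULL always clears it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-socket tunnel hook; not the kernel's actual types. */
typedef void (*tun_func_t)(void *ctx, const void *pkt, size_t len);

struct udp_tun_sketch {
	tun_func_t func;
	void      *ctx;
};

/* Install a callback, or clear it by passing NULL.  Replacing an
 * existing callback is rejected; clearing is always allowed. */
static int
set_kernel_tunneling(struct udp_tun_sketch *t, tun_func_t f, void *ctx)
{
	if (f != NULL && t->func != NULL)
		return (EBUSY);
	t->func = f;
	t->ctx = (f != NULL) ? ctx : NULL;
	return (0);
}

/* Placeholder callback used for demonstration. */
static void
dummy_cb(void *ctx, const void *pkt, size_t len)
{
	(void)ctx;
	(void)pkt;
	(void)len;
}
```

Clearing the hook when the DCO interface goes away means a socket still held open by userspace no longer calls into the freed context.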

Reviewed by:	tuexen
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34288
2022-02-16 10:59:04 +01:00
Richard Scheffenegger
0c2832ee4f tcp: Restore 6 tcps padding entries in HEAD
The padding in CURRENT shall not shrink. It is
designed so that in CURRENT it always stays
the same; when a new stable branch is created, it
inherits 6 pointer placeholders that can be used
within that stable/X branch's lifetime to extend the structure.
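To illustrate the padding scheme, here is a hypothetical structure (not struct tcpstat) with six spare 64-bit slots. A later stable/X change consumes one slot for a new counter, and a compile-time assertion checks that the size does not change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statistics structure as branched for stable/X. */
struct stats_v1 {
	uint64_t s_sndpack;
	uint64_t s_rcvpack;
	uint64_t _pad[6];	/* spare slots for later additions */
};

/* A later change within stable/X takes one spare slot, so the size and
 * the offsets of existing fields stay the same. */
struct stats_v2 {
	uint64_t s_sndpack;
	uint64_t s_rcvpack;
	uint64_t s_newcounter;	/* occupies the former _pad[0] */
	uint64_t _pad[5];
};

_Static_assert(sizeof(struct stats_v1) == sizeof(struct stats_v2),
    "consuming a spare slot must not change the structure size");
```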

Reviewed By: tuexen, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34269
2022-02-15 09:24:07 +01:00
Bjoern A. Zeeb
232d323ef2 TCP syncache: enhance KASSERT output
Improve the "syncache: mbuf too small" assertion message with various
variables (some not actually needed) but enough that it will be obvious
if (a) we use IPv4 or IPv6, (b) if UDP tunneling is on, (c) what
max_linkhdr is, and (d) what MHLEN is.

This should help with diagnostics in the future.
The case was hit with wireless drivers setting a large ic_headroom
and using IPv6.

Reviewed by:	gallatin, tuexen, rscheff
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D34217
2022-02-14 00:03:20 +00:00
Mark Johnston
b4f60fab5d tcp: Avoid conditionally defined fields in union lro_address
The layout of the structure ends up depending on whether the including
file includes opt_inet.h and opt_inet6.h, so different compilation units
can end up seeing different versions of the structure.  Fix this by
unconditionally defining the address fields.

As a side effect, this eliminates some duplication in the kernel's CTF
type graph.
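A hypothetical illustration of the problem: the same union as seen by a compilation unit that has the IPv6 member and by one that does not. The types are stand-ins, not union lro_address itself.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 16-byte IPv6 address. */
struct in6_addr_sketch {
	uint8_t s6_addr[16];
};

/* As seen by a unit where the IPv6 member is compiled in. */
union addr_with_inet6 {
	uint32_t               v4;
	struct in6_addr_sketch v6;
};

/* As seen by a unit where it is not: a different size and layout. */
union addr_without_inet6 {
	uint32_t v4;
};

/* The fix: define every member unconditionally so all units agree. */
union addr_fixed {
	uint32_t               v4;
	struct in6_addr_sketch v6;
};
```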

Reviewed by:	rscheff, tuexen
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34242
2022-02-10 15:39:58 -05:00
Richard Scheffenegger
3f169c54ab tcp: Add/update AccECN related statistics and numbers
Reserve counters in the tcps struct in preparation
for AccECN, extend the debugging output for TF2
flags, optimize the syncache flags from individual
bits to a codepoint for the specific ECN handshake.

This is in preparation for AccECN.

No functional change except for extended debug
output capabilities.

Reviewed By: #transport, rrs
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34161
2022-02-10 00:21:31 +01:00
Randall Stewart
cc41c17433 Oops, my patch lost the removal of the tlp_threshold counter increments
2022-02-09 16:19:22 -05:00
Randall Stewart
8d64b4b4c4 cleanup of rack variables.
During a recent deep dive into all the variables, to
discover why stack switching caused larger retransmits, I examined
every variable in rack. In the process I found quite a few bits
that were not used and needed cleanup. This update pulls
out all the unused pieces from rack. Note there are *no* functional
changes here, just the removal of unused variables and a bit of
spacing clean up.

Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D34205
2022-02-09 16:08:32 -05:00
Michael Tuexen
a0aeb1cef5 in_pcb.c: fix compilation of an IPv4 only configuration
While there, remove a duplicate inclusion of sysctl.h.

Reported by:	Gary Jennejohn
Fixes:		a35bdd4489 - main - tcp: add sysctl interface for setting socket options
Sponsored by:	Netflix, Inc.
2022-02-09 19:58:29 +01:00
Michael Tuexen
a35bdd4489 tcp: add sysctl interface for setting socket options
This interface allows setting a socket option on a TCP endpoint,
which is specified by its inp_gencnt. This interface will be
used by tcpsso, an upcoming command line tool.

Reviewed by:		glebius, rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D34138
2022-02-09 12:24:41 +01:00
Michael Tuexen
528c764924 tcp: fix compilation when KERN_TLS is not defined
Reported by:	Gary Jennejohn
Fixes:		fd7daa7271 - main - tcp: make tcp_ctloutput_set() non-static
Sponsored by:	Netflix, Inc.
2022-02-09 12:16:43 +01:00
Michael Tuexen
fd7daa7271 tcp: make tcp_ctloutput_set() non-static
tcp_ctloutput_set() will be used via the sysctl interface by
tcpsso, an upcoming command line tool.

Reviewed by:		glebius, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D34164
2022-02-08 18:49:44 +01:00
Franco Fichtner
47ded797ce netinet: simplify RSS ifdef statements
Approved by:	transport (rrs)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D31583
2022-02-07 19:22:03 -07:00
Richard Scheffenegger
ab001fcdf2 tcp: Apply tcp flags after ECN processing in rack_fast_output()
An earlier commit missed moving tcp_set_flags() past ECN
processing in rack_fast_output().

Reviewed By: rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34180
2022-02-07 03:28:27 +01:00
Randall Stewart
a9696510f5 tcp: Add hystart++ to our cubic implementation.
As promised in the transport call on 11/4/21, here is an implementation
of hystart++ for cubic. It also renames the tcp_congestion function to
something more descriptive. Common variables are moved into the general
cc.h structure so that both cubic and newreno can use them for
hystart++.
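For orientation, the core HyStart++ delay-increase check, as specified in RFC 9406, can be sketched as follows. It follows the RFC's pseudocode and constants (RTTs in microseconds), not the FreeBSD cubic code; the structure and function names are hypothetical, and Conservative Slow Start itself is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants from RFC 9406; RTT values are in microseconds. */
#define MIN_RTT_THRESH	4000u	/* 4 ms */
#define MAX_RTT_THRESH	16000u	/* 16 ms */
#define MIN_RTT_DIVISOR	8u
#define N_RTT_SAMPLE	8u
#define RTT_INFINITY	UINT32_MAX

struct hystart_sketch {
	uint32_t last_round_min_rtt;
	uint32_t curr_round_min_rtt;
	uint32_t rtt_sample_count;
};

/* Called at the start of each new round while in slow start. */
static void
hystart_new_round(struct hystart_sketch *h)
{
	h->last_round_min_rtt = h->curr_round_min_rtt;
	h->curr_round_min_rtt = RTT_INFINITY;
	h->rtt_sample_count = 0;
}

/* Feed one RTT sample; returns true when the RTT increase indicates
 * slow start should be left (entering Conservative Slow Start). */
static bool
hystart_on_rtt(struct hystart_sketch *h, uint32_t rtt)
{
	uint32_t thresh;

	if (rtt < h->curr_round_min_rtt)
		h->curr_round_min_rtt = rtt;
	h->rtt_sample_count++;
	if (h->rtt_sample_count < N_RTT_SAMPLE ||
	    h->curr_round_min_rtt == RTT_INFINITY ||
	    h->last_round_min_rtt == RTT_INFINITY)
		return (false);
	/* RttThresh = clamp(lastRoundMinRTT / 8, 4 ms, 16 ms) */
	thresh = h->last_round_min_rtt / MIN_RTT_DIVISOR;
	if (thresh < MIN_RTT_THRESH)
		thresh = MIN_RTT_THRESH;
	if (thresh > MAX_RTT_THRESH)
		thresh = MAX_RTT_THRESH;
	return (h->curr_round_min_rtt >= h->last_round_min_rtt + thresh);
}
```

For example, with a previous round minimum of 44 ms the threshold is 5.5 ms, so a round whose minimum reaches 49.5 ms or more (after at least 8 samples) triggers the exit.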

Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33035
2022-02-07 06:37:46 -05:00
Richard Scheffenegger
1790549d80 tcp: use TCPSTAT_INC in kernel ecn functions
Incorrectly used KMOD_ macros in static kernel ECN functions.

Both eventually resolve to counter_u64_add(), but it is better
to use the correct macros.

Reviewed By: tuexen, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34181
2022-02-05 16:55:22 +01:00
Richard Scheffenegger
f7220c486c tcp: move ECN handling code to a common file
Reduce the burden to maintain correct and
extensible ECN related code across multiple
stacks and codepaths.

Formally no functional change.

Incidentally, this establishes correct
ECN operation in one instance.

Reviewed By: rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34162
2022-02-05 15:04:42 +01:00
Richard Scheffenegger
7994ef3c39 Revert "tcp: move ECN handling code to a common file"
This reverts commit 0c424c90ea.
2022-02-05 01:07:51 +01:00
Richard Scheffenegger
0c424c90ea tcp: move ECN handling code to a common file
Reduce the burden to maintain correct and
extensible ECN related code across multiple
stacks and codepaths.

Formally no functional change.

Incidentally, this establishes correct
ECN operation in one instance.

Reviewed By: rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34162
2022-02-04 22:54:41 +01:00
Sylvian Meygret
cd7306bb1f ip_mroute: split mrouter interface deactivation and if_free
Move if_free outside MRW_LOCK. This silences a LOR message
which might appear during deinitialization.
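The pattern can be sketched in userspace with a pthread mutex standing in for MRW_LOCK: interfaces are detached from a hypothetical vif table while the lock is held and freed only after it is dropped, so whatever locks the free routine may take are never acquired under MRW_LOCK. All names are illustrative, not ip_mroute's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAXVIFS_SKETCH 4

struct ifnet_sketch {
	int unit;
};

static pthread_mutex_t mrw_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ifnet_sketch *vif_ifp[MAXVIFS_SKETCH];

/* Stand-in for if_free(); in the kernel it may acquire other locks. */
static void
if_free_sketch(struct ifnet_sketch *ifp)
{
	free(ifp);
}

/* Deactivate every vif under the lock, then free the interfaces after
 * the lock is released.  Returns the number of interfaces freed. */
static int
vifs_teardown(void)
{
	struct ifnet_sketch *tofree[MAXVIFS_SKETCH];
	int n = 0;

	pthread_mutex_lock(&mrw_lock);
	for (int i = 0; i < MAXVIFS_SKETCH; i++) {
		if (vif_ifp[i] != NULL) {
			tofree[n++] = vif_ifp[i];
			vif_ifp[i] = NULL;
		}
	}
	pthread_mutex_unlock(&mrw_lock);

	for (int i = 0; i < n; i++)
		if_free_sketch(tofree[i]);
	return (n);
}
```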
2022-02-04 10:25:07 +01:00
Richard Scheffenegger
fd723975ec tcp: fix typo in commit f026275e26
Missed one bitmask inversion while committing D34148.

Differential Revision: https://reviews.freebsd.org/D34148
Differential Revision: https://reviews.freebsd.org/D34160
2022-02-03 21:05:09 +01:00
Richard Scheffenegger
3b0ee68050 tcp: Prevent setting of ECN bits with setsockopt()
setsockopt() grants full access to the deprecated
TOS byte. For TCP, mask out the ECN codepoint, so that
only the DSCP portion can be adjusted.

Reviewed By: tuexen, hselasky, #manpages, #transport, debdrup
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34154
2022-02-03 20:06:42 +01:00
Richard Scheffenegger
f026275e26 tcp: set IP ECN header codepoint properly
TCP RACK can cache the IP header while preparing
a new TCP packet for transmission. Thus all the
IP ECN codepoint bits need to be assigned, without
assuming a clear field beforehand.
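A minimal sketch of the difference, using the RFC 3168 ECN codepoints and hypothetical helper names: OR-ing the desired codepoint into a cached TOS byte cannot clear bits left over from an earlier packet, while masking the field first assigns it completely and leaves the DSCP bits alone.

```c
#include <assert.h>
#include <stdint.h>

/* ECN field: the low two bits of the TOS / traffic class byte. */
#define ECN_MASK_SKETCH	0x03u
#define ECN_NOTECT	0x00u
#define ECN_ECT1	0x01u
#define ECN_ECT0	0x02u
#define ECN_CE		0x03u

/* Wrong with a cached header: stale ECN bits survive the OR. */
static uint8_t
set_ecn_or(uint8_t tos, uint8_t ecn)
{
	return ((uint8_t)(tos | ecn));
}

/* Right: replace the whole ECN field, keeping the DSCP bits. */
static uint8_t
set_ecn_assign(uint8_t tos, uint8_t ecn)
{
	return ((uint8_t)((tos & ~ECN_MASK_SKETCH) | (ecn & ECN_MASK_SKETCH)));
}
```

For example, a cached byte of 0xba (DSCP EF plus ECT(0)) stays 0xba when OR-ed with Not-ECT, but becomes 0xb8 with the assigning version.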

Reviewed By: tuexen, kbowling, #transport
MFC after:   3 days
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34148
2022-02-03 16:53:41 +01:00
Richard Scheffenegger
1ebf460758 tcp: Access all 12 TCP header flags via inline function
In order to consistently provide access to all
(including reserved) TCP header flag bits,
use the accessor functions tcp_get_flags() and
tcp_set_flags(). Also widen any flag variable from
uint8_t / char to uint16_t.
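The bit arithmetic behind such accessors can be sketched on a hypothetical two-byte view of TCP header bytes 12 and 13 (data offset and 4 reserved bits, then the 8 classic flags). The real tcp_get_flags()/tcp_set_flags() operate on struct tcphdr; these *_sketch versions only show why a 16-bit value is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of TCP header bytes 12-13:
 * off_x2: data offset (upper 4 bits) | reserved flag bits (lower 4 bits)
 * flags:  the 8 classic flags (CWR ECE URG ACK PSH RST SYN FIN) */
struct tcphdr_flags_sketch {
	uint8_t off_x2;
	uint8_t flags;
};

/* Return all 12 flag bits as one 16-bit value. */
static inline uint16_t
tcp_get_flags_sketch(const struct tcphdr_flags_sketch *th)
{
	return ((uint16_t)(((th->off_x2 & 0x0f) << 8) | th->flags));
}

/* Store 12 flag bits, preserving the data offset nibble. */
static inline void
tcp_set_flags_sketch(struct tcphdr_flags_sketch *th, uint16_t flags)
{
	th->off_x2 = (uint8_t)((th->off_x2 & 0xf0) | ((flags >> 8) & 0x0f));
	th->flags = (uint8_t)(flags & 0xff);
}
```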

Reviewed By: hselasky, tuexen, glebius, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34130
2022-02-03 16:21:58 +01:00
Michael Tuexen
d51c80351f rack: fix compilation and small cleanup
Fix a function prototype missed in the last commit, and make a
whitespace change.
Sponsored by:	Netflix, Inc.
2022-02-02 09:41:40 +01:00
Michael Tuexen
3b3c08c135 tcp: cleanup functions related to socket option handling
Consistently pass only the inp and the sopt around. Don't pass the
so around, since in an upcoming commit tcp_ctloutput_set() will be
called from a context different from setsockopt(). Also expect
the inp to be locked when calling tcp_ctloutput_[gs]et(); this is
also required for the upcoming use by tcpsso, a command line tool
to set socket options.
Reviewed by:		glebius, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D34151
2022-02-02 09:27:59 +01:00
Wojciech Macek
77223d98b6 ip_mroute: refactor epoch-based locking
Remove duplicated epoch_enter and epoch_exit in IP inp/outp routines.
Remove unnecessary macros as well.

Obtained from:		Semihalf
Sponsored by:		Stormshield
Reviewed by:		glebius
Differential revision:	https://reviews.freebsd.org/D34030
2022-02-02 06:48:05 +01:00
Richard Scheffenegger
93e28d6e89 tcp: LRO code to deal with all 12 TCP header flags
TCP has 4 reserved flag bits for future use (RFC 793 reserved 6,
two of which RFC 3168 later assigned to ECN). One
of those bits may be used for Accurate ECN.
This patch includes these bits in the LRO code to ease
extensibility if/when these bits are used.

Reviewed By: hselasky, rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34127
2022-02-01 18:41:36 +01:00
John Baldwin
d782385e9b tcp_ratelimit: Handle some edge cases with TLS + RL send tags.
- After a connection has fallen back from NIC TLS to SW TLS, any
  pacing rate changes should modify the inpcb send tag even though
  SB_TLS_IFNET is set.

- If a connection tries to modify the pacing rate before the send
  tag has been converted from plain TLS to TLS + RL, don't fail
  the rate-set request; instead, let it fall through to setting the
  rate on the non-TLS inpcb RL tag.

Reviewed by:	gallatin, rrs, hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D34085
2022-01-31 16:40:04 -08:00
Gordon Bergling
4bd030b369 sctp(4): Fix a typo in an INVARIANTS panic message
- s/failes/fails/

MFC after:	1 week
2022-01-28 13:20:52 +01:00
Richard Scheffenegger
4531b3450b tcp: Tidying up the conditionals for unwinding a spurious RTO
- Use the semantically correct TSTMP_xx macro when comparing
  timestamps. (No functional change)
- Check for bad retransmits only when TSopt is present in the ACK
  (don't assume there will be a valid TSopt in the TCP options struct).
- Exclude tsecr == 0, since that most likely indicates an
  invalid timestamp echo reply (tsecr) value.
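The resulting conditions can be sketched as a single predicate. TSTMP_LT_SKETCH mirrors the wraparound-safe style of the kernel's TSTMP_LT macro but is defined locally; struct rto_check and its fields are hypothetical stand-ins for the tcpcb and parsed-option state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is earlier than b" for 32-bit timestamps. */
#define TSTMP_LT_SKETCH(a, b)	((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

/* Hypothetical inputs to the spurious-RTO decision. */
struct rto_check {
	bool     ack_has_ts;	/* TSopt was present on this ACK */
	uint32_t tsecr;		/* echoed timestamp from the ACK */
	uint32_t badrxtwin;	/* timestamp bound for the RTO retransmission */
};

/* An ACK echoing a timestamp older than the retransmission was caused by
 * the original transmission, so the RTO was spurious. */
static bool
rto_was_spurious(const struct rto_check *c)
{
	if (!c->ack_has_ts)	/* option fields may be stale */
		return (false);
	if (c->tsecr == 0)	/* most likely an invalid echo */
		return (false);
	return (TSTMP_LT_SKETCH(c->tsecr, c->badrxtwin));
}
```

The signed-difference comparison keeps the result correct across 32-bit timestamp wraparound, where a plain `<` would not.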

Reviewed By: tuexen, #transport
MFC after:   3 days
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34062
2022-01-27 18:59:55 +01:00
Richard Scheffenegger
68e623c3f0 tcp: Rewind erroneous RTO only while performing RTO retransmissions
Under rare circumstances, a spurious retransmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.

Reviewed By: tuexen, chengc_netapp.com, #transport
MFC after:   3 days
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D33979
2022-01-27 18:49:42 +01:00
Andrew Gallatin
8a7404b2ae tcp: fix leaks in tcp_chg_pacing_rate error paths
tcp_chg_pacing_rate() is expected to release the hw rate limit table,
but failed to do so in several error cases, leading to ever
increasing counts of flows using the rate.
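One way to keep every error path consistent is a single exit label that always drops the old reference. This sketch assumes, as the message states, that the function is responsible for releasing the old entry; struct rate_entry and chg_rate() are hypothetical and much simpler than tcp_chg_pacing_rate().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical rate-table entry with a count of flows using it. */
struct rate_entry {
	unsigned long long rate;	/* bits per second */
	int users;
};

static void
rate_release(struct rate_entry *r)
{
	assert(r->users > 0);
	r->users--;
}

/* Move a flow from 'old' to the first entry whose rate is at least
 * 'want'.  The caller's reference on 'old' is consumed on every path:
 * on success it is exchanged for a reference on the new entry; on
 * failure NULL is returned and *error is set. */
static struct rate_entry *
chg_rate(struct rate_entry *old, struct rate_entry *tbl, size_t n,
    unsigned long long want, int *error)
{
	struct rate_entry *nrte = NULL;

	*error = 0;
	if (want == 0) {
		*error = EINVAL;
		goto out;
	}
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].rate >= want) {
			nrte = &tbl[i];
			break;
		}
	}
	if (nrte == NULL) {
		*error = ENOENT;
		goto out;
	}
	nrte->users++;
out:
	/* Single exit: the old entry is released on success and on error. */
	rate_release(old);
	return (nrte);
}
```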

This patch was mostly done by rrs.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34058
Reviewed by: hselasky, rrs, jhb (initial version, outside of Differential)
2022-01-27 10:35:03 -05:00
Andrew Gallatin
9ba117960e Fix a memory leak when ip_output_send() returns EAGAIN due to send tag issues
When ip_output_send() returns EAGAIN due to issues with send tags (route
change, lagg failover, etc.), it must free the mbuf. This is because
ip_output_send() was written as a wrapper/replacement for a direct
call to if_output(), and the contract with if_output() has
historically been that it owns the mbufs once called. When
ip_output_send() failed to free mbufs, it violated this assumption
and led to leaked mbufs.
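The ownership contract can be sketched in userspace with a leak counter: once the hypothetical output_send_sketch() is called it owns the packet, so its EAGAIN path frees the packet just as the consuming if_output_sketch() stand-in does. None of these names are kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_bufs;	/* allocated buffers not yet freed */

struct pkt {
	int len;
};

static struct pkt *
pkt_alloc(int len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p != NULL) {
		p->len = len;
		live_bufs++;
	}
	return (p);
}

static void
pkt_free(struct pkt *p)
{
	if (p != NULL) {
		free(p);
		live_bufs--;
	}
}

/* Stand-in for a driver output routine: it always consumes the packet. */
static int
if_output_sketch(struct pkt *p)
{
	pkt_free(p);
	return (0);
}

/* Same contract as if_output_sketch(): once called, it owns 'p'.  On the
 * send-tag error path it must free 'p' itself before returning EAGAIN;
 * returning without freeing would leak the packet. */
static int
output_send_sketch(struct pkt *p, int tag_matches)
{
	if (!tag_matches) {
		pkt_free(p);
		return (EAGAIN);
	}
	return (if_output_sketch(p));
}
```

Callers never touch the packet after the call, whatever the return value, which is what keeps the counter at zero.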

This was noticed when using NIC TLS in combination with hardware
rate-limited connections. When lots of NIC output drops
triggered rate-limit send tag changes, we noticed we were leaking
ktls_sessions, send tags and mbufs. This was due to ip_output_send()
leaking mbufs which held references to ktls_sessions, which in
turn held references to send tags.

Many thanks to jhb, rrs, hselasky and markj for their help in
debugging this.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34054
Reviewed by: hselasky, jhb, rrs
MFC after: 2 weeks
2022-01-27 10:34:34 -05:00
Gordon Bergling
9e58cca3e8 extra_tcp_stacks: Fix two typos in source code comments
- s/differnt/different/

MFC after:	3 days
2022-01-26 18:02:55 +01:00
Gordon Bergling
b3df222eae extra_tcp_stacks: Fix a few common typos
TCP_BBR:
- Fix a typo introduced in 1b90dfa5d2, which was reported by tuexen@

TCP_RACK:
- Correct two sysctl descriptions: s/corret/correct/

tcp_bbr(4): Also fix s/measurment/measurement/ in the man page

MFC after:	1 week
2022-01-26 10:35:17 +01:00