Recently changes were made to the tcp stack to use a macro/function
to set tcp flags. In the process the PUSH bit setting in the fastpath of
rack was broken. This fixes that as well as cleans up a warning that
is occurring when you don't have INVARIANT on (inp used in KASSERT).
We can use the tcp test suite to find this bug the test plan shows the script
that fails due to the missing push bit
Reviewed by: rscheff, tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D34332
In the transport call on 12/3 Gleb asked to move the CC modules towards
using reference counting to prevent folks from unloading a module in use.
It was also agreed that Michael would do a user space utility like tcp_drop
that could be used to move all connections that are using a specific CC
to some other CC.
This is the half I committed to doing, making it so that we maintain a refcount
on a cc module every time a pcb refers to it and decrementing that every
time a pcb no longer uses a cc module. This also helps us simplify the
whole unloading process by getting rid of tcp_ccunload() which munged
through all the tcb's. Instead we mark a module as being removed and
prevent further references to it. We also make sure that if a module is
marked as being removed it cannot be made as the default and also
the opposite of that, if its a default it fails and does not mark it as being
removed.
Reviewed by: Michael Tuexen, Gleb Smirnoff
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33249
udp_set_kernel_tunneling() rejects new callbacks if one is already set.
Allow callbacks to be cleared. The use case for this is OpenVPN DCO,
where the socket is opened by userspace and then adopted by the kernel
to run the tunnel. If the DCO interface is removed but userspace does
not close the socket (something the kernel cannot prevent) the installed
callbacks could be called with an invalidated context.
Allow new functions to be set, but only if they're NULL (i.e. allow the
callback functions to be cleared).
Reviewed by: tuexen
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34288
The padding in CURRENT shall not shrink. It is
designed that in CURRENT at always stays
the same, and then when a new stable is branched, it
inherits 6 pointer placeholders that can be used
withing this stable/X lifetime to extend the structure.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34269
Improve the "syncache: mbuf too small" assertion message with various
variables (some not actually needed) but enough that it will be obvious
if (a) we use IPv4 or IPv6, (b) if UDP tunneling is on, (c) what
max_linkhdr is, and (d) what MHLEN is.
This should help diagnostics in the future.
The case was hit with wireless drivers setting a large ic_headroom
and using IPv6.
Reviewed by: gallatin, tuexen, rscheff
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D34217
The layout of the structure ends up depending on whether the including
file includes opt_inet.h and opt_inet6.h, so different compilation units
can end up seeing different versions of the structure. Fix this by
unconditionally defining the address fields.
As a side effect, this eliminates some duplication in the kernel's CTF
type graph.
Reviewed by: rscheff, tuexen
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34242
Reserve couters in the tcps struct in preparation
for AccECN, extend the debugging output for TF2
flags, optimize the syncache flags from individual
bits to a codepoint for the specifc ECN handshake.
This is in preparation of AccECN.
No functional chance except for extended debug
output capabilities.
Reviewed By: #transport, rrs
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34161
During a recent deep dive into all the variables so I could
discover why stack switching caused larger retransmits I examined
every variable in rack. In the process I found quite a few bits
that were not used and needed cleanup. This update pulls
out all the unused pieces from rack. Note there are *no* functional
changes here, just the removal of unused variables and a bit of
spacing clean up.
Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D34205
While there, remove a duplicate inclusion of sysctl.h.
Reported by: Gary Jennejohn
Fixes: a35bdd4489 - main - tcp: add sysctl interface for setting socket options
Sponsored by: Netflix, Inc.
This interface allows to set a socket option on a TCP endpoint,
which is specified by its inp_gencnt. This interface will be
used in an upcoming command line tool tcpsso.
Reviewed by: glebius, rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D34138
tcp_ctloutput_set() will be used via the sysctl interface in a
upcoming command line tool tcpsso.
Reviewed by: glebius, rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D34164
Missed to move the tcp_set_flags() past ECN processing
in rack_fast_output() earlier.
Reviewed By: rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34180
As promised to the transport call on 11/4/22 here is an implementation
of hystart++ for cubic. It also cleans up the tcp_congestion function
to have a better name. Common variables are moved into the general
cc.h structure so that both cubic and newreno can use them for
hystart++
Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33035
Incorrectly used KMOD_ marco in static kernel ECN functions.
Both eventually resolve to counter_s64_add(), but better
use the correct macros.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34181
Reduce the burden to maintain correct and
extensible ECN related code across multiple
stacks and codepaths.
Formally no functional change.
Incidentially this establishes correct
ECN operation in one instance.
Reviewed By: rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34162
Reduce the burden to maintain correct and
extensible ECN related code across multiple
stacks and codepaths.
Formally no functional change.
Incidentially this establishes correct
ECN operation in one instance.
Reviewed By: rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34162
setsockopt() grants full access to the deprecated
TOS byte. For TCP, mask out the ECN codepoint, so that
only the DSCP portion can be adjusted.
Reviewed By: tuexen, hselasky, #manpages, #transport, debdrup
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34154
TCP RACK can cache the IP header while preparing
a new TCP packet for transmission. Thus all the
IP ECN codepoint bits need to be assigned, without
assuming a clear field beforehand.
Reviewed By: tuexen, kbowling, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34148
In order to consistently provide access to all
(including reserved) TCP header flag bits,
use an accessor function tcp_get_flags and
tcp_set_flags. Also expand any flag variable from
uint8_t / char to uint16_t.
Reviewed By: hselasky, tuexen, glebius, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34130
Consistently only pass the inp and the sopt around. Don't pass the
so around, since in a upcoming commit tcp_ctloutput_set() will be
called from a context different from setsockopt(). Also expect
the inp to be locked when calling tcp_ctloutput_[gs]et(), this is
also required for the upcoming use by tcpsso, a command line tool
to set socket options.
Reviewed by: glebius, rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D34151
TCP per RFC793 has 4 reserved flag bits for future use. One
of those bits may be used for Accurate ECN.
This patch is to include these bits in the LRO code to ease
the extensibility if/when these bits are used.
Reviewed By: hselasky, rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34127
- After a connection has fallen back from NIC TLS to SW TLS, any
pacing rate changes should modify the inpcb send tag even though
SB_TLS_IFNET is set.
- If a connection tries to modify the pacing rate before the send
tag has been converted from plain TLS to TLS + RL, don't fail
the rate request set but let it fall through to setting the rate
on the non-TLS inpcb RL tag.
Reviewed by: gallatin, rrs, hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34085
- Use the semantically correct TSTMP_xx macro when comparing
timestamps. (No functional change)
- check for bad retransmits only when TSopt is present in ACK
(don't assume there will be a valid TSopt in the TCP options struct)
- exclude tsecr == 0, since that most likely indicates an
invalid ts echo return (tsecr) value.
Reviewed By: tuexen, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34062
Under rare circumstances, a spurious retranmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.
Reviewed By: tuexen, chengc_netapp.com, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D33979
tcp_chg_pacing_rate() is expected to release the hw rate limit table,
but failed to do so in several error cases, leading to ever
increasing counts of flows using the rate.
This patch was mostly done by rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34058
Reviewed by: hselasky, rrs, jhb (inital version, outside of Differential)
When ip_output_send() returns EAGAIN due to issues with send tags (route
change, lagg failover, etc), it must free the mbuf. This is because
ip_output_send() was written as a wrapper/replacement for a direct
call to if_output(), and the contract with if_output() has
historically been that it owns the mbufs once called. When
ip_output_send() failed to free mbufs, it violated this assumption
and lead to leaked mbufs.
This was noticed when using NIC TLS in combination with hardware
rate-limited connections. When seeing lots of NIC output drops
triggered ratelimit send tag changes, we noticed we were leaking
ktls_sessions, send tags and mbufs. This was due ip_output_send()
leaking mbufs which held references to ktls_sessions, which in
turn held references to send tags.
Many thanks to jbh, rrs, hselasky and markj for their help in
debugging this.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34054
Reviewed by: hselasky, jhb, rrs
MFC after: 2 weeks
TCP_BBR:
- Fix a typo introducted in 1b90dfa5d2, which was reported by tuexen@
TCP_RACK:
- Correct two sysctl descriptions: s/corret/correct/
tcp_bbr(4): Also fix s/measurment/measurement/ in the man page
MFC after: 1 week
X_ip_mrouter_done might sleep, which triggers INVARIANTS to
print additional errors on the screen.
Move it outside the lock, but provide some basic synchronization
to avoid race condition during module uninit/unload.
Obtained from: Semihalf
Sponsored by: Stormshield