In the changes to get hystart++ into cubic an inadvertent line
was removed in the conditional to figure out if you need to exit
hystart++ back to slowstart. The line of course is the most crucial
one (the others are valid but not critical) i.e. is the new rtt
less than the point where we entered hystart++. Without the line
we end up bouncing in and out of CSS.
Reported By: Reese Enghardt
Sponsored By: Netflix Inc.
In the transport call on 12/3 Gleb asked to move the CC modules towards
using reference counting to prevent folks from unloading a module in use.
It was also agreed that Michael would do a user space utility like tcp_drop
that could be used to move all connections that are using a specific CC
to some other CC.
This is the half I committed to doing, making it so that we maintain a refcount
on a cc module every time a pcb refers to it and decrementing that every
time a pcb no longer uses a cc module. This also helps us simplify the
whole unloading process by getting rid of tcp_ccunload() which munged
through all the tcb's. Instead we mark a module as being removed and
prevent further references to it. We also make sure that if a module is
marked as being removed it cannot be made as the default and also
the opposite of that, if its a default it fails and does not mark it as being
removed.
Reviewed by: Michael Tuexen, Gleb Smirnoff
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33249
As promised to the transport call on 11/4/22 here is an implementation
of hystart++ for cubic. It also cleans up the tcp_congestion function
to have a better name. Common variables are moved into the general
cc.h structure so that both cubic and newreno can use them for
hystart++
Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33035
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
I just discovered that the return of the EBUSY error was incorrectly
rigged so that you could unload a CC module that was set to default.
Its supposed to be an EBUSY error. Make it so.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33229
This causes new vnets to inherit the cc algorithm from vnet0. This is a
temporary patch to fix vnet jail creation.
With encouragement from: glebius
Fixes: b8d60729deef ("tcp: Congestion control cleanup.")
Differential Revision: https://reviews.freebsd.org/D32970
Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm. Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.
Sponsored by: Netflix
Fixes: b8d60729deef ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision: https://reviews.freebsd.org/D32964
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!
This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.
This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.
Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
TCP Hystart draft version -03:
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-hystartplusplus
Is a new version of hystart that allows one to carefully exit slow start if the RTT
spikes too much. The newer version has a slower-slow-start so to speak that then
kicks in for five round trips. To see if you exited too early, if not into congestion avoidance.
This commit will add that feature to our newreno CC and add the needed bits in rack to
be able to enable it.
Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32373
* Completely initialise the CC module specific data
* Use beta_ecn in case of an ECN event whenever ABE is enabled
or it is requested by the stack.
Reviewed by: rscheff, rrs
MFC after: 3 days
Sponsored by: Netflix, Inc.
Most CC algos do use local data, and when calling
newreno_cong_signal from there, the latter misinterprets
the data as its own struct, leading to incorrect behavior.
Reported by: chengc_netapp.com
Reviewed By: chengc_netapp.com, tuexen, #transport
MFC after: 3 days
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30470
This fixes several breakages (panics) since the tcp_lro code was
committed that have been reported. Quite a few new features are
now in rack (prefecting of DGP -- Dynamic Goodput Pacing among the
largest). There is also support for ack-war prevention. Documents
comming soon on rack..
Sponsored by: Netflix
Reviewed by: rscheff, mtuexen
Differential Revision: https://reviews.freebsd.org/D30036
As full support of RFC6675 is in place, deprecating
net.inet.tcp.rfc6675_pipe and enabling by default
net.inet.tcp.sack.revised.
Reviewed By: #transport, kbowling, rrs
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28702
This will pave the way of setting ssthresh differently in TCP CUBIC, according
to RFC8312 section 4.7.
No functional change, only code movement.
Submitted by: chengc_netapp.com
Reviewed by: rrs, tuexen, rscheff
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26807
Adjust ssthresh in after_idle to the maximum of
the prior ssthresh, or 3/4 of the prior cwnd. See
RFC2861 section 2 for an in depth explanation for
the rationale around this.
As newreno is the default "fall-through" reaction,
most tcp variants will benefit from this.
Reviewed by: tuexen
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D22438
During the DCTCP improvements, use of CCF_DELACK was
removed. This change is just to rename the unused flag
bit to prevent use of it, without also re-implementing
the tcp_input and tcp_output interfaces.
No functional change.
Reviewed by: chengc_netapp.com, tuexen
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26181
Since cubic calculates cwnd based on absolute
time, retaining RFC3465 (ABC) once-per-window updates
can lead to dramatic changes of cwnd in the convex
region. Updating cwnd for each incoming ack minimizes
this delta, preventing unintentional line-rate bursts.
Reviewed by: chengc_netapp.com, tuexen (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26060
The Cubic concave region was not aligned nicely for the very first exit from
slow start, where a 50% cwnd reduction is done instead of the normal 30%.
This addresses an issue, where a short line-rate burst could result from that
sudden jump of cwnd.
In addition, the Fast Convergence Heuristic has been expanded to work also
with ECN induced congestion response.
Submitted by: chengc_netapp.com
Reported by: chengc_netapp.com
Reviewed by: tuexen (mentor), rgrimes (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25976
Initializing K to zero in D23655 introduced a miscalculation,
where cwnd would suddenly jump to cwnd_max instead of gradually
increasing, after leaving slow-start.
Properly calculating K instead of resetting it to zero resolves
this issue. Also making sure, that cwnd is recalculated at the
earliest opportunity once slow-start is over.
Reported by: chengc_netapp.com
Reviewed by: chengc_netapp.com, tuexen (mentor), rgrimes (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25746
In cubic, the concave region is when snd_cwnd starts growing slower
towards max_cwnd (cwnd at the time of the congestion event), and
the convex region is when snd_cwnd starts to grow faster and
eventually appearing like slow-start like growth.
PR: 238478
Reviewed by: tuexen (mentor), rgrimes (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D24657
Without versioning information, using preexisting loader /
linker code is not easily possible when another module may
have dependencies on pre-loaded modules, and also doesn't
allow the automatic loading of dependent modules.
No functional change of the actual modules.
Reviewed by: tuexen (mentor), rgrimes (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25744
Proper TCP Cubic operation requires the knowledge
of the maximum congestion window prior to the
last congestion event.
This restores and improves a bugfix previously added
by jtl@ but subsequently removed due to a revert.
Reported by: chengc_netapp.com
Reviewed by: chengc_netapp.com, tuexen (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25133
Cubic calculates the new cwnd based on absolute time
elapsed since the start of an epoch. A cubic epoch is
started on congestion events, or once the congestion
avoidance phase is started, after slow-start has
completed.
When a sender is application limited for an extended
amount of time and subsequently a larger volume of data
becomes ready for sending, Cubic recalculates cwnd
with a lingering cubic epoch. This recalculation
of the cwnd can induce a massive increase in cwnd,
causing a burst of data to be sent at line rate by
the sender.
This adds a flag to reset the cubic epoch once a
session transitions from app-limited to cwnd-limited
to prevent the above effect.
Reviewed by: chengc_netapp.com, tuexen (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25065
Introducing flags to track the initial Wmax dragging and exit
from slow-start in TCP Cubic. This prevents sudden jumps in the
caluclated cwnd by cubic, especially when the flow is application
limited during slow start (cwnd can not grow as fast as expected).
The downside is that cubic may remain slightly longer in the
concave region before starting the convex region beyond Wmax again.
Reviewed by: chengc_netapp.com, tuexen (mentor)
Approved by: tuexen (mentor), rgrimes (mentor, blanket)
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D23655
In mixed ECN marking and loss scenarios it was found, that
the alpha value of DCTCP is updated two times. The second
update happens with freshly initialized counters indicating
to ECN loss. Overall this leads to alpha not adjusting as
quickly as expected to ECN markings, and therefore lead to
excessive loss.
Reported by: Cheng Cui
Reviewed by: chengc_netapp.com, rrs, tuexen (mentor)
Approved by: tuexen (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D24817
Running TCP Cubic together with ECN could end up reducing cwnd down to 1 byte, if the
receiver continously sets the ECE flag, resulting in very poor transmission speeds.
In line with RFC6582 App. B, a lower bound of 2 MSS is introduced, as well as a typecast
to prevent any potential integer overflows during intermediate calculation steps of the
adjusted cwnd.
Reported by: Cheng Cui
Reviewed by: tuexen (mentor)
Approved by: tuexen (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D23353
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
and not only for the DCTCP congestion control.
Submitted by: Richard Scheffenegger
Reviewed by: rgrimes, tuexen@, Cheng Cui
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23119
This makes it possible to retrieve per-connection statistical
information such as the receive window size, RTT, or goodput,
using a newly added TCP_STATS getsockopt(3) option, and extract
them using the stats_voistat_fetch(3) API.
See the net/tcprtt port for an example consumer of this API.
Compared to the existing TCP_INFO system, the main differences
are that this mechanism is easy to extend without breaking ABI,
and provides statistical information instead of raw "snapshots"
of values at a given point in time. stats(3) is more generic
and can be used in both userland and the kernel.
Reviewed by: thj
Tested by: thj
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20655
This allows adding more ECN related flags in the future.
No functional change intended.
Submitted by: Richard Scheffenegger
Reviewed by: rrs@, tuexen@
Differential Revision: https://reviews.freebsd.org/D22497
The adjustments are inspired by the Linux stack, which has had a
functionally equivalent implementation for more than a decade now.
Submitted by: Richard Scheffenegger
Reviewed by: Cheng Cui
Differential Revision: https://reviews.freebsd.org/D18982
This patch addresses a very common case of frequent application stalls,
where TCP runs idle and looses the state of the network.
Submitted by: Richard Scheffenegger
Reviewed by: Cheng Cui
Differential Revision: https://reviews.freebsd.org/D18954
to the higher of the previous ssthresh or 3/4 of the prior cwnd.
Submitted by: Richard Scheffenegger
Reviewed by: Cheng Cui
Differential Revision: https://reviews.freebsd.org/D18982
DCTCP specific methods. Also fallthrough NewReno for non ECN capable
TCP connections and improve the integer arithmetic.
Obtained from: Richard Scheffenegger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20550
* Initialize the alpha parameter to a conservative value (like Linux)
* Improve handling of arithmetic.
* Improve man-page
Obtained from: Richard Scheffenegger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20549