the RTT that a flow will build up in buffers in
transit. It is a slight modification to RFC2581
but is more friendly i.e. less aggressive.
MFC after: 3 months
1) Add four new points that allow you to get more information
to cc algo's
2) Fix the case where user changes module on a existing TCB, in
such a case, the initialization module needs to be called on all nets.
3) Move htcp_cc structure to a union that other modules can use.
4) Add 5th point for get/set socket options for cc_module specific options
MFC after: 2 months
VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.
While this reduces the number of vnet recursion in some places like
NFS, POSIX local sockets and some netgraph, .. recursions are
impossible to fix.
The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec
Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks
at the Univ-of-Del. Basically when a 1-to-1 socket did a
socket/bind/send(data)/close. If the timing was right
we would dereference a socket that is NULL.
MFC after: 1 month
* Store the flowid when receiving an SCTP/IPv6 packet.
* Store the flowid when receiving an SCTP packet with wrong CRC.
* Initilize flowid correctly.
* Put test code under INVARIANTS.
MFC after: 3 months.
In the dec.2009 rewrite I introduced a bug, using for the
computation the arrival time instead of the time the packet
has exited from the queue.
The bandwidth computation was still correct because it is
computed elsewhere, but traffic was sent out in bursts.
The bug is also present in RELENG_8 after dec.2009
Thanks to Daikichi Osuga for investingating, finding and fixing the
bug with detailed graphs of the behaviour before and after the fix.
Submitted by: Daikichi Osuga
MFC after: 2 weeks
1) They don't use the giant "MAX_CPU" define and instead
are allocated dynamically based on mp_ncpus
2) Will zero with the netstat -z -s -p sctp
3) Will be properly handled by both the sctp_init and finish
(the multi-net stuff was incorrectly bzero'ing in sctp_init
the wrong size.. the bzero is now moved to the right places).
And of course the free is put in at the very end.
MFC after: 3 Months
threads. These serve as input threads and are queued
packets based on the V-tag number. This is similar to
what a modern card can do with queue's for TCP... but
alas modern cards know nothing about SCTP.
MFC after: 3 months (maybe)
2) Add separate max-bursts for retransmit and hb. These
are set to sysctlable values but not settable via the
socket api. This makes sure we don't blast out HB's or
fast-retransmits.
3) Determine on the first data transmission on a net if
its local-lan (by being under or over a RTT). This
can later be used to think about different algorithms
based on locallan vs big-i (experimental)
4) The cwnd should NOT be allowed to grow when an ECNEcho
is seen (TCP has this same bug). We fix this in SCTP
so an ECNe being seen prevents an advance of cwnd.
5) CWR's should not be sent multiple times to the
same network, instead just updating the TSN being
transmitted if needed.
MFC after: 1 Month
top 8 bits of the 32 bit signal bit field space for internal use. These private
signals should not be leaked outside of a module.
Given that many algorithm modules use the NewReno hook functions to simplify
their implementation, the obvious place such a leak would show up is in the
NewReno cong_signal hook function.
- Show the full number of significant bits in the signal type definitions in
<netinet/cc.h>.
- Add a bitmask to simplify figuring out if a given signal is in the private or
public bit range.
- Add a sanity check in newreno_cong_signal() to ensure private signals are not
being leaked into the hook function.
Sponsored by: FreeBSD Foundation
Discussed with: David Hayes <dahayes at swin edu au>
MFC after: 1 week
X-MFC with: r215166
algorithm described in the paper "Improved coexistence and loss tolerance for
delay based TCP congestion control" by Hayes and Armitage. It is implemented as
a kernel module compatible with the recently committed modular congestion
control framework.
CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide
tolerance to non-congestion related packet loss and improvements to coexistence
with loss-based congestion control algorithms. A key idea in improving
coexistence with loss-based congestion control algorithms is the use of a shadow
window, which attempts to track how NewReno's congestion window (cwnd) would
evolve. At the next packet loss congestion event, CHD uses the shadow window to
correct cwnd in a way that reduces the amount of unfairness CHD experiences when
competing with loss-based algorithms.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months
algorithm based on the paper "A strategy for fair coexistence of loss and
delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and
Baker. It is implemented as a kernel module compatible with the recently
committed modular congestion control framework.
HD uses a probabilistic approach to reacting to delay-based congestion. The
probability of reducing cwnd is zero when the queuing delay is very small,
increasing to a maximum at a set threshold, then back down to zero again when
the queuing delay is high. Normal operation keeps the queuing delay below the
set threshold. However, since loss-based congestion control algorithms push the
queuing delay high when probing for bandwidth, having the probability of
reducing cwnd drop back to zero for high delays allows HD to compete with
loss-based algorithms.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months
based on the paper "TCP Vegas: end to end congestion avoidance on a global
internet" by Brakmo and Peterson. It is implemented as a kernel module
compatible with the recently committed modular congestion control framework.
VEGAS uses network delay as a congestion indicator and unlike regular loss-based
algorithms, attempts to keep the network operating with stable queuing delays
and no congestion losses. By keeping network buffers used along the path within
a set range, queuing delays are kept low while maintaining high throughput.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months
1) We now remove ECN-Nonce since it will no longer continue as a I-D
2) Eliminate last_tsn_echo, this tied us to an assoc not the net
and thus we were not doing m-homing on the ECN-Echo senders side right.
3) Increment the count going out even if the TSN in lower in the pending
ECN-Echo, this way the receiver knows exactly how many packets were
marked even with network re-ordering
4) Fix so we DO NOT stop doing delayed sack if a ECN Echo is in queue
MFC after: 1 month
1) ECN was on an association basis, this is incorrect and
will not work with CMT or for that matter if the user
is sending to multiple addresses. This commit makes
ECN on a per path basis.
2) Adopt the new format for the ECN internet draft. This also
maintains compatability with old format chunks as well.
3) Keep track of the real time of a RTT down to micro seconds.
For some future conditional features (for like a data center
this is good information to have).
MFC after: 1 month
This will be used for Data Center congestion
control, we won't want to engage it in the
ECN code unless we KNOW that the RTT is less
than 500us.
MFC after: 1 week
sends were being accounted for. The
counting was such that we counted only
when we queued a chunk, not when we sent it.
Now keep an additional counter for queuing and
one for sending.
MFC after: 1 week
with the latest socket API ID. Especially it can be disabled.
Full compliance needs changing the structure used in the
socket option. Since this breaks the API, it will be a
seperate commit which will not be MFCed to stable/8.
MFC after: 3 months.
Khelp/Hhook KPIs to hook into the TCP stack and maintain a per-connection, low
noise estimate of the instantaneous RTT. ERTT's implementation is robust even in
the face of delayed acknowledgements and/or TSO being in use for a connection.
A high quality, low noise RTT estimate is a requirement for applications such as
delay-based congestion control, for which we will be importing some algorithm
implementations shortly.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months
write to the buffer causes it to overflow. We therefore can't hold the CC list
rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND.
Switch to a fixed length sbuf which should be of sufficient size except in the
very unlikely event that the sysctl is being processed as one or more new
algorithms are loaded. If that happens, we accept the race and may fail the
sysctl gracefully if there is insufficient room to print the names of all the
algorithms.
This should address a WITNESS warning and the potential panic that would occur
if the sbuf call to malloc did sleep whilst holding the CC list rwlock.
Sponsored by: FreeBSD Foundation
Reported by: Nick Hibma
Reviewed by: bz
MFC after: 3 weeks
X-MFC with: r215166
- The mean RTT is updated at the end of each congestion epoch, but if we switch
to congestion avoidance within the first epoch (e.g. if ssthresh was primed
from the hostcache), we'll trigger a divide by zero panic in
cubic_ack_received(). Set the mean to the min in cubic_record_rtt() if the
mean is less than the min to ensure we have a sane mean for use in this
situation. This fixes the panic reported by Nick Hibma.
- Adjust conditions under which we update the mean RTT in cubic_post_recovery()
to ensure a low latency path won't yield an RTT of less than 1. This avoids
another potential divide by zero panic when running CUBIC in networks with
sub-millisecond latencies.
- Remove the "safety" assignment of min into mean when we don't update the mean
because of failed conditions. The above change to the conditions for updating
the mean ensures the safety issue is addressed and I feel it is better to keep
our previous mean estimate around if we can't update than to revert to the
min.
- Initialise the mean RTT to 1 on connection startup to act as a safety belt if
a situation we haven't considered and addressed with the above changes were to
crop up in the wild.
Sponsored by: FreeBSD Foundation
Reported and tested by: Nick Hibma
Discussed with: David Hayes <dahayes at swin edu au>
MFC after: 5 weeks
X-MFC with: r216114