Commit Graph

4009 Commits

Author SHA1 Message Date
rrs
21c4d27f12 1) Allow a chunk to track the cwnd it was at when sent.
2) Add separate max-bursts for retransmit and hb. These
   are set to sysctlable values but not settable via the
   socket api. This makes sure we don't blast out HB's or
   fast-retransmits.
3) Determine on the first data transmission on a net if
   its local-lan (by being under or over a RTT). This
   can later be used to think about different algorithms
   based on locallan vs big-i (experimental)
4) The cwnd should NOT be allowed to grow when an ECNEcho
   is seen (TCP has this same bug). We fix this in SCTP
   so an ECNe being seen prevents an advance of cwnd.
5) CWR's should not be sent multiple times to the
   same network, instead just updating the TSN being
   transmitted if needed.

MFC after:	1 Month
2011-02-02 11:13:23 +00:00
lstewart
8d21b8a169 Algorithm modules can define their own private congestion signal types in the
top 8 bits of the 32 bit signal bit field space for internal use. These private
signals should not be leaked outside of a module.

Given that many algorithm modules use the NewReno hook functions to simplify
their implementation, the obvious place such a leak would show up is in the
NewReno cong_signal hook function.

- Show the full number of significant bits in the signal type definitions in
  <netinet/cc.h>.

- Add a bitmask to simplify figuring out if a given signal is in the private or
  public bit range.

- Add a sanity check in newreno_cong_signal() to ensure private signals are not
  being leaked into the hook function.

Sponsored by:	FreeBSD Foundation
Discussed with:	David Hayes <dahayes at swin edu au>
MFC after:	1 week
X-MFC with:	r215166
2011-02-01 13:32:27 +00:00
lstewart
108062a407 Fix typo in comment: "course" -> "coarse"
Sponsored by:	FreeBSD Foundation
Submitted by:	jmallett
MFC after:	3 months
X-MFC with:	r218152
2011-02-01 07:10:13 +00:00
lstewart
6d2fcf0be2 Import an implementation of the CAIA-Hamilton-Delay (CHD) congestion control
algorithm described in the paper "Improved coexistence and loss tolerance for
delay based TCP congestion control" by Hayes and Armitage. It is implemented as
a kernel module compatible with the recently committed modular congestion
control framework.

CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide
tolerance to non-congestion related packet loss and improvements to coexistence
with loss-based congestion control algorithms. A key idea in improving
coexistence with loss-based congestion control algorithms is the use of a shadow
window, which attempts to track how NewReno's congestion window (cwnd) would
evolve. At the next packet loss congestion event, CHD uses the shadow window to
correct cwnd in a way that reduces the amount of unfairness CHD experiences when
competing with loss-based algorithms.

In collaboration with:	David Hayes <dahayes at swin edu au> and
				Grenville Armitage <garmitage at swin edu au>
Sponsored by:	FreeBSD Foundation
Reviewed by:	bz and others along the way
MFC after:	3 months
2011-02-01 07:05:14 +00:00
lstewart
1138f32d73 Import a clean-room implementation of the Hamilton-Delay (HD) congestion control
algorithm based on the paper "A strategy for fair coexistence of loss and
delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and
Baker. It is implemented as a kernel module compatible with the recently
committed modular congestion control framework.

HD uses a probabilistic approach to reacting to delay-based congestion. The
probability of reducing cwnd is zero when the queuing delay is very small,
increasing to a maximum at a set threshold, then back down to zero again when
the queuing delay is high. Normal operation keeps the queuing delay below the
set threshold. However, since loss-based congestion control algorithms push the
queuing delay high when probing for bandwidth, having the probability of
reducing cwnd drop back to zero for high delays allows HD to compete with
loss-based algorithms.

In collaboration with:	David Hayes <dahayes at swin edu au> and
				Grenville Armitage <garmitage at swin edu au>
Sponsored by:	FreeBSD Foundation
Reviewed by:	bz and others along the way
MFC after:	3 months
2011-02-01 06:42:46 +00:00
lstewart
fec0e548bc Import a clean-room implementation of the VEGAS congestion control algorithm
based on the paper "TCP Vegas: end to end congestion avoidance on a global
internet" by Brakmo and Peterson. It is implemented as a kernel module
compatible with the recently committed modular congestion control framework.

VEGAS uses network delay as a congestion indicator and unlike regular loss-based
algorithms, attempts to keep the network operating with stable queuing delays
and no congestion losses. By keeping network buffers used along the path within
a set range, queuing delays are kept low while maintaining high throughput.

In collaboration with:	David Hayes <dahayes at swin edu au> and
				Grenville Armitage <garmitage at swin edu au>
Sponsored by:	FreeBSD Foundation
Reviewed by:	bz and others along the way
MFC after:	3 months
2011-02-01 06:17:00 +00:00
rrs
730eb4b414 More ECN fixes:
1) We now remove ECN-Nonce since it will no longer continue as a I-D
2) Eliminate last_tsn_echo, this tied us to an assoc not the net
   and thus we were not doing m-homing on the ECN-Echo senders side right.
3) Increment the count going out even if the TSN in lower in the pending
   ECN-Echo, this way the receiver knows exactly how many packets were
   marked even with network re-ordering
4) Fix so we DO NOT stop doing delayed sack if a ECN Echo is in queue
MFC after:	1 month
2011-01-31 11:50:11 +00:00
bz
9fc71bde19 Remove duplicate printing of TF_NOPUSH in db_print_tflags().
MFC after:	10 days
2011-01-29 22:11:13 +00:00
rrs
181e76925b Fixes to ECN in SCTP.
1) ECN was on an association basis, this is incorrect and
   will not work with CMT or for that matter if the user
   is sending to multiple addresses. This commit makes
   ECN on a per path basis.
2) Adopt the new format for the ECN internet draft. This also
   maintains compatability with old format chunks as well.
3) Keep track of the real time of a RTT down to micro seconds.
   For some future conditional features (for like a data center
   this is good information to have).
MFC after:	1 month
2011-01-29 19:55:29 +00:00
rrs
f218f28d16 Keep track of the real last RTT on each net.
This will be used for Data Center congestion
control, we won't want to engage it in the
ECN code unless we KNOW that the RTT is less
than 500us.

MFC after:	1 week
2011-01-28 21:05:21 +00:00
rrs
700f5d5a13 Fix a bug in the way ECN-Echo chunk
sends were being accounted for. The
counting was such that we counted only
when we queued a chunk, not when we sent it.
Now keep an additional counter for queuing and
one for sending.

MFC after:	1 week
2011-01-28 20:49:15 +00:00
tuexen
901b1779e7 * Use 300 ms as the default for RTO_MIN.
* Disable burst mitigation by default.
* Remove unused constant.
Discussed with rrs.
MFC after: 3 months.
2011-01-26 21:38:17 +00:00
tuexen
0044f4e6eb Make SCTP_MAX_BURST compliant with the latest version of
the socket API ID. This is not compatible with the API
in stable/8.
2011-01-26 19:55:54 +00:00
tuexen
32d6ca8049 Change infrastructure for SCTP_MAX_BURST to allow compliance
with the latest socket API ID. Especially it can be disabled.

Full compliance needs changing the structure used in the
socket option. Since this breaks the API, it will be a
seperate commit which will not be MFCed to stable/8.

MFC after: 3 months.
2011-01-26 19:49:03 +00:00
deischen
3025829199 Prison check addresses set with multicast interface options.
Reviewed by:	bz
MFC after:	1 week
2011-01-26 17:31:03 +00:00
thompsa
bd51c8de85 When matching an incoming ARP against a bridge, ensure both interfaces belong
to the same bridge.

Submitted by:	Alexander Zagrebin
2011-01-25 17:15:23 +00:00
lstewart
7bfbf8dedb Import the ERTT (Enhanced Round Trip Time) Khelp module. ERTT uses the
Khelp/Hhook KPIs to hook into the TCP stack and maintain a per-connection, low
noise estimate of the instantaneous RTT. ERTT's implementation is robust even in
the face of delayed acknowledgements and/or TSO being in use for a connection.

A high quality, low noise RTT estimate is a requirement for applications such as
delay-based congestion control, for which we will be importing some algorithm
implementations shortly.

In collaboration with:	David Hayes <dahayes at swin edu au> and
				Grenville Armitage <garmitage at swin edu au>
Sponsored by:	FreeBSD Foundation
Reviewed by:	bz and others along the way
MFC after:	3 months
2011-01-24 23:08:38 +00:00
tuexen
459496919d Add stream scheduling support.
This work is based on a patch received from Robin Seggelmann.

MFC after: 3 months.
2011-01-23 19:36:28 +00:00
lstewart
3087aa2fa9 An sbuf configured with SBUF_AUTOEXTEND will call malloc with M_WAITOK when a
write to the buffer causes it to overflow. We therefore can't hold the CC list
rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND.

Switch to a fixed length sbuf which should be of sufficient size except in the
very unlikely event that the sysctl is being processed as one or more new
algorithms are loaded. If that happens, we accept the race and may fail the
sysctl gracefully if there is insufficient room to print the names of all the
algorithms.

This should address a WITNESS warning and the potential panic that would occur
if the sbuf call to malloc did sleep whilst holding the CC list rwlock.

Sponsored by:	FreeBSD Foundation
Reported by:	Nick Hibma
Reviewed by:	bz
MFC after:	3 weeks
X-MFC with:	r215166
2011-01-23 13:00:25 +00:00
tuexen
de03d8a719 Remove unnecessary checking of variable.
MFC after: 3 months.
2011-01-23 07:27:35 +00:00
lstewart
3a9a1dbc22 Some correctness and robustness fixes related to CUBIC's mean RTT estimate:
- The mean RTT is updated at the end of each congestion epoch, but if we switch
  to congestion avoidance within the first epoch (e.g. if ssthresh was primed
  from the hostcache), we'll trigger a divide by zero panic in
  cubic_ack_received(). Set the mean to the min in cubic_record_rtt() if the
  mean is less than the min to ensure we have a sane mean for use in this
  situation. This fixes the panic reported by Nick Hibma.

- Adjust conditions under which we update the mean RTT in cubic_post_recovery()
  to ensure a low latency path won't yield an RTT of less than 1. This avoids
  another potential divide by zero panic when running CUBIC in networks with
  sub-millisecond latencies.

- Remove the "safety" assignment of min into mean when we don't update the mean
  because of failed conditions. The above change to the conditions for updating
  the mean ensures the safety issue is addressed and I feel it is better to keep
  our previous mean estimate around if we can't update than to revert to the
  min.

- Initialise the mean RTT to 1 on connection startup to act as a safety belt if
  a situation we haven't considered and addressed with the above changes were to
  crop up in the wild.

Sponsored by:	FreeBSD Foundation
Reported and tested by:	Nick Hibma
Discussed with:	David Hayes <dahayes at swin edu au>
MFC after:	5 weeks
X-MFC with:	r216114
2011-01-21 05:19:47 +00:00
tuexen
3a242c4813 Improve comments.
MFC after: 1 week.
2011-01-20 13:53:34 +00:00
rrs
cf6406f93b Fix it so we align with new socket API draft for
state's in destination (i.e. ACTIVE/INACTIVE/UNCONFIRMED)

MFC after:	1 week
2011-01-20 12:40:09 +00:00
tuexen
851d2aaf27 Cleanup the management of CC functions.
MFC after: 3 months.
2011-01-19 22:10:35 +00:00
rrs
fb297ff2cc Fix style 9 nit that snuck in when I
grabbed the wrong patch ;-0 (thanks Daniel)

MFC after:	1 week
2011-01-19 20:57:08 +00:00
rrs
647f1a160f Fix a bug where Multicast packets sent from a
udp endpoint may end up echoing back to the sender
even with OUT joining the multi-cast group.

Reviewed by:	gnn, bms, bz?
Obtained from:	deischen (with help from)
2011-01-19 19:07:16 +00:00
mdf
0086ff603f Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string.  For SYSCTL_PROC instances that I
noticed a discrepancy between the CTLTYPE and the format specifier,
fix the CTLTYPE.
2011-01-18 21:14:13 +00:00
tuexen
5ca779f63c Add support for resource pooling to CMT.
An original version of the patch was developed by Martin Becke
and Thomas Dreibholz.

MFC after: 3 months
2011-01-16 10:02:46 +00:00
jhb
9e010db002 Use a blocking malloc() to initialize the dummynet taskq.
Reviewed by:	luigi
2011-01-13 17:02:39 +00:00
csjp
78edae2a73 Un-break the build: use the correct format specifier for sizeof() 2011-01-12 23:07:51 +00:00
mdf
5e41205b16 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the net* piece.
2011-01-12 19:53:50 +00:00
gnn
31dade5c8b Fix several bugs in the ARP code related to improperly formatted
packets.

*) Reject requests with a protocol length not equal to 4.  This is IPv4
and there is no reason to accept anything else.

*) Reject packets that have a multicast source hardware address.

*) Drop requests where the hardware address length is not equal
to the hardware address length of the interface.

Pointed out by:	Rozhuk Ivan
MFC after:	1 week
2011-01-12 19:11:17 +00:00
lstewart
4184608283 Fixe some whitespace nits that were introduced in r216758.
Sponsored by:	FreeBSD Foundation
Submitted by:	pjd
MFC after:	10 weeks
X-MFC with:	r216758
2011-01-11 01:32:08 +00:00
lstewart
910c45c7da Reset the last_sack_ack SACK hint for TCP input processing to ensure that the
hint is 0 when no SACK data is received to update the hint with. This was
accidentally omitted from r216753.

Sponsored by:	FreeBSD Foundation
MFC after:	10 weeks
X-MFC with:	216753
2011-01-10 06:12:01 +00:00
deischen
6fbd535b5a Make sure to always do source address selection on
an unbound socket, regardless of any multicast options.
If an address is specified via a multicast option, then
let it override normal the source address selection.

This fixes a bug where source address selection was
not being performed when multicast options were present
but without an interface being specified.

Reviewed by:	bz
MFC after:	1 day
2011-01-08 22:33:46 +00:00
jhb
24979b2fb3 Trim extra spaces before tabs. 2011-01-07 21:40:34 +00:00
gnn
a6ca4af563 Fix a memory leak in ARP queues.
Pointed out by: jhb@
MFC after:	2 weeks
2011-01-07 20:02:05 +00:00
gnn
fe42eae980 Adjust ARP hold queue locking.
Submitted by:	Rozhuk Ivan, jhb
MFC after:	2 weeks
2011-01-07 18:14:58 +00:00
jhb
05673f05f2 Use a regular taskqueue for dummynet rather than a "fast" taskqueue.
Reviewed by:	luigi
2011-01-07 16:47:20 +00:00
tuexen
e53f353cdc Bugfix: Make sure that the COMM_UP notificatin is delivered first also
on the passive side.

MFC after: 3 days.
2011-01-02 10:27:27 +00:00
tuexen
68eca43089 Fix a typo.
MFC after: 3 months.
2011-01-01 22:22:57 +00:00
bz
969f8f9046 Try to catch a possible divide-by-zero as early as possible if "mtu" is 0
(also test for negative MTUs if checking it anyway).
An MTU of 0 is arguably a bug elsewhere, but this at least gives us some
more debugging hints.

Sponsored by:	ISPsystem (Early 2010)
MFC after:	1 week
2010-12-31 21:47:11 +00:00
tuexen
12d4883ccf Define and use SCTP_SSN_GE, SCTP_SSN_GT, SCTP_TSN_GE, SCTP_TSN_GT macros
and use them instead of the generic compare_with_wrap.
Retire compare_with_wrap.

MFC after: 3 months.
2010-12-30 21:32:35 +00:00
tuexen
839236cbc1 Code cleanup: Use LIST_FOREACH, LIST_FOREACH_SAFE, TAILQ_FOREACH,
TAILQ_FOREACH_SAFE where appropriate.
No functional change.

MFC after: 3 months.
2010-12-30 16:56:20 +00:00
tuexen
e71b6473c6 Fix three bugs related to the sequence number wrap-around affecting
the processing of ECNE and ASCONF chunks.

Reviewed by: rrs
MFC after: 3 days.
2010-12-30 16:23:13 +00:00
lstewart
5fb7e0486c Add a comment for the ccv member of struct tcpcb.
Sponsored by:	FreeBSD Foundation
MFC after:	5 weeks
X-MFC with:	r215166
2010-12-28 12:37:57 +00:00
lstewart
446c1bbb10 - Add some helper hook points to the TCP stack. The hooks allow Khelp modules to
access inbound/outbound events and associated data for established TCP
  connections. The hooks only run if at least one hook function is registered
  for the hook point, ensuring the impact on the stack is effectively nil when
  no TCP Khelp modules are loaded. struct tcp_hhook_data is passed as contextual
  data to any registered Khelp module hook functions.

- Add an OSD (Object Specific Data) pointer to struct tcpcb to allow Khelp
  modules to associate per-connection data with the TCP control block.

- Bump __FreeBSD_version and add a note to UPDATING regarding to ABI changes
  introduced by this commit and r216753.

In collaboration with:	David Hayes <dahayes at swin edu au> and
				Grenville Armitage <garmitage at swin edu au>
Sponsored by:	FreeBSD Foundation
Reviewed by:	bz, others along the way
MFC after:	3 months
2010-12-28 12:13:30 +00:00
lstewart
04373a011b Add a new sack hint to track the most recent and highest sacked sequence number.
This will be used by the incoming Enhanced RTT Khelp module.

Sponsored by:	FreeBSD Foundation
Submitted by:	David Hayes <dahayes at swin edu au>
Reviewed by:	bz and others (as part of a larger patch)
MFC after:	3 months
2010-12-28 03:27:20 +00:00
lstewart
425155a6c9 Fix a whitespace nit introduced in r215166.
Sponsored by:	FreeBSD Foundation
Spotted by:	bz
MFC after:	5 weeks
X-MFC with:	r215166
2010-12-28 01:38:52 +00:00
rwatson
7d9fa04b1a Remove comment bemoaning the lack of an INP_INHASHLIST above in_pcbdrop();
I fixed this in r189657 in early 2009, so the comment is OBE.

Reviewed by:	bz
MFC after:	3 days
2010-12-27 19:38:25 +00:00