freebsd-skq

Author	SHA1	Message	Date
Mikolaj Golub	db2f5a2461	Fixup for r261590 (vnet sysctl handlers cleanup). Reviewed by: glebius	2014-02-09 08:13:17 +00:00
Lawrence Stewart	92a0637f73	Import an implementation of the CAIA Delay-Gradient (CDG) congestion control algorithm, which is based on the 2011 v0.1 patch release and described in the paper "Revisiting TCP Congestion Control using Delay Gradients" by David Hayes and Grenville Armitage. It is implemented as a kernel module compatible with the modular congestion control framework. CDG is a hybrid congestion control algorithm which reacts to both packet loss and inferred queuing delay. It attempts to operate as a delay-based algorithm where possible, but utilises heuristics to detect loss-based TCP cross traffic and will compete effectively as required. CDG is therefore incrementally deployable and suitable for use on shared networks. In collaboration with: David Hayes <david.hayes at ieee.org> and Grenville Armitage <garmitage at swin edu au> MFC after: 4 days Sponsored by: Cisco University Research Program and FreeBSD Foundation	2013-07-02 08:44:56 +00:00
Sergey Kandaurov	6bed196c35	Staticize malloc types. Approved by: lstewart MFC after: 1 week	2011-04-13 11:28:46 +00:00
Lawrence Stewart	891b8ed467	Use the full and proper company name for Swinburne University of Technology throughout the source tree. Requested by: Grenville Armitage, Director of CAIA at Swinburne University of Technology MFC after: 3 days	2011-04-12 08:13:18 +00:00
Lawrence Stewart	03f0843bdb	Algorithm modules can define their own private congestion signal types in the top 8 bits of the 32 bit signal bit field space for internal use. These private signals should not be leaked outside of a module. Given that many algorithm modules use the NewReno hook functions to simplify their implementation, the obvious place such a leak would show up is in the NewReno cong_signal hook function. - Show the full number of significant bits in the signal type definitions in <netinet/cc.h>. - Add a bitmask to simplify figuring out if a given signal is in the private or public bit range. - Add a sanity check in newreno_cong_signal() to ensure private signals are not being leaked into the hook function. Sponsored by: FreeBSD Foundation Discussed with: David Hayes <dahayes at swin edu au> MFC after: 1 week X-MFC with: r215166	2011-02-01 13:32:27 +00:00
Lawrence Stewart	ec943febbb	Fix typo in comment: "course" -> "coarse" Sponsored by: FreeBSD Foundation Submitted by: jmallett MFC after: 3 months X-MFC with: r218152	2011-02-01 07:10:13 +00:00
Lawrence Stewart	0927e1a18b	Import an implementation of the CAIA-Hamilton-Delay (CHD) congestion control algorithm described in the paper "Improved coexistence and loss tolerance for delay based TCP congestion control" by Hayes and Armitage. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide tolerance to non-congestion related packet loss and improvements to coexistence with loss-based congestion control algorithms. A key idea in improving coexistence with loss-based congestion control algorithms is the use of a shadow window, which attempts to track how NewReno's congestion window (cwnd) would evolve. At the next packet loss congestion event, CHD uses the shadow window to correct cwnd in a way that reduces the amount of unfairness CHD experiences when competing with loss-based algorithms. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 07:05:14 +00:00
Lawrence Stewart	ac230a79e1	Import a clean-room implementation of the Hamilton-Delay (HD) congestion control algorithm based on the paper "A strategy for fair coexistence of loss and delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and Baker. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. HD uses a probabilistic approach to reacting to delay-based congestion. The probability of reducing cwnd is zero when the queuing delay is very small, increasing to a maximum at a set threshold, then back down to zero again when the queuing delay is high. Normal operation keeps the queuing delay below the set threshold. However, since loss-based congestion control algorithms push the queuing delay high when probing for bandwidth, having the probability of reducing cwnd drop back to zero for high delays allows HD to compete with loss-based algorithms. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 06:42:46 +00:00
Lawrence Stewart	1d4ed791d0	Import a clean-room implementation of the VEGAS congestion control algorithm based on the paper "TCP Vegas: end to end congestion avoidance on a global internet" by Brakmo and Peterson. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. VEGAS uses network delay as a congestion indicator and unlike regular loss-based algorithms, attempts to keep the network operating with stable queuing delays and no congestion losses. By keeping network buffers used along the path within a set range, queuing delays are kept low while maintaining high throughput. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 06:17:00 +00:00
Lawrence Stewart	a66ac850d7	An sbuf configured with SBUF_AUTOEXTEND will call malloc with M_WAITOK when a write to the buffer causes it to overflow. We therefore can't hold the CC list rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND. Switch to a fixed length sbuf which should be of sufficient size except in the very unlikely event that the sysctl is being processed as one or more new algorithms are loaded. If that happens, we accept the race and may fail the sysctl gracefully if there is insufficient room to print the names of all the algorithms. This should address a WITNESS warning and the potential panic that would occur if the sbuf call to malloc did sleep whilst holding the CC list rwlock. Sponsored by: FreeBSD Foundation Reported by: Nick Hibma Reviewed by: bz MFC after: 3 weeks X-MFC with: r215166	2011-01-23 13:00:25 +00:00
Lawrence Stewart	47f44cdd93	Some correctness and robustness fixes related to CUBIC's mean RTT estimate: - The mean RTT is updated at the end of each congestion epoch, but if we switch to congestion avoidance within the first epoch (e.g. if ssthresh was primed from the hostcache), we'll trigger a divide by zero panic in cubic_ack_received(). Set the mean to the min in cubic_record_rtt() if the mean is less than the min to ensure we have a sane mean for use in this situation. This fixes the panic reported by Nick Hibma. - Adjust conditions under which we update the mean RTT in cubic_post_recovery() to ensure a low latency path won't yield an RTT of less than 1. This avoids another potential divide by zero panic when running CUBIC in networks with sub-millisecond latencies. - Remove the "safety" assignment of min into mean when we don't update the mean because of failed conditions. The above change to the conditions for updating the mean ensures the safety issue is addressed and I feel it is better to keep our previous mean estimate around if we can't update than to revert to the min. - Initialise the mean RTT to 1 on connection startup to act as a safety belt if a situation we haven't considered and addressed with the above changes were to crop up in the wild. Sponsored by: FreeBSD Foundation Reported and tested by: Nick Hibma Discussed with: David Hayes <dahayes at swin edu au> MFC after: 5 weeks X-MFC with: r216114	2011-01-21 05:19:47 +00:00
Matthew D Fleming	f88910cdf5	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the net* piece.	2011-01-12 19:53:50 +00:00
Lawrence Stewart	5728a0eae3	Import a clean-room implementation of the experimental H-TCP congestion control algorithm based on the Internet-Draft "draft-leith-tcp-htcp-06.txt". It is implemented as a kernel module compatible with the recently committed modular congestion control framework. H-TCP was designed to provide increased throughput in fast and long-distance networks. It attempts to maintain fairness when competing with legacy NewReno TCP in lower speed scenarios where NewReno is able to operate adequately. The paper "H-TCP: A framework for congestion control in high-speed and long-distance networks" provides additional detail. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: rpaulo (older patch from a few weeks ago) MFC after: 3 months	2010-12-02 06:40:21 +00:00
Lawrence Stewart	67fef78ba4	Import a clean-room implementation of the experimental CUBIC congestion control algorithm based on the Internet-Draft "draft-rhee-tcpm-cubic-02.txt". It is implemented as a kernel module compatible with the recently committed modular congestion control framework. CUBIC was designed to provide increased throughput in fast and long-distance networks. It attempts to maintain fairness when competing with legacy NewReno TCP in lower speed scenarios where NewReno is able to operate adequately. The paper "CUBIC: A New TCP-Friendly High-Speed TCP Variant" provides additional detail. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: rpaulo (older patch from a few weeks ago) MFC after: 3 months	2010-12-02 06:05:44 +00:00
Lawrence Stewart	74a5a1949e	General cleanup of the NewReno CC module (no functional changes): - Remove superfluous includes and unhelpful comments. - Alphabetically order functions. - Make functions static. Sponsored by: FreeBSD Foundation MFC after: 9 weeks X-MFC with: r215166	2010-12-02 02:32:46 +00:00
Lawrence Stewart	2ea8da28e9	- Reinstantiate the after_idle hook call in tcp_output(), which got lost somewhere along the way due to mismerging r211464 in our development tree. - Capture the essence of r211464 in NewReno's after_idle() hook. We don't use V_ss_fltsz/V_ss_fltsz_local yet which needs to be revisited. Sponsored by: FreeBSD Foundation Submitted by: David Hayes <dahayes at swin edu au> MFC after: 9 weeks X-MFC with: r215166	2010-12-02 01:36:00 +00:00
Lawrence Stewart	78b01840af	Make the CC framework more VIMAGE friendly by adding the machinery to allow vnets to select their own default CC algorithm independent of each other and the base system. If the base system or a vnet has set a default which gets unloaded, we reset that netstack's default to NewReno. Sponsored by: FreeBSD Foundation Tested by: Mikolaj Golub <to.my.trociny at gmail com> Reviewed by: bz (briefly) MFC after: 3 months	2010-11-16 09:34:31 +00:00
Lawrence Stewart	ebf92e869f	- Querying the default CC algo is more common than setting it and the function is small, so there is no good reason not to declare the buffer at the top. - Fix a whitespace nit. Sponsored by: FreeBSD Foundation MFC after: 11 weeks X-MFC with: r215166	2010-11-16 08:43:25 +00:00
Lawrence Stewart	99065ae6a8	Move protocol specific implementation detail out of the core CC framework. Sponsored by: FreeBSD Foundation Tested by: Mikolaj Golub <to.my.trociny at gmail com> MFC after: 11 weeks X-MFC with: r215166	2010-11-16 08:30:39 +00:00
Lawrence Stewart	4e805854ed	On CC algorithm module unload, we walk the list of active TCP control blocks. Any found to be using the algorithm that is about to go away are switched back to NewReno to avoid leaving dangling pointers which would trigger a panic. For VIMAGE kernels, there is a list per vnet to walk, yet the implementation was only examining one of the vnet lists. Fix the implementation of the above feature for VIMAGE kernels by looping through all active TCP control blocks across all vnets. Sponsored by: FreeBSD Foundation Tested by: Mikolaj Golub <to.my.trociny at gmail com> Reviewed by: bz (briefly) MFC after: 11 weeks	2010-11-16 07:57:56 +00:00
Lawrence Stewart	14f57a8b02	cc_init() should only be run once on system boot, but with VIMAGE kernels it runs on boot and each time a vnet jail is created. Running cc_init() multiple times results in a panic when attempting to initialise the cc_list lock again, and so r215166 effectively broke the use of vnet jails. Switch to using a SYSINIT to run cc_init() on boot. CC algorithm modules loaded on boot register in the same SI_SUB_PROTO_IFATTACHDOMAIN category as is used in this patch, so cc_init() is run at SI_ORDER_FIRST to ensure the framework is initialised before module registration is attempted. Sponsored by: FreeBSD Foundation Reported and tested by: Mikolaj Golub <to.my.trociny at gmail com> MFC after: 11 weeks X-MFC with: r215166	2010-11-16 07:09:05 +00:00
Lawrence Stewart	dbc4240942	This commit marks the first formal contribution of the "Five New TCP Congestion Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details about the project are available at: http://caia.swin.edu.au/freebsd/5cc/ - Add a KPI and supporting infrastructure to allow modular congestion control algorithms to be used in the net stack. Algorithms can maintain per-connection state if required, and connections maintain their own algorithm pointer, which allows different connections to concurrently use different algorithms. The TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to programmatically query or change the congestion control algorithm respectively from within an application at runtime. - Integrate the framework with the TCP stack in as least intrusive a manner as possible. Care was also taken to develop the framework in a way that should allow integration with other congestion aware transport protocols (e.g. SCTP) in the future. The hope is that we will one day be able to share a single set of congestion control algorithm modules between all congestion aware transport protocols. - Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack and use it to decouple the meaning of recovery from a congestion event and recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based congestion control protocols don't generally need to recover from packet loss and need a different way to note a congestion recovery episode within the stack. - Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code and ensures the stack always uses the appropriate mechanisms for recovering from packet loss during a congestion recovery episode. - Extract the NewReno congestion control algorithm from the TCP stack and massage it into module form. NewReno is always built into the kernel and will remain the default algorithm for the foreseeable future. Implementations of additional different algorithms will become available in the near future. - Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code that relies on the size of "struct tcpcb" is required. Many thanks go to the Cisco University Research Program Fund at Community Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work at the Centre for Advanced Internet Architectures, Swinburne University of Technology is greatly appreciated. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: Cisco URP, FreeBSD Foundation Reviewed by: rpaulo Tested by: David Hayes (and many others over the years) MFC after: 3 months	2010-11-12 06:41:55 +00:00

22 Commits
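
The earliest commit in this log (dbc4240942) notes that the TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to query or change a connection's congestion control algorithm at runtime. The following is a minimal userland sketch of that usage, not code from the tree. The helper names cc_get() and cc_set() and the CC_NAME_BUFLEN constant are hypothetical and exist only for illustration; the socket calls and the TCP_CONGESTION option itself are the standard ones from <sys/socket.h> and <netinet/tcp.h>.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>

/* Hypothetical buffer size for algorithm names, chosen to be comfortably large. */
#define CC_NAME_BUFLEN 64

/*
 * Copy the name of the congestion control algorithm currently used by the
 * TCP socket fd into name (a buffer of len bytes), NUL-terminated.
 * Returns 0 on success, -1 on error (errno is set by getsockopt()).
 */
int
cc_get(int fd, char *name, socklen_t len)
{
	socklen_t optlen;

	if (len < 2)
		return (-1);
	/* Zero the buffer and keep one byte back so the result is a C string. */
	memset(name, 0, len);
	optlen = len - 1;
	return (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, &optlen));
}

/*
 * Ask the stack to switch the TCP socket fd to the algorithm called name.
 * Returns 0 on success, -1 on error, for example when no algorithm with
 * that name is currently available in the kernel.
 */
int
cc_set(int fd, const char *name)
{
	return (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
	    (socklen_t)strlen(name)));
}
```

A caller would typically create a socket with socket(AF_INET, SOCK_STREAM, 0), call cc_get() with a char buffer of CC_NAME_BUFLEN bytes to read the current algorithm, and call cc_set() with a name such as "newreno" to switch. Because the algorithms other than NewReno listed above are kernel modules, a request for one of them is only expected to succeed once the corresponding module has been loaded.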