freebsd-skq

Author	SHA1	Message	Date
sbruno	b4426c87f1	Revert r307901 - Inform CC modules about loss events. This was discussed between various transport@ members and it was requested to be reverted and discussed. Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Reported by: lawrence Reviewed by: hiren Sponsored by: Limelight Networks	2017-07-25 15:08:52 +00:00
sbruno	fab58e2974	Revert r308180 - Set slow start threshold more accurrately on loss ... This was discussed between various transport@ members and it was requested to be reverted and discussed. Submitted by: kevin Reported by: lawerence Reviewed by: hiren	2017-07-25 15:03:05 +00:00
cem	6d240aee91	Fix a variety of cosmetic typos and misspellings No functional change. PR: 216096, 216097, 216098, 216101, 216102, 216106, 216109, 216110 Reported by: Bulat <bltsrc at mail.ru> Sponsored by: Dell EMC Isilon	2017-01-15 18:00:45 +00:00
hiren	97041f7818	Set slow start threshold more accurately on loss to be flightsize/2 instead of cwnd/2 as recommended by RFC5681. (spotted by mmacy at nextbsd dot org) Restore pre-r307901 behavior of aligning ssthresh/cwnd on mss boundary. (spotted by slawa at zxy dot spb dot ru) Tested by: dim, Slawa <slawa at zxy dot spb dot ru> MFC after: 1 month X-MFC with: r307901 Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D8349	2016-11-01 21:08:37 +00:00
hiren	f084a95815	FreeBSD tcp stack used to inform respective congestion control module about the loss event but not use or obay the recommendations i.e. values set by it in some cases. Here is an attempt to solve that confusion by following relevant RFCs/drafts. Stack only sets congestion window/slow start threshold values when there is no CC module availalbe to take that action. All CC modules are inspected and updated when needed to take appropriate action on loss. tcp_stacks/fastpath module has been updated to adapt these changes. Note: Probably, the most significant change would be to not bring congestion window down to 1MSS on a loss signaled by 3-duplicate acks and letting respective CC decide that value. In collaboration with: Matt Macy <mmacy at nextbsd dot org> Discussed on: transport@ mailing list Reviewed by: jtl MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D8225	2016-10-25 05:45:47 +00:00
hiren	f904bc1c31	Undo r307899. It needs a bit more work and proper commit log.	2016-10-25 05:07:51 +00:00
hiren	301cc1629e	In Collaboration with: Matt Macy <mmacy at nextbsd dot com> Reviewed by: jtl Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D8225	2016-10-25 05:03:33 +00:00
jtl	69bf0a6bcd	Remove "long" variables from the TCP stack (not including the modular congestion control framework). Reviewed by: gnn, lstewart (partial) Sponsored by: Juniper Networks, Netflix Differential Revision: (multiple) Tested by: Limelight, Netflix	2016-10-06 16:28:34 +00:00
lstewart	b01e24747b	Pass the number of segments coalesced by LRO up the stack by repurposing the tso_segsz pkthdr field during RX processing, and use the information in TCP for more correct accounting and as a congestion control input. This is only a start, and an audit of other uses for the data is left as future work. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D7564	2016-08-25 13:33:32 +00:00
brd	b3e395ca9c	Fix the case for some sysctl descriptions. Reviewed by: gnn	2016-07-26 20:20:09 +00:00
hiren	7033f894b1	Add an option to use rfc6675 based pipe/inflight bytes calculation in htcp. Submitted by: Kevin Bowling <kevin.bowling@kev009.com> MFC after: 1 week Sponsored by: Limelight Networks	2016-05-09 19:19:03 +00:00
pfg	d9c9113377	sys/net*: minor spelling fixes. No functional change.	2016-05-03 18:05:43 +00:00
glebius	0769763b2b	Rename netinet/tcp_cc.h to netinet/cc/cc.h. Discussed with: lstewart	2016-01-27 17:59:39 +00:00
glebius	a1e3038e68	- Rename cc.h to more meaningful tcp_cc.h. - Declare it a kernel only include, which it already is. - Don't include tcp.h implicitly from tcp_cc.h	2016-01-21 22:34:51 +00:00
glebius	b0b805d811	Cleanup TCP files from unnecessary interface related includes.	2016-01-21 22:24:20 +00:00
hiren	6fab2b0d61	Add an option to use rfc6675 based pipe/inflight bytes calculation in newreno. MFC after: 3 weeks Sponsored by: Limelight Networks	2015-12-09 08:53:41 +00:00
hiren	2a4c4c39de	Add an option to use rfc6675 based pipe/inflight bytes calculation in cubic. Reviewed by: gnn MFC after: 3 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D4205	2015-12-09 07:56:40 +00:00
hiren	8ad8794452	DCTCP (Data Center TCP) implementation. DCTCP congestion control algorithm aims to maximise throughput and minimise latency in data center networks by utilising the proportion of Explicit Congestion Notification (ECN) marked packets received from capable hardware as a congestion signal. Highlights: Implemented as a mod_cc(4) module. ECN (Explicit congestion notification) processing is done differently from RFC3168. Takes one-sided DCTCP into consideration where only one of the sides is using DCTCP and other is using standard ECN. IETF draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00 Thesis report by Midori Kato: https://eggert.org/students/kato-thesis.pdf Submitted by: Midori Kato <katoon@sfc.wide.ad.jp> and Lars Eggert <lars@netapp.com> with help and modifications from hiren Differential Revision: https://reviews.freebsd.org/D604 Reviewed by: gnn	2015-01-12 08:33:04 +00:00
glebius	99f4ec50e8	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
hselasky	a0b8ff0c54	The SYSCTL data pointers can come from userspace and must not be directly accessed. Although this will work on some platforms, it can throw an exception if the pointer is invalid and then panic the kernel. Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-28 12:00:39 +00:00
hselasky	144232fef2	Preserve limitation of "TCP_CA_NAME_MAX" when matching the algorithm name. MFC after: 3 days Suggested by: gnn @	2014-10-27 16:08:41 +00:00
hselasky	d92cafa3fc	Make assignments to "net.inet.tcp.cc.algorithm" work by fixing a bad string comparison. MFC after: 3 days Reported by: Jukka Ukkonen <jau789@gmail.com> Sponsored by: Mellanox Technologies	2014-10-27 11:21:47 +00:00
hselasky	49c137f7be	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
lstewart	b3bf7524ad	Destroy the "qdiffsample_zone" UMA zone on unload to avoid a use-after-unload panic easily triggered by running "sysctl -a" after unload. Reported and tested by: Grenville Armitage <garmitage@swin.edu.au> MFC after: 1 week	2014-08-19 02:19:53 +00:00
hselasky	e253e43882	Fix string length argument passed to "sysctl_handle_string()" so that the complete string is returned by the function and not just only one byte. PR: 192544 MFC after: 2 weeks	2014-08-10 07:51:55 +00:00
trociny	42b6b03337	Fixup for r261590 (vnet sysctl handlers cleanup). Reviewed by: glebius	2014-02-09 08:13:17 +00:00
lstewart	1920a78bd9	Import an implementation of the CAIA Delay-Gradient (CDG) congestion control algorithm, which is based on the 2011 v0.1 patch release and described in the paper "Revisiting TCP Congestion Control using Delay Gradients" by David Hayes and Grenville Armitage. It is implemented as a kernel module compatible with the modular congestion control framework. CDG is a hybrid congestion control algorithm which reacts to both packet loss and inferred queuing delay. It attempts to operate as a delay-based algorithm where possible, but utilises heuristics to detect loss-based TCP cross traffic and will compete effectively as required. CDG is therefore incrementally deployable and suitable for use on shared networks. In collaboration with: David Hayes <david.hayes at ieee.org> and Grenville Armitage <garmitage at swin edu au> MFC after: 4 days Sponsored by: Cisco University Research Program and FreeBSD Foundation	2013-07-02 08:44:56 +00:00
pluknet	e48732e3d4	Staticize malloc types. Approved by: lstewart MFC after: 1 week	2011-04-13 11:28:46 +00:00
lstewart	545f7c0ca7	Use the full and proper company name for Swinburne University of Technology throughout the source tree. Requested by: Grenville Armitage, Director of CAIA at Swinburne University of Technology MFC after: 3 days	2011-04-12 08:13:18 +00:00
lstewart	8d21b8a169	Algorithm modules can define their own private congestion signal types in the top 8 bits of the 32 bit signal bit field space for internal use. These private signals should not be leaked outside of a module. Given that many algorithm modules use the NewReno hook functions to simplify their implementation, the obvious place such a leak would show up is in the NewReno cong_signal hook function. - Show the full number of significant bits in the signal type definitions in <netinet/cc.h>. - Add a bitmask to simplify figuring out if a given signal is in the private or public bit range. - Add a sanity check in newreno_cong_signal() to ensure private signals are not being leaked into the hook function. Sponsored by: FreeBSD Foundation Discussed with: David Hayes <dahayes at swin edu au> MFC after: 1 week X-MFC with: r215166	2011-02-01 13:32:27 +00:00
lstewart	108062a407	Fix typo in comment: "course" -> "coarse" Sponsored by: FreeBSD Foundation Submitted by: jmallett MFC after: 3 months X-MFC with: r218152	2011-02-01 07:10:13 +00:00
lstewart	6d2fcf0be2	Import an implementation of the CAIA-Hamilton-Delay (CHD) congestion control algorithm described in the paper "Improved coexistence and loss tolerance for delay based TCP congestion control" by Hayes and Armitage. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide tolerance to non-congestion related packet loss and improvements to coexistence with loss-based congestion control algorithms. A key idea in improving coexistence with loss-based congestion control algorithms is the use of a shadow window, which attempts to track how NewReno's congestion window (cwnd) would evolve. At the next packet loss congestion event, CHD uses the shadow window to correct cwnd in a way that reduces the amount of unfairness CHD experiences when competing with loss-based algorithms. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 07:05:14 +00:00
lstewart	1138f32d73	Import a clean-room implementation of the Hamilton-Delay (HD) congestion control algorithm based on the paper "A strategy for fair coexistence of loss and delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and Baker. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. HD uses a probabilistic approach to reacting to delay-based congestion. The probability of reducing cwnd is zero when the queuing delay is very small, increasing to a maximum at a set threshold, then back down to zero again when the queuing delay is high. Normal operation keeps the queuing delay below the set threshold. However, since loss-based congestion control algorithms push the queuing delay high when probing for bandwidth, having the probability of reducing cwnd drop back to zero for high delays allows HD to compete with loss-based algorithms. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 06:42:46 +00:00
lstewart	fec0e548bc	Import a clean-room implementation of the VEGAS congestion control algorithm based on the paper "TCP Vegas: end to end congestion avoidance on a global internet" by Brakmo and Peterson. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. VEGAS uses network delay as a congestion indicator and unlike regular loss-based algorithms, attempts to keep the network operating with stable queuing delays and no congestion losses. By keeping network buffers used along the path within a set range, queuing delays are kept low while maintaining high throughput. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 06:17:00 +00:00
lstewart	3087aa2fa9	An sbuf configured with SBUF_AUTOEXTEND will call malloc with M_WAITOK when a write to the buffer causes it to overflow. We therefore can't hold the CC list rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND. Switch to a fixed length sbuf which should be of sufficient size except in the very unlikely event that the sysctl is being processed as one or more new algorithms are loaded. If that happens, we accept the race and may fail the sysctl gracefully if there is insufficient room to print the names of all the algorithms. This should address a WITNESS warning and the potential panic that would occur if the sbuf call to malloc did sleep whilst holding the CC list rwlock. Sponsored by: FreeBSD Foundation Reported by: Nick Hibma Reviewed by: bz MFC after: 3 weeks X-MFC with: r215166	2011-01-23 13:00:25 +00:00
lstewart	3a9a1dbc22	Some correctness and robustness fixes related to CUBIC's mean RTT estimate: - The mean RTT is updated at the end of each congestion epoch, but if we switch to congestion avoidance within the first epoch (e.g. if ssthresh was primed from the hostcache), we'll trigger a divide by zero panic in cubic_ack_received(). Set the mean to the min in cubic_record_rtt() if the mean is less than the min to ensure we have a sane mean for use in this situation. This fixes the panic reported by Nick Hibma. - Adjust conditions under which we update the mean RTT in cubic_post_recovery() to ensure a low latency path won't yield an RTT of less than 1. This avoids another potential divide by zero panic when running CUBIC in networks with sub-millisecond latencies. - Remove the "safety" assignment of min into mean when we don't update the mean because of failed conditions. The above change to the conditions for updating the mean ensures the safety issue is addressed and I feel it is better to keep our previous mean estimate around if we can't update than to revert to the min. - Initialise the mean RTT to 1 on connection startup to act as a safety belt if a situation we haven't considered and addressed with the above changes were to crop up in the wild. Sponsored by: FreeBSD Foundation Reported and tested by: Nick Hibma Discussed with: David Hayes <dahayes at swin edu au> MFC after: 5 weeks X-MFC with: r216114	2011-01-21 05:19:47 +00:00
mdf	5e41205b16	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the net* piece.	2011-01-12 19:53:50 +00:00
lstewart	8300743dab	Import a clean-room implementation of the experimental H-TCP congestion control algorithm based on the Internet-Draft "draft-leith-tcp-htcp-06.txt". It is implemented as a kernel module compatible with the recently committed modular congestion control framework. H-TCP was designed to provide increased throughput in fast and long-distance networks. It attempts to maintain fairness when competing with legacy NewReno TCP in lower speed scenarios where NewReno is able to operate adequately. The paper "H-TCP: A framework for congestion control in high-speed and long-distance networks" provides additional detail. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: rpaulo (older patch from a few weeks ago) MFC after: 3 months	2010-12-02 06:40:21 +00:00
lstewart	6e097be88a	Import a clean-room implementation of the experimental CUBIC congestion control algorithm based on the Internet-Draft "draft-rhee-tcpm-cubic-02.txt". It is implemented as a kernel module compatible with the recently committed modular congestion control framework. CUBIC was designed for provide increased throughput in fast and long-distance networks. It attempts to maintain fairness when competing with legacy NewReno TCP in lower speed scenarios where NewReno is able to operate adequately. The paper "CUBIC: A New TCP-Friendly High-Speed TCP Variant" provides additional detail. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: rpaulo (older patch from a few weeks ago) MFC after: 3 months	2010-12-02 06:05:44 +00:00
lstewart	6d18efb46a	General cleanup of the NewReno CC module (no functional changes): - Remove superfluous includes and unhelpful comments. - Alphabetically order functions. - Make functions static. Sponsored by: FreeBSD Foundation MFC after: 9 weeks X-MFC with: r215166	2010-12-02 02:32:46 +00:00
lstewart	f6f8b1d549	- Reinstantiate the after_idle hook call in tcp_output(), which got lost somewhere along the way due to mismerging r211464 in our development tree. - Capture the essence of r211464 in NewReno's after_idle() hook. We don't use V_ss_fltsz/V_ss_fltsz_local yet which needs to be revisited. Sponsored by: FreeBSD Foundation Submitted by: David Hayes <dahayes at swin edu au> MFC after: 9 weeks X-MFC with: r215166	2010-12-02 01:36:00 +00:00
lstewart	d54d1011fb	Make the CC framework more VIMAGE friendly by adding the machinery to allow vnets to select their own default CC algorithm independent of each other and the base system. If the base system or a vnet has set a default which gets unloaded, we reset that netstack's default to NewReno. Sponsored by: FreeBSD Foundation Tested by: Mikolaj Golub <to.my.trociny at gmail com> Reviewed by: bz (briefly) MFC after: 3 months	2010-11-16 09:34:31 +00:00
lstewart	c63666dcd4	- Querying the default CC algo is more common than setting it and the function is small, so there is no good reason not to declare the buffer at the top. - Fix a whitespace nit. Sponsored by: FreeBSD Foundation MFC after: 11 weeks X-MFC with: r215166	2010-11-16 08:43:25 +00:00
lstewart	4cb0a4f8c6	Move protocol specific implementation detail out of the core CC framework. Sponsored by: FreeBSD Foundation Tested by: Mikolaj Golub <to.my.trociny at gmail com> MFC after: 11 weeks X-MFC with: r215166	2010-11-16 08:30:39 +00:00
lstewart	b114c25bf8	On CC algorithm module unload, we walk the list of active TCP control blocks. Any found to be using the algorithm that is about to go away are switched back to NewReno to avoid leaving dangling pointers which would trigger a panic. For VIMAGE kernels, there is a list per vnet to walk, yet the implementation was only examining one of the vnet lists. Fix the implementation of the above feature for VIMAGE kernels by looping through all active TCP control blocks across all vnets. Sponsored by: FreeBSD Foundation Tested by: Mikolaj Golub <to.my.trociny at gmail com> Reviewed by: bz (briefly) MFC after: 11 weeks	2010-11-16 07:57:56 +00:00
lstewart	c28db57abb	cc_init() should only be run once on system boot, but with VIMAGE kernels it runs on boot and each time a vnet jail is created. Running cc_init() multiple times results in a panic when attempting to initialise the cc_list lock again, and so r215166 effectively broke the use of vnet jails. Switch to using a SYSINIT to run cc_init() on boot. CC algorithm modules loaded on boot register in the same SI_SUB_PROTO_IFATTACHDOMAIN category as is used in this patch, so cc_init() is run at SI_ORDER_FIRST to ensure the framework is initialised before module registration is attempted. Sponsored by: FreeBSD Foundation Reported and tested by: Mikolaj Golub <to.my.trociny at gmail com> MFC after: 11 weeks X-MFC with: r215166	2010-11-16 07:09:05 +00:00
lstewart	df9f23bf3f	This commit marks the first formal contribution of the "Five New TCP Congestion Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details about the project are available at: http://caia.swin.edu.au/freebsd/5cc/ - Add a KPI and supporting infrastructure to allow modular congestion control algorithms to be used in the net stack. Algorithms can maintain per-connection state if required, and connections maintain their own algorithm pointer, which allows different connections to concurrently use different algorithms. The TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to programmatically query or change the congestion control algorithm respectively from within an application at runtime. - Integrate the framework with the TCP stack in as least intrusive a manner as possible. Care was also taken to develop the framework in a way that should allow integration with other congestion aware transport protocols (e.g. SCTP) in the future. The hope is that we will one day be able to share a single set of congestion control algorithm modules between all congestion aware transport protocols. - Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack and use it to decouple the meaning of recovery from a congestion event and recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based congestion control protocols don't generally need to recover from packet loss and need a different way to note a congestion recovery episode within the stack. - Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code and ensures the stack always uses the appropriate mechanisms for recovering from packet loss during a congestion recovery episode. - Extract the NewReno congestion control algorithm from the TCP stack and massage it into module form. NewReno is always built into the kernel and will remain the default algorithm for the forseeable future. Implementations of additional different algorithms will become available in the near future. - Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code that relies on the size of "struct tcpcb" is required. Many thanks go to the Cisco University Research Program Fund at Community Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work at the Centre for Advanced Internet Architectures, Swinburne University of Technology is greatly appreciated. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: Cisco URP, FreeBSD Foundation Reviewed by: rpaulo Tested by: David Hayes (and many others over the years) MFC after: 3 months	2010-11-12 06:41:55 +00:00

47 Commits