freebsd-dev

Author	SHA1	Message	Date
Warner Losh	b23b156e2e	Fix casting error from newer gcc Cast the pointers to (uintptr_t) before assigning to type uint64_t. This eliminates an error from gcc when we cast the pointer to a larger integer type.	2019-10-09 21:02:06 +00:00
Randall Stewart	5b63b22075	Brad Davis identified a problem with the new LRO code, VLAN's no longer worked. The problem was that the defines used the same space as the VLAN id. This commit does three things. 1) Move the LRO used fields to the PH_per fields. This is safe since the entire PH_per is used for IP reassembly which LRO code will not hit. 2) Remove old unused pace fields that are not used in mbuf.h 3) The VLAN processing is not in the mbuf queueing code. Consequently if a VLAN submits to Rack or BBR we need to bypass the mbuf queueing for now until rack_bbr_common is updated to handle the VLAN properly. Reported by: Brad Davis	2019-10-06 22:29:02 +00:00
Conrad Meyer	373013b05f	Fix build after r351934 tcp_queue_pkts() is only used if TCPHPTS is defined (and it is not by default). Reported by: gcc	2019-09-06 18:33:39 +00:00
Randall Stewart	e57b2d0e51	This adds the final tweaks to LRO that will now allow me to add BBR. These changes make it so you can get an array of timestamps instead of a compressed ack/data segment. BBR uses this to aid with its delivery estimates. We also now (via Drew's suggestions) will not go to the expense of the tcb lookup if no stack registers to want this feature. If HPTS is not present the feature is not present either and you just get the compressed behavior. Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D21127	2019-09-06 14:25:41 +00:00
Sean Bruno	d7fb35d13a	Update tcp_lro with tested bugfixes from Netflix and LLNW: rrs - Lets make the LRO code look for true dup-acks and window update acks fly on through and combine. rrs - Make the LRO engine a bit more aware of ack-only seq space. Lets not have it incorrectly wipe out newer acks for older acks when we have out-of-order acks (common in wifi environments). jeggleston - LRO eating window updates Based on all of the above I think we are RFC compliant doing it this way: https://tools.ietf.org/html/rfc1122 section 4.2.2.16 "Note that TCP has a heuristic to select the latest window update despite possible datagram reordering; as a result, it may ignore a window update with a smaller window than previously offered if neither the sequence number nor the acknowledgment number is increased." Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Reviewed by: rstone gallatin Sponsored by: NetFlix and Limelight Networks Differential Revision: https://reviews.freebsd.org/D14540	2018-03-09 00:08:43 +00:00
Pedro F. Giffuni	fe267a5590	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
Navdeep Parhar	f8acc03ef1	Flush the LRO ctrl as soon as lro_mbufs fills up. There is no need to wait for the next enqueue from the driver. Reviewed by: gnn@, hselasky@, gallatin@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D10432	2017-04-24 22:35:00 +00:00
Navdeep Parhar	ea9a92f112	Frames that are not considered for LRO should not be counted in LRO statistics. Reviewed by: gnn@, hselasky@, gallatin@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D10430	2017-04-24 22:31:56 +00:00
Navdeep Parhar	b0ca71f0a0	Free lro_hash unconditionally, just like lro_mbuf_data a few lines later. Fix whitespace nit while here.	2017-04-19 23:06:07 +00:00
Navdeep Parhar	a3927369fa	Do not leak lro_hash on failure to allocate lro_mbuf_data. MFC after: 1 week	2017-04-19 22:27:26 +00:00
Navdeep Parhar	3d24e03800	Remove redundant assignment.	2017-04-19 22:20:41 +00:00
Lawrence Stewart	4b7b743c16	Pass the number of segments coalesced by LRO up the stack by repurposing the tso_segsz pkthdr field during RX processing, and use the information in TCP for more correct accounting and as a congestion control input. This is only a start, and an audit of other uses for the data is left as future work. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D7564	2016-08-25 13:33:32 +00:00
Sepherosa Ziehau	8452c1b345	tcp/lro: Make # of LRO entries tunable Reviewed by: hps, gallatin Obtained from: rrs, gallatin MFC after: 2 weeks Sponsored by: Netflix (rrs, gallatin), Microsoft (sephe) Differential Revision: https://reviews.freebsd.org/D7499	2016-08-16 06:40:27 +00:00
Sepherosa Ziehau	b9ec6f0b02	tcp/lro: If timestamps mismatch or it's a FIN, force flush. This keeps the segments/ACK/FIN delivery order. Before this patch, it was observed: if A sent FIN immediately after an ACK, B would deliver FIN first to the TCP stack, then the ACK. This out-of-order delivery causes one unnecessary ACK sent from B. Reviewed by: gallatin, hps Obtained from: rrs, gallatin Sponsored by: Netflix (rrs, gallatin), Microsoft (sephe) Differential Revision: https://reviews.freebsd.org/D7415	2016-08-05 09:08:00 +00:00
Sepherosa Ziehau	05cde7efa6	tcp/lro: Implement hash table for LRO entries. This significantly improves HTTP workload performance and reduces HTTP workload latency. Reviewed by: rrs, gallatin, hps Obtained from: rrs, gallatin Sponsored by: Netflix (rrs, gallatin) , Microsoft (sephe) Differential Revision: https://reviews.freebsd.org/D6689	2016-08-02 06:36:47 +00:00
Hans Petter Selasky	ec6689059d	Use insertion sort instead of bubble sort in TCP LRO. Replacing the bubble sort with insertion sort gives an 80% reduction in runtime on average, with randomized keys, for small partitions. If the keys are pre-sorted, insertion sort runs in linear time, and even if the keys are reversed, insertion sort is faster than bubble sort, although not by much. Update comment describing "tcp_lro_sort()" while at it. Differential Revision: https://reviews.freebsd.org/D6619 Sponsored by: Mellanox Technologies Tested by: Netflix Suggested by: Pieter de Goeje <pieter@degoeje.nl> Reviewed by: ed, gallatin, gnn, transport	2016-06-03 08:35:07 +00:00
Hans Petter Selasky	fc271df341	Use optimised complexity safe sorting routine instead of the kernel's "qsort()". The kernel's "qsort()" routine can in worst case spend O(N*N) amount of comparisons before the input array is sorted. It can also recurse a significant amount of times using up the kernel's interrupt thread stack. The custom sorting routine takes advantage of that the sorting key is only 64 bits. Based on set and cleared bits in the sorting key it partitions the array until it is sorted. This process has a recursion limit of 64 times, due to the number of set and cleared bits which can occur. Compiled with -O2 the sorting routine was measured to use 64-bytes of stack. Multiplying this by 64 gives a maximum stack consumption of 4096 bytes for AMD64. The same applies to the execution time, that the array to be sorted will not be traversed more than 64 times. When serving roughly 80Gb/s with 80K TCP connections, the old method consisting of "qsort()" and "tcp_lro_mbuf_compare_header()" used 1.4% CPU, while the new "tcp_lro_sort()" used 1.1% for LRO related sorting as measured by Intel Vtune. The testing was done using a sysctl to toggle between "qsort()" and "tcp_lro_sort()". Differential Revision: https://reviews.freebsd.org/D6472 Sponsored by: Mellanox Technologies Tested by: Netflix Reviewed by: gallatin, rrs, sephe, transport	2016-05-26 11:10:31 +00:00
Sepherosa Ziehau	51e3c20d36	tcp/lro: Refactor the active list operation. Ease more work concerning active list, e.g. hash table etc. Reviewed by: gallatin, rrs (earlier version) Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6137	2016-05-03 08:13:25 +00:00
Sepherosa Ziehau	9b436b180c	tcp/lro: Fix more typo Noticed by: hiren MFC after: 1 week Sponsored by: Microsoft OSTC	2016-04-28 01:43:18 +00:00
Sepherosa Ziehau	9e3db01282	tcp/lro: Fix typo. MFC after: 1 week Sponsored by: Microsoft OSTC	2016-04-27 09:40:55 +00:00
Sepherosa Ziehau	1ea448225c	tcp/lro: Change SLIST to LIST, so that removing an entry is O(1) This is kinda critical to the performance when the CPU is slow and network bandwidth is high, e.g. in the hypervisor. Reviewed by: rrs, gallatin, Dexuan Cui <decui microsoft com> Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5765	2016-04-01 06:43:05 +00:00
Sepherosa Ziehau	6dd38b8716	tcp/lro: Use tcp_lro_flush_all in device drivers to avoid code duplication And factor out tcp_lro_rx_done, which deduplicates the same logic with netinet/tcp_lro.c Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com> Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5725	2016-04-01 06:28:33 +00:00
Sepherosa Ziehau	489f0c3c17	tcp/lro: Return TCP_LRO_NO_ENTRIES if we are short of LRO entries. So that callers could react accordingly. Reviewed by: gallatin (no objection) MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5695	2016-03-25 02:54:13 +00:00
Sepherosa Ziehau	7ae3d4bf54	tcp/lro: Allow drivers to set the TCP ACK/data segment aggregation limit ACK aggregation limit is append count based, while the TCP data segment aggregation limit is length based. Unless the network driver sets these two limits, it's an NO-OP. Reviewed by: adrian, gallatin (previous version), hselasky (previous version) Approved by: adrian (mentor) MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5185	2016-02-18 04:58:34 +00:00
Hans Petter Selasky	3e9470b721	Use a pair of ifs when comparing the 32-bit flowid integers so that the sign bit doesn't cause an overflow. The overflow manifests itself as a sorting index wrap around in the middle of the sorted array, which is not a problem for the LRO code, but might be a problem for the logic inside qsort(). Reviewed by: gnn @ Sponsored by: Mellanox Technologies Differential Revision: https://reviews.freebsd.org/D5239	2016-02-11 10:03:50 +00:00
Gleb Smirnoff	8ec07310fa	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h	2016-02-01 17:41:21 +00:00
Hans Petter Selasky	e936121d31	Add optimizing LRO wrapper: - Add optimizing LRO wrapper which pre-sorts all incoming packets according to the hash type and flowid. This prevents exhaustion of the LRO entries due to too many connections at the same time. Testing using a larger number of higher bandwidth TCP connections showed that the incoming ACK packet aggregation rate increased from ~1.3:1 to almost 3:1. Another test showed that for a number of TCP connections greater than 16 per hardware receive ring, where 8 TCP connections was the LRO active entry limit, there was a significant improvement in throughput due to being able to fully aggregate more than 8 TCP stream. For very few very high bandwidth TCP streams, the optimizing LRO wrapper will add CPU usage instead of reducing CPU usage. This is expected. Network drivers which want to use the optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of "tcp_lro_flush()". Further the LRO control structure must be initialized using "tcp_lro_init_args()" passing a non-zero number into the "lro_mbufs" argument. - Make LRO statistics 64-bit. Previously 32-bit integers were used for statistics which can be prone to wrap-around. Fix this while at it and update all SYSCTL's which expose LRO statistics. - Ensure all data is freed when destroying a LRO control structures, especially leftover LRO entries. - Reduce number of memory allocations needed when setting up a LRO control structure by precomputing the total amount of memory needed. - Add own memory allocation counter for LRO. - Bump the FreeBSD version to force recompilation of all KLDs due to change of the LRO control structure size. Sponsored by: Mellanox Technologies Reviewed by: gallatin, sbruno, rrs, gnn, transport Tested by: Netflix Differential Revision: https://reviews.freebsd.org/D4914	2016-01-19 15:33:28 +00:00
Navdeep Parhar	9523d1bfc3	Fix leak in tcp_lro_rx. Simply clearing M_PKTHDR isn't enough, any tags hanging off the header need to be freed too. Differential Revision: https://reviews.freebsd.org/D2708 Reviewed by: ae@, hiren@	2015-06-30 17:19:58 +00:00
Navdeep Parhar	7127e6acf0	Merge r254336 from user/np/cxl_tuning. Add a last-modified timestamp to each LRO entry and provide an interface to flush all inactive entries. Drivers decide when to flush and what the inactivity threshold should be. Network drivers that process an rx queue to completion can enter a livelock type situation when the rate at which packets are received reaches equilibrium with the rate at which the rx thread is processing them. When this happens the final LRO flush (normally when the rx routine is done) does not occur. Pure ACKs and segments with total payload < 64K can get stuck in an LRO entry. Symptoms are that TCP tx-mostly connections' performance falls off a cliff during heavy, unrelated rx on the interface. Flushing only inactive LRO entries works better than any of these alternates that I tried: - don't LRO pure ACKs - flush _all_ LRO entries periodically (every 'x' microseconds or every 'y' descriptors) - stop rx processing in the driver periodically and schedule remaining work for later. Reviewed by: andre	2013-08-28 23:00:34 +00:00
Andrew Gallatin	e5ca1ffab5	Fix tcp_lro_rx_ipv4() for drivers that do not set CSUM_IP_CHECKED. Specifcially, in_cksum_hdr() returns 0 (not 0xffff) when the IPv4 checksum is correct. Without this fix, the tcp_lro code will reject good IPv4 traffic from drivers that do not implement IPv4 header harder csum offload. Sponsored by: Myricom Inc. MFC after: 7 days	2013-02-21 17:00:35 +00:00
Bjoern A. Zeeb	5fa2656e55	Make TCP LRO work properly with VIMAGE kernels rather than just panicing. There's no VIMAGE context set there yet as this is before if_ethersubr.c. MFC after: 3 days X-MFC with: r235981	2012-06-01 11:42:50 +00:00
Bjoern A. Zeeb	cace7064fc	Trim the extra $FreeBSD$ from the comment below the license. We use the __FBSDID() macro on the file now instead. MFC after: 3 days	2012-05-26 10:28:11 +00:00
Bjoern A. Zeeb	31bfc56ecd	In case forwarding is turned on for a given address family, refuse to queue the packet for LRO and tell the driver to directly pass it on. This avoids re-assembly and later re-fragmentation problems when forwarding. It's not the best solution but the simplest and most effective for the moment. Should have been done: ages ago Discussed with and by: many MFC after: 3 days	2012-05-25 08:17:59 +00:00
Bjoern A. Zeeb	62b5b6ecd0	MFp4 bz_ipv6_fast: Significantly update tcp_lro for mostly two things: 1) introduce basic support for IPv6 without extension headers. 2) try hard to also get the incremental checksum updates right, especially also in the IPv4 case for the IP and TCP header. Move variables around for better locality, factor things out into functions, allow checksum updates to be compiled out, ... Leave a few comments on further things to look at in the future, though that is not the full list. Update drivers with appropriate #includes as needed for IPv6 data type in LRO. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-24 23:03:23 +00:00
Bjoern A. Zeeb	27f190a3ca	Switch to a standard 2 clause BSD license (from bsd-style-copyright). Approved by: Myricom Inc. (gallatin) Approved by: Intel Corporation (jfv)	2012-05-15 13:23:44 +00:00
Colin Percival	ca7122622b	Don't allow lro->len to exceed 65535, as this will result in overflow when len is inserted back into the synthetic IP packet and cause a multiple of 2^16 bytes of TCP "packet loss". This improves Linux->FreeBSD netperf bandwidth by a factor of 300 in testing on Amazon EC2. Reviewed by: jfv MFC after: 2 weeks	2011-07-05 18:43:54 +00:00
Jack F Vogel	c31aa19c53	Port of the LRO fix from mxge driver to the generic LRO code. Thanks to Andrew Gallatin for the change. MFC after: 7 days	2011-04-07 21:20:26 +00:00
John Baldwin	79e955ed63	Trim extra spaces before tabs.	2011-01-07 21:40:34 +00:00
Kip Macy	4570959392	Don't calculate checksum if it has already been validated Obtained from: Chelsio Inc. MFC after: 3 days	2008-08-24 02:31:09 +00:00
Jack F Vogel	6c5087a818	Add generic TCP LOR into netinet	2008-06-11 22:12:50 +00:00

40 Commits