freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Andre Oppermann	da2299c5c7	Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability. Checksumming the IP header of fragments is no different from doing normal IP headers. Discussed with: yongari MFC after: 1 week	2012-11-27 19:31:49 +00:00
Andre Oppermann	13feab8286	Add DELACK to list of timers. MFC after: 1 week	2012-11-27 19:07:28 +00:00
Navdeep Parhar	825fd1e437	Make sure that tcp_timer_activate() correctly sees TCP_OFFLOAD (or not).	2012-11-27 06:42:44 +00:00
Alfred Perlstein	08373e0bc4	Auto size the tcbhashsize structure based on max sockets. While here, also make the code that enforces power-of-two more forgiving, instead of just resetting to 512, graciously round-down to the next lower power of two.	2012-11-27 03:04:24 +00:00
Michael Tuexen	a50f0e3152	Add support for sctp_peeloff() also in the front states of the association. MFC after: 3 days	2012-11-26 16:44:03 +00:00
Michael Tuexen	e3976bb8d7	Find the endpoint for an incoming packet also if the endpoint comes from sctp_peeloff(). MFC after: 3 days	2012-11-26 16:43:32 +00:00
Michael Tuexen	440da2d35b	Allow shutdown() to be used on fds returned from sctp_peeloff(). MFC after: 3 days	2012-11-26 08:50:00 +00:00
Michael Tuexen	a3158782c2	Remove unused function. MFC after: 1 week	2012-11-25 14:25:08 +00:00
Michael Tuexen	3a51a2647a	Add support for SCTP/UDP/IPV6. This completes the support of http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-udp-encaps MFC after: 1 week	2012-11-17 20:04:04 +00:00
Michael Tuexen	325c8c46b1	Get the accounting working. We now have counters how many chunks for each SCTP outgoing stream are in the send and sent queue. While there, improve the naming of NR-SACK related constants recently introduced. MFC after: 1 week	2012-11-16 19:39:10 +00:00
Roman Divacky	8252626fb4	Initialize hdrlen to 0 to avoid clang warning in NOINET case.	2012-11-10 10:41:00 +00:00
Bjoern A. Zeeb	ec89d0398b	Clean up some whitespace in this file to get it out of an upcoming patch. MFC after: 10 days	2012-11-08 03:29:55 +00:00
Michael Tuexen	a7ad6026e0	Add per outgoing stream accounting for chunks in the send and sent queue. This provides no functional change, but is a preparation for an upcoming stream reset improvement. Done with rrs@. MFC after: 1 week	2012-11-07 22:11:38 +00:00
Michael Tuexen	2a4985847a	Add some missing changes missed in the last commit. MFC after: 1 week X-MFC with: 242708	2012-11-07 21:25:32 +00:00
Michael Tuexen	98f2956c11	Improve PR-SCTP if used in combination with NR-SACK. Based on work done by Mohammad Rajiullah. MFC after: 1 week	2012-11-07 20:59:00 +00:00
Kevin Lo	0f5e7edc14	Fix typo; s/ouput/output	2012-11-07 07:00:59 +00:00
Mateusz Guzik	8e1e6e5f4a	Fix possible spurious sbunlock in sctp_sorecvmsg. Reviewed by: tuexen Approved by: trasz (mentor) MFC after: 3 days	2012-11-06 23:04:23 +00:00
Michael Tuexen	f3b05218ea	Move from early SSN assignment to late SSN assignment. This doesn't change functionality, but makes upcoming change much easier. Developed with rrs@ at the IETF 85. MFC after: 1 week	2012-11-05 20:55:17 +00:00
Andre Oppermann	60ee3bb213	Back out r242262. The simplified window change/update logic wasn't complete and ready for production use. PR: kern/173309	2012-11-05 09:13:06 +00:00
Andrey V. Elsukov	ffdbf9da3b	Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set. Suggested by: andre	2012-11-02 01:20:55 +00:00
Michael Tuexen	21f67da7c4	Whitespace changes due to upstream integration of SCTP changes in the FreeBSD code base.	2012-10-29 20:47:32 +00:00
Michael Tuexen	24d4ce2c87	Add braces (as used elsewhere in the SCTP code).	2012-10-29 20:44:29 +00:00
Michael Tuexen	09c1c8563a	Use ntohs() and htons() in correct order. However, this doesn't change functionality.	2012-10-29 20:42:48 +00:00
Andre Oppermann	78f59b4bfd	Forced commit to provide the correct commit message to r242251: Defer sending an independent window update if a delayed ACK is pending saving a packet. The window update then gets piggy-backed on the next already scheduled ACK. Added grammar fixes as well. MFC after: 2 weeks	2012-10-29 13:16:33 +00:00
Andre Oppermann	8d045dbdf3	Define the delayed ACK timeout value directly as hz/10 instead of obfuscating it by going through PR_FASTHZ. No functional change. MFC after: 2 weeks	2012-10-29 12:17:02 +00:00
Andre Oppermann	322181c98e	If the user has closed the socket then drop a persisting connection after a much reduced timeout. Typically web servers close their sockets quickly under the assumption that the TCP connection goes away as well. That is not entirely true however. If the peer closed the window we're going to wait for a long time with lots of data in the send buffer. MFC after: 2 weeks	2012-10-28 19:58:20 +00:00
Andre Oppermann	09440655fe	Increase the initial CWND to 10 segments as defined in IETF TCPM draft-ietf-tcpm-initcwnd-05. It explains why the increased initial window improves the overall performance of many web services without risking congestion collapse. As long as it remains a draft it is placed under a sysctl marking it as experimental: net.inet.tcp.experimental.initcwnd10 = 1 When it becomes an official RFC soon the sysctl will be changed to the RFC number and moved to net.inet.tcp. This implementation differs from the RFC draft in that it is a bit more conservative in the case of packet loss on SYN or SYN|ACK because we haven't reduced the default RTO to 1 second yet. Also the restart window isn't yet increased as allowed. Both will be adjusted with upcoming changes. It is enabled by default. In Linux it is enabled since kernel 3.0. MFC after: 2 weeks	2012-10-28 19:47:46 +00:00
Andre Oppermann	77339e1cdc	Update comment to reflect the change made in r242263. MFC after: 2 weeks	2012-10-28 19:22:18 +00:00
Andre Oppermann	c4ab59c1a1	Add SACK_PERMIT to the list of TCP options that are switched off after retransmitting a SYN three times. MFC after: 2 weeks	2012-10-28 19:20:23 +00:00
Andre Oppermann	79ce26a08c	Simplify and enhance the window change/update acceptance logic, especially in the presence of bi-directional data transfers. snd_wl1 tracks the right edge, including data in the reassembly queue, of valid incoming data. This makes it like rcv_nxt plus reassembly. It never goes backwards to prevent older, possibly reordered segments from updating the window. snd_wl2 tracks the left edge of sent data. This makes it a duplicate of snd_una. However joining them right now is difficult due to separate update dependencies in different places in the code flow. snd_wnd tracks the current advertized send window by the peer. In tcp_output() the effective window is calculated by subtracting the already in-flight data, snd_nxt less snd_una, from it. ACK's become the main clock of window updates and will always update the window when the left edge of what we sent is advanced. The ACK clock is the primary signaling mechanism in ongoing data transfers. This works reliably even in the presence of reordering, reassembly and retransmitted segments. The ACK clock is most important because it determines how much data we are allowed to inject into the network. Zero window updates that get us out of persistence mode are crucial. Here a segment that neither moves ACK nor SEQ but enlarges WND is accepted. When the ACK clock is not active (that is we're not or no longer sending any data) any segment that moves the extended right SEQ edge, including out-of-order segments, updates the window. This gives us updates especially during ping-pong transfers where the peer isn't done consuming the already acknowledged data from the receive buffer while responding with data. The SSH protocol is a prime candidate to benefit from the improved bi-directional window update logic as it has its own windowing mechanism on top of TCP and is frequently sending back protocol ACK's. Tcpdump provided by: darrenr Tested by: darrenr MFC after: 2 weeks	2012-10-28 19:16:22 +00:00
Andre Oppermann	024fd5b6bb	For retransmits of SYN|ACK from the syncache use the slightly more aggressive special tcp_syn_backoff[] retransmit schedule instead of the normal tcp_backoff[] schedule for established connections. MFC after: 2 weeks	2012-10-28 19:02:07 +00:00
Andre Oppermann	f4748ef5fb	When retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE, the default retransmit timeout, as base to calculate the backoff time until next try instead of the TCP_REXMTVAL() macro which only works correctly when we already have measured an actual RTT+RTTVAR. Before it would cause the first retransmit at RTOBASE, the next four at the same time (!) about 200ms later, and then another one again RTOBASE later. MFC after: 2 weeks	2012-10-28 18:56:57 +00:00
Andre Oppermann	602e8e45ee	Remove bogus 'else' in #ifdef that prevented the rttvar from being reset in tcp_timer_rexmt() on retransmit for IPv6 sessions. MFC after: 2 weeks	2012-10-28 18:45:04 +00:00
Andre Oppermann	4faaea5505	Allow arbitrary MSS sizes and don't mind about the cluster size anymore. We've got more cluster sizes for quite some time now and the originally imposed limits and the previously codified thoughts on efficiency gains are no longer true. MFC after: 2 weeks	2012-10-28 18:33:52 +00:00
Andre Oppermann	f3a10d7954	Change the syncache count reporting the current number of entries from an unprotected u_int that reports garbage on SMP to a function based sysctl obtaining the current value from UMA. Also read back the actual cache_limit after page size rounding by UMA. PR: kern/165879 MFC after: 2 weeks	2012-10-28 18:07:34 +00:00
Andre Oppermann	aafa0b4164	Simplify implementation of net.inet.tcp.reass.maxsegments and net.inet.tcp.reass.cursegments. MFC after: 2 weeks	2012-10-28 17:59:46 +00:00
Andre Oppermann	f62563d33c	Prevent a flurry of forced window updates when an application is doing small reads on a (partially) filled receive socket buffer. Normally one would send a window update every time the available space in the socket buffer increases by two times MSS. This leads to a flurry of window updates that do not provide any meaningful new information to the sender. There still is available space in the window and the sender can continue sending data. All window updates then get carried by the regular ACKs. Only when the socket buffer was (almost) full and the window closed accordingly does a window update deliver new information and allow the sender to start sending more data again. Send window updates only every two MSS when the socket buffer has less than 1/8 space available, or the available space in the socket buffer increased by 1/4 its full capacity, or the socket buffer is very small. The next regular data ACK will carry and report the exact window size again. Reported by: sbruno Tested by: darrenr Tested by: Darren Baginski PR: kern/116335 MFC after: 2 weeks	2012-10-28 17:40:35 +00:00
Andre Oppermann	4249614cb0	When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering. Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state. MFC after: 2 weeks	2012-10-28 17:30:28 +00:00
Andre Oppermann	cf8f04f4c0	When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering. Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state. MFC after: 2 weeks	2012-10-28 17:25:08 +00:00
Andre Oppermann	22efabd40c	Adjust the initial default CWND upon connection establishment to the new and increased values specified by RFC5681 Section 3.1. The even larger initial CWND per RFC3390, if enabled, is not affected. MFC after: 2 weeks	2012-10-28 17:16:09 +00:00
Gleb Smirnoff	078468ede4	o Remove last argument to ip_fragment(), and obtain all needed information on checksums directly from mbuf flags. This simplifies code. o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in hardware. Some drivers may not announce CSUM_IP in their if_hwassist, yet still try to do checksums if CSUM_IP is set on the mbuf. Example is em(4). o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP. After this change CSUM_DELAY_IP vanishes from the stack. Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>	2012-10-26 21:06:33 +00:00
Andrey V. Elsukov	c1de64a495	Remove the IPFIREWALL_FORWARD kernel option and make it possible to turn on the related functionality at runtime via the sysctl variable net.pfil.forward. It is turned off by default. Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks	2012-10-25 09:39:14 +00:00
Gleb Smirnoff	a7f707cd37	After r241923 the updated ip_len is no longer needed.	2012-10-25 09:02:21 +00:00
Gleb Smirnoff	b6fcf6f9f5	Fix error in r241913 that had broken fragment reassembly.	2012-10-25 09:00:57 +00:00
Gleb Smirnoff	9e2a372fd2	Use ip_stripoptions() instead of handrolled version.	2012-10-23 10:30:09 +00:00
Gleb Smirnoff	4937a6561f	Simplify ip_stripoptions() reducing number of intermediate variables.	2012-10-23 10:29:31 +00:00
Gleb Smirnoff	8ad458a471	Do not reduce ip_len by size of IP header in the ip_input() before passing a packet to protocol input routines. For several protocols this means that the protocol now needs to do the subtraction itself, and for the other half this means that we do not need to add the header length back to the packet. Make ip_stripoptions() adjust ip_len, since now we enter this function with a packet header whose ip_len does represent the length of the entire packet, not payload only.	2012-10-23 08:33:13 +00:00
Xin LI	6f56329a25	Remove __P. Submitted by: kevlo Reviewed by: md5(1) MFC after: 2 months	2012-10-22 21:49:56 +00:00
Gleb Smirnoff	8f134647ca	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert to host byte order before passing a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtracts header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>	2012-10-22 21:09:03 +00:00
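
Commit 08373e0bc4 in the list above describes rounding a requested hash size down to the next lower power of two rather than resetting it to 512. The following is a minimal sketch of that rounding step only, not the code from the commit; the function name round_down_pow2 is hypothetical and chosen for illustration.

```c
/*
 * Hypothetical helper (not taken from the FreeBSD tree): return the
 * largest power of two that is less than or equal to n. Returns 0 for
 * n == 0, since no power of two fits.
 */
static unsigned int
round_down_pow2(unsigned int n)
{
	unsigned int p = 1;

	if (n == 0)
		return (0);
	/* Comparing against n / 2 keeps p * 2 from overflowing. */
	while (p <= n / 2)
		p *= 2;
	return (p);
}
```

Under this sketch a request of 1000 becomes 512, and a value that is already a power of two is returned unchanged.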

4520 Commits