freebsd-nq

Author	SHA1	Message	Date
Michael Tuexen	5b495f17a5	Whitespace changes. The tool used to generate the sources has been updated and produces different whitespace. Commit this separately to avoid intermixing these with real code changes. MFC after: 3 days	2016-12-06 10:21:25 +00:00
Michael Tuexen	4ddd5aadea	Fix the handling of TCP FIN-segments in the CLOSED state. When a TCP segment with the FIN bit set is received in the CLOSED state, a TCP RST-ACK-segment is sent. When computing SEG.ACK for this, the FIN counts as one byte. This accounting was missing and is fixed by this patch. Reviewed by: hiren MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://svn.freebsd.org/base/head	2016-12-02 08:02:31 +00:00
Andrey V. Elsukov	dc9d21f8b0	Rework ip_tryforward() to use FIB4 KPI. Tested by: olivier Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D8526	2016-11-28 17:55:32 +00:00
Hiren Panchasara	2806b2933b	For RTT calculations mid-session, we explicitly ignore ACKs with tsecr of 0 as many broken middle-boxes tend to do that. But during 3whs, in syncache_expand(), we don't do that, which causes us to send a RST to such a client. Relax this constraint by only using tsecr to compare against the timestamp that we sent when it is not 0. As a result, we'd now accept the final ACK of 3whs with tsecr of 0. Reviewed by: jtl, gnn Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D8552	2016-11-21 20:53:11 +00:00
Michael Tuexen	35dfb8cb68	Ensure that TCP state changes to state-closing are reported via dtrace. This does not cover state changes from TIME-WAIT. Reviewed by: gnn MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8443	2016-11-19 14:45:08 +00:00
Michael Tuexen	6779a1a101	Notify the user via setting errno when a TCP RST segment is received either in the CLOSING or LAST-ACK state. Reviewed by: hiren MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8371	2016-11-17 08:15:02 +00:00
Andrey V. Elsukov	8432fa5fd9	Initialize ip6 pointer before use. PR: 214169 MFC after: 1 week	2016-11-06 02:33:04 +00:00
Hiren Panchasara	e04310d59b	Set slow start threshold more accurately on loss to be flightsize/2 instead of cwnd/2 as recommended by RFC5681. (spotted by mmacy at nextbsd dot org) Restore pre-r307901 behavior of aligning ssthresh/cwnd on mss boundary. (spotted by slawa at zxy dot spb dot ru) Tested by: dim, Slawa <slawa at zxy dot spb dot ru> MFC after: 1 month X-MFC with: r307901 Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D8349	2016-11-01 21:08:37 +00:00
Julien Charbon	f1ee30ccd6	Remove an extraneous call to soisconnected() in syncache_socket(), introduced with r261242. The useful and expected soisconnected() call is done in tcp_do_segment(). Has been found as part of unrelated PR:212920 investigation. Improve slightly (~2%) the maximum number of TCP accept per second. Tested by: kevin.bowling_kev009.com, jch Approved by: gnn, hiren MFC after: 1 week Sponsored by: Verisign, Inc Differential Revision: https://reviews.freebsd.org/D8072	2016-10-26 15:19:18 +00:00
Hiren Panchasara	4e7f755377	FreeBSD tcp stack used to inform respective congestion control module about the loss event but not use or obey the recommendations i.e. values set by it in some cases. Here is an attempt to solve that confusion by following relevant RFCs/drafts. Stack only sets congestion window/slow start threshold values when there is no CC module available to take that action. All CC modules are inspected and updated when needed to take appropriate action on loss. tcp_stacks/fastpath module has been updated to adapt these changes. Note: Probably, the most significant change would be to not bring congestion window down to 1MSS on a loss signaled by 3-duplicate acks and letting respective CC decide that value. In collaboration with: Matt Macy <mmacy at nextbsd dot org> Discussed on: transport@ mailing list Reviewed by: jtl MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D8225	2016-10-25 05:45:47 +00:00
Hiren Panchasara	dd13b7d387	Undo r307899. It needs a bit more work and proper commit log.	2016-10-25 05:07:51 +00:00
Hiren Panchasara	95d8236011	In Collaboration with: Matt Macy <mmacy at nextbsd dot com> Reviewed by: jtl Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D8225	2016-10-25 05:03:33 +00:00
Ryan Stone	6c1bd55875	Fix ip_output() on point-to-point links. In r304435, ip_output() was changed to use the result of the route lookup to decide whether the outgoing packet was a broadcast or not. This introduced a regression on interfaces where IFF_BROADCAST was not set (e.g. point-to-point links), as the algorithm could incorrectly treat the destination address as a broadcast address, and ip_output() would subsequently drop the packet as broadcasting on a non-IFF_BROADCAST interface is not allowed. Differential Revision: https://reviews.freebsd.org/D8303 Reviewed by: jtl Reported by: ambrisko MFC after: 2 weeks X-MFC-With: r304435 Sponsored by: Dell EMC Isilon	2016-10-24 22:11:33 +00:00
Michael Tuexen	38d3251c3d	No functional changes, mostly getting the whitespace changes resulting from an updated formatting tool chain. MFC after: 1 month	2016-10-22 17:21:21 +00:00
Michael Tuexen	3e1465754f	Make ICMPv6 hard error handling for TCP consistent with the ICMPv4 handling. Ensure that: * Protocol unreachable errors are handled by indicating ECONNREFUSED to the TCP user for both IPv4 and IPv6. These were ignored for IPv6. * Communication prohibited errors are handled by indicating ECONNREFUSED to the TCP user for both IPv4 and IPv6. These were ignored for IPv6. * Hop Limit exceeded errors are handled by indicating EHOSTUNREACH to the TCP user for both IPv4 and IPv6. For IPv6 the TCP connection was dropped but errno wasn't set. Reviewed by: gallatin, rrs MFC after: 1 month Sponsored by: Netflix Differential Revision: 7904	2016-10-21 10:32:57 +00:00
Julien Charbon	f5cf1e5f5a	Fix a double-free when an inp transitions to INP_TIMEWAIT state after having been dropped. This fix enforces in_pcbdrop() logic in tcp_input(): "in_pcbdrop() is used by TCP to mark an inpcb as unused and avoid future packet delivery or event notification when a socket remains open but TCP has closed." PR: 203175 Reported by: Palle Girgensohn, Slawa Olhovchenkov Tested by: Slawa Olhovchenkov Reviewed by: Slawa Olhovchenkov Approved by: gnn, Slawa Olhovchenkov Differential Revision: https://reviews.freebsd.org/D8211 MFC after: 1 week Sponsored by: Verisign, inc	2016-10-18 07:16:49 +00:00
Hiren Panchasara	784ce8fad2	Make sure tcp_mss() has the same check as tcp_mss_update() to have t_maxseg set to at least 64. This is still just a coverup to avoid kernel panic and not an actual fix. PR: 213232 Reviewed by: glebius MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D8272	2016-10-18 02:40:25 +00:00
Patrick Kelsey	09c305eb65	Fix cases where the TFO pending counter would leak references, and eventually, memory. Also renamed some tfo labels and added/reworked comments for clarity. Based on an initial patch from jtl. PR: 213424 Reviewed by: jtl MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D8235	2016-10-15 01:41:28 +00:00
Jonathan T. Looney	82676a28eb	r307082 added the TCP_HHOOK kernel option and made some existing code only compile when that option is configured. In tcp_destroy(), the error variable is now only used in code enclosed in an '#ifdef TCP_HHOOK' block. This broke the build for VNET images. Enclose the error variable itself in an #ifdef block. Submitted by: Shawn Webb <shawn.webb at hardenedbsd.org> Reported by: Shawn Webb <shawn.webb at hardenedbsd.org> PointyHat to: jtl	2016-10-15 00:29:15 +00:00
Jonathan T. Looney	6d172f58a2	The code currently resets the keepalive timer each time a packet is received on a TCP session that has entered the ESTABLISHED state. This results in a lot of calls to reset the keepalive timer. This patch changes the behavior so we set the keepalive timer for the keepalive idle time (TP_KEEPIDLE). When the keepalive timer fires, it will first check to see if the session has been idle for TP_KEEPIDLE ticks. If not, it will reschedule the keepalive timer for the time the session will have been idle for TP_KEEPIDLE ticks. For a session with regular communication, the keepalive timer should fire approximately once every TP_KEEPIDLE ticks. For sessions with irregular communication, the keepalive timer might fire more often. But, the disruption from a periodic keepalive timer should be less than the regular cost of resetting the keepalive timer on every packet. (FWIW, this change saved approximately 1.73% of the busy CPU cycles on a particular test system with a heavy TCP output load. Of course, the actual impact is very specific to the particular hardware and workload.) Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8243	2016-10-14 14:57:43 +00:00
Gleb Smirnoff	cc94f0c2d7	- Revert r300854, r303657 which tried to fix regression from r297225. - Fix the regression proper way using RO_RTFREE(). Submitted by: ae	2016-10-13 20:15:47 +00:00
Gleb Smirnoff	ec7bbf1f79	Fix build without TCP_HHOOK and with INVARIANTS. Before, mutex.h came via sys/hhook.h -> sys/rmlock.h -> sys/mutex.h.	2016-10-13 18:02:29 +00:00
Michael Tuexen	859422cc12	Mark the socket as un-writable when it is 1-to-1 and the SCTP association is freed. MFC after: 1 month	2016-10-13 13:53:01 +00:00
Michael Tuexen	4c7fb0cf6e	Whitespace changes. MFC after: 1 month	2016-10-13 13:38:14 +00:00
Jonathan T. Looney	68bd7ed102	The TFO server-side code contains some changes that are not conditioned on the TCP_RFC7413 kernel option. This change removes those few instructions from the packet processing path. While not strictly necessary, for the sake of consistency, I applied the new IS_FASTOPEN macro to all places in the packet processing path that used the (t_flags & TF_FASTOPEN) check. Reviewed by: hiren Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8219	2016-10-12 19:06:50 +00:00
Jonathan T. Looney	4527476029	Currently, when tcp_input() receives a packet on a session that matches a TCPCB, it checks (so->so_options & SO_ACCEPTCONN) to determine whether or not the socket is a listening socket. However, this causes the code to access a different cacheline. If we first check if the socket is in the LISTEN state, we can avoid accessing so->so_options when processing packets received for ESTABLISHED sessions. If INVARIANTS is defined, the code still needs to access both variables to check that so->so_options is consistent with the state. Reviewed by: gallatin MFC after: 1 week Sponsored by: Netflix	2016-10-12 02:30:33 +00:00
Jonathan T. Looney	bd79708dbf	In the TCP stack, the hhook(9) framework provides hooks for kernel modules to add actions that run when a TCP frame is sent or received on a TCP session in the ESTABLISHED state. In the base tree, this functionality is only used for the h_ertt module, which is used by the cc_cdg, cc_chd, cc_hd, and cc_vegas congestion control modules. Presently, we incur overhead to check for hooks each time a TCP frame is sent or received on an ESTABLISHED TCP session. This change adds a new compile-time option (TCP_HHOOK) to determine whether to include the hhook(9) framework for TCP. To retain backwards compatibility, I added the TCP_HHOOK option to every configuration file that already defined "options INET". (Therefore, this patch introduces no functional change. In order to see a functional difference, you need to compile a custom kernel without the TCP_HHOOK option.) This change will allow users to easily exclude this functionality from their kernel, should they wish to do so. Note that any users who use a custom kernel configuration and use one of the congestion control modules listed above will need to add the TCP_HHOOK option to their kernel configuration. Reviewed by: rrs, lstewart, hiren (previous version), sjg (makefiles only) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8185	2016-10-12 02:16:42 +00:00
Mark Johnston	d748f7efcd	Lock the ND prefix list and add refcounting for prefixes. This change extends the nd6 lock to protect the ND prefix list as well as the list of advertising routers associated with each prefix. To handle cases where the nd6 lock must be dropped while iterating over either the prefix or default router lists, a generation counter is used to track modifications to the lists. Additionally, a new mutex is used to serialize prefix on-link/off-link transitions. This mutex must be acquired before the nd6 lock and is held while updating the routing table in nd6_prefix_onlink() and nd6_prefix_offlink(). Reviewed by: ae, tuexen (SCTP bits) Tested by: Jason Wolfe <jason@llnw.com>, Larry Rosenman <ler@lerctr.org> MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D8125	2016-10-07 21:10:53 +00:00
Jonathan T. Looney	3ac125068a	Remove "long" variables from the TCP stack (not including the modular congestion control framework). Reviewed by: gnn, lstewart (partial) Sponsored by: Juniper Networks, Netflix Differential Revision: (multiple) Tested by: Limelight, Netflix	2016-10-06 16:28:34 +00:00
Jonathan T. Looney	0dda76b82b	If the new window size is less than the old window size, skip the calculations to check if we should advertise a larger window. Reviewed by: gnn MFC after: 2 weeks Sponsored by: Juniper Networks, Netflix Differential Revision: https://reviews.freebsd.org/D7076 Tested by: Limelight, Netflix	2016-10-06 16:09:45 +00:00
Jonathan T. Looney	15c825712e	Correctly calculate snd_max in persist case. In the persist case, take the SYN and FIN flags into account when updating the sequence space sent. Reviewed by: gnn MFC after: 2 weeks Sponsored by: Juniper Networks, Netflix Differential Revision: https://reviews.freebsd.org/D7075 Tested by: Limelight, Netflix	2016-10-06 16:00:48 +00:00
Jonathan T. Looney	55a429a6dc	Remove declaration of un-defined function tcp_seq_subtract(). Reviewed by: gnn MFC after: 1 week Sponsored by: Juniper Networks, Netflix Differential Revision: https://reviews.freebsd.org/D7055	2016-10-06 15:57:15 +00:00
Kevin Lo	c2b5ba7661	Remove an alias if_list, use if_link consistently. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D8075	2016-10-06 00:51:27 +00:00
Eric van Gyzen	2d9db0bc63	Add GARP retransmit capability. A single gratuitous ARP (GARP) is always transmitted when an IPv4 address is added to an interface, and that is usually sufficient. However, in some circumstances, such as when a shared address is passed between cluster nodes, this single GARP may occasionally be dropped or lost. This can lead to neighbors on the network link working with a stale ARP cache and sending packets destined for that address to the node that previously owned the address, which may not respond. To avoid this situation, GARP retransmissions can be enabled by setting the net.link.ether.inet.garp_rexmit_count sysctl to a value greater than zero. The setting represents the maximum number of retransmissions. The interval between retransmissions is calculated using an exponential backoff algorithm, doubling each time, so the retransmission intervals are: {1, 2, 4, 8, 16, ...} (seconds). Due to the exponential backoff algorithm used for the interval between GARP retransmissions, the maximum number of retransmissions is limited to 16 for sanity. This limit corresponds to a maximum interval between retransmissions of 2^16 seconds ~= 18 hours. Increasing this limit is possible, but sending out GARPs spaced days apart would be of little use. Submitted by: David A. Bright <david.a.bright@dell.com> MFC after: 1 month Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D7695	2016-10-02 01:42:45 +00:00
Rick Macklem	00b460ffc5	r297225 broke udp_output() for the case where the "addr" argument is NULL and the function jumps to the "release:" label. For this case, the "inp" was write locked, but the code attempted to read unlock it. This patch fixes the problem. This case could occur for NFS over UDP mounts, where the server was down for a few minutes under certain circumstances. Reported by: bde Tested by: bde Reviewed by: gnn MFC after: 2 weeks	2016-10-01 19:39:09 +00:00
Hiren Panchasara	8a56c64533	This adds a sysctl which allows you to disable the TCP hostcache. This is handy during testing of network related changes where cached entries may pollute your results, or during known congestion events where you don't want to unfairly penalize hosts. Prior to r232346 this would have meant you would break any connection with a sub 1500 MTU, as the hostcache was authoritative. All entries as they stand today should simply be used to pre populate values for efficiency. Submitted by: Jason Wolfe (j at nitrology dot com) Reviewed by: rwatson, sbruno, rrs , bz (earlier version) MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D6198	2016-09-30 00:10:57 +00:00
Kurt Lidl	1d7ee746e6	Properly preserve ip_tos bits for IPv4 packets. Restructure code slightly to save ip_tos bits earlier. Fix the bug where the ip_tos field is zeroed out before assigning to the iptos variable. Restore the ip_tos and ip_ver fields only if they have been zeroed during the pseudo-header checksum calculation. Reviewed by: cem, gnn, hiren MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8053	2016-09-29 19:45:24 +00:00
Julien Charbon	c1b19923a3	Fix an issue with accept_filter introduced with r261242: As a side effect of r261242 when using accept_filter the first call to soisconnected() is done earlier in tcp_input() instead of tcp_do_segment() context. Restore the expected behaviour. Note: This call to soisconnected() seems to be extraneous in all cases (with or without accept_filter). Will be addressed in a separate commit. PR: 212920 Reported by: Alexey Tested by: Alexey, jch Sponsored by: Verisign, Inc. MFC after: 1 week	2016-09-29 11:18:48 +00:00
Kevin Lo	c7641cd18d	Remove ifa_list, use ifa_link (structure field) instead. While here, prefer if_addrhead (FreeBSD) to if_addrlist (BSD compat) naming for the interface address list in sctp_bsd_addr.c Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D8051	2016-09-28 13:29:11 +00:00
Mariusz Zaborski	85b0f9de11	capsicum: propagate rights on accept(2). The descriptor returned by accept(2) should inherit capability rights from the listening socket. PR: 201052 Reviewed by: emaste, jonathan Discussed with: many Differential Revision: https://reviews.freebsd.org/D7724	2016-09-22 09:58:46 +00:00
Michael Tuexen	5cb9165556	Fix the handling of unordered fragmented user messages using DATA chunks. There were two bugs: * There was an accounting bug resulting in reporting a too small a_rwnd. * There was a bug when abandoning messages in the reassembly queue. MFC after: 4 weeks	2016-09-21 08:28:18 +00:00
Kevin Lo	c3bef61e58	Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D7878	2016-09-15 07:41:48 +00:00
Michael Tuexen	5a17b6ad98	Ensure that the IPPROTO_TCP level socket options * TCP_KEEPINIT * TCP_KEEPINTVL * TCP_KEEPIDLE * TCP_KEEPCNT always report the values currently used when getsockopt() is used. This wasn't the case when the sysctl-inherited default values were used. Ensure that the IPPROTO_TCP level socket option TCP_INFO has the TCPI_OPT_ECN flag set in the tcpi_options field when ECN support has been negotiated successfully. Reviewed by: rrs, jtl, hiren MFC after: 1 month Differential Revision: 7833	2016-09-14 14:48:00 +00:00
Dimitry Andric	6c01c0e0c6	With clang 3.9.0, compiling sys/netinet/igmp.c results in the following warning: sys/netinet/igmp.c:546:21: error: implicit conversion from 'int' to 'char' changes value from 148 to -108 [-Werror,-Wconstant-conversion] p->ipopt_list[0] = IPOPT_RA; /* Router Alert Option */ ~ ^~~~~~~~ sys/netinet/ip.h:153:19: note: expanded from macro 'IPOPT_RA' #define IPOPT_RA 148 /* router alert */ ^~~ This is because ipopt_list is an array of char, so IPOPT_RA is wrapped to a negative value. It would be nice to change ipopt_list to an array of u_char, but it changes the signature of the public struct ipoption, so add an explicit cast to suppress the warning. Reviewed by: imp MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D7777	2016-09-04 17:23:10 +00:00
Hiren Panchasara	06b99bd826	Adjust TCP module fastpath after r304803's cc_ack_received() changes. Reported by: hiren, bz, np Reviewed by: rrs Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D7664	2016-08-26 19:23:17 +00:00
Hiren Panchasara	e7106d6be2	Update TCPS_HAVERCVDFIN() macro to correctly include all states a connection can be in after receiving a FIN. FWIW, NetBSD has had this change for quite some time. This has been tested at Netflix and Limelight in production traffic. Reported by: Sam Kumar <samkumar99 at gmail.com> on transport@ Reviewed by: rrs MFC after: 4 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D7475	2016-08-26 17:48:54 +00:00
Michael Tuexen	91843cf34e	Fix a bug, where no SACK is sent when receiving a FORWARD-TSN or I-FORWARD-TSN chunk before any DATA or I-DATA chunk. Thanks to Julian Cordes for finding this problem and providing packetdrill scripts to reproduce the issue. MFC after: 3 days	2016-08-26 07:49:23 +00:00
Lawrence Stewart	4b7b743c16	Pass the number of segments coalesced by LRO up the stack by repurposing the tso_segsz pkthdr field during RX processing, and use the information in TCP for more correct accounting and as a congestion control input. This is only a start, and an audit of other uses for the data is left as future work. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D7564	2016-08-25 13:33:32 +00:00
Michael Tuexen	884d8c53e6	When aborting an association, send the ABORT before notifying the upper layer. For the kernel this doesn't matter, for the userland stack, it does. While there, silence a clang warning when compiling it in userland.	2016-08-24 06:22:53 +00:00
Ryan Stone	23424a2021	Temporarily disable the optimization from r304436. r304436 attempted to optimize the handling of incoming UDP packets by only making an expensive call to in_broadcast() if the mbuf was marked as a broadcast packet. Unfortunately, this cannot work in the case of point-to-point L2 protocols like PPP, which have no notion of "broadcast". Discussions on how to properly fix r304436 are ongoing, but in the meantime disable the optimization to ensure that no existing network setups are broken. Reported by: bms	2016-08-22 15:27:37 +00:00

5663 Commits