freebsd-skq

Author	SHA1	Message	Date
Michael Tuexen	8e1b295f09	Fix the PR-SCTP behaviour. This is done by rrs@. MFC after: 3 days	2016-07-17 13:14:51 +00:00
Michael Tuexen	26f0250adf	Add missing sctps_reasmusrmsgs counter. Joint work with rrs@. MFC after: 3 days	2016-07-17 08:31:21 +00:00
Michael Tuexen	643fd575da	Deal with a potential memory allocation failure, which was reported by the clang static code analyzer. Joint work with rrs@. MFC after: 3 days	2016-07-16 12:25:37 +00:00
Michael Tuexen	b4bf72213a	Don't free a data chunk twice. Found by the clang static code analyzer running for the userland stack. MFC after: 3 days	2016-07-16 08:11:43 +00:00
Michael Tuexen	56d2f7d8e5	Address a potential memory leak found by the clang static code analyzer running on the userland stack. MFC after: 3 days	2016-07-16 07:48:01 +00:00
Jonathan T. Looney	24b9bb5614	The TCPPCAP debugging feature caches recently-used mbufs for use in debugging TCP connections. This commit provides a mechanism to free those mbufs when the system is under memory pressure. Because this will result in lost debugging information, the behavior is controllable by a sysctl. The default setting is to free the mbufs. Reviewed by: gnn Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D6931 Input from: novice_techie.com	2016-07-06 16:17:13 +00:00
Nathan Whitehorn	96c85efb4b	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
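The sparse-ID problem described in the commit above can be modelled in userland. This is a minimal sketch, not kernel code: `MODEL_MAXID`, `model_cpu_present()`, `percpu_slots()` and `visit_present_cpus()` are hypothetical stand-ins for `mp_maxid`, the kernel's CPU-present check and a `CPU_FOREACH()`-style loop. It assumes a system whose present CPU IDs are 0, 8, 16 and 24, so there are 4 CPUs but the highest ID is 24.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mp_maxid: highest CPU ID in the model. */
#define MODEL_MAXID 24u

/* Hypothetical presence check: only IDs 0, 8, 16, 24 exist (4 CPUs). */
static bool
model_cpu_present(unsigned id)
{
	return (id <= MODEL_MAXID && id % 8 == 0);
}

/* Per-CPU arrays are indexed by CPU ID, so they need maxid + 1 slots,
 * not one slot per present CPU. */
static size_t
percpu_slots(void)
{
	return ((size_t)MODEL_MAXID + 1);
}

/* Visit every present CPU, skipping holes in the ID range.
 * Returns the number of CPUs visited. */
static unsigned
visit_present_cpus(int *slots, size_t nslots)
{
	unsigned visited = 0;

	assert(nslots >= percpu_slots());
	for (unsigned id = 0; id <= MODEL_MAXID; id++) {
		if (!model_cpu_present(id))
			continue;	/* hole: no such CPU */
		slots[id]++;
		visited++;
	}
	return (visited);
}
```

Sizing the array by the CPU count (4) instead of `percpu_slots()` (25) would make the write to `slots[24]` run off the end, which is the class of bug the commit describes.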
Michael Tuexen	e75f31c1d0	This patch fixes two bugs related to the setting of the I-Bit for SCTP DATA and I-DATA chunks. * For fragmented user messages, set the I-Bit only on the last fragment. * When using explicit EOR mode, set the I-Bit on the last fragment, whenever SCTP_SACK_IMMEDIATELY was set in snd_flags for any of the send() calls. Approved by: re (hrs) MFC after: 1 week	2016-06-30 06:06:35 +00:00
Michael Tuexen	ab3373140d	This patch fixes two bugs related to the SCTP message recovery for messages which have been put on the send queue: * Do not report any DATA or I-DATA chunk padding. * Correctly deal with the I-DATA chunk header instead of the DATA chunk header when the I-DATA extension is used. Approved by: re (kib) MFC after: 1 week	2016-06-26 16:38:42 +00:00
Michael Tuexen	d1b52c6a01	This patch fixes a locking bug when a send() call blocks on an SCTP socket and the association is aborted by the peer. Approved by: re (kib) MFC after: 1 week	2016-06-26 12:41:02 +00:00
Bjoern A. Zeeb	42644eb88e	Try to avoid a 2nd conditional by re-writing the loop, pause, and escape clause another time. Submitted by: jhb Approved by: re (gjb) MFC after: 12 days	2016-06-23 21:32:52 +00:00
Navdeep Parhar	f22bfc72f8	Add spares to struct ifnet and socket for packet pacing and/or general use. Update comments regarding the spare fields in struct inpcb. Bump __FreeBSD_version for the changes to the size of the structures. Reviewed by: gnn@ Approved by: re@ (gjb@) Sponsored by: Chelsio Communications	2016-06-23 21:07:15 +00:00
Michael Tuexen	9de217ce56	Fix a bug in the handling of non-blocking SCTP 1-to-1 sockets. When using this code in the userland stack, it could result in a loop. This happened on iOS. However, I was not able to reproduce this when using the code in the kernel. Thanks to Eugen-Andrei Gavriloaie for reporting the issue and providing detailed information to find the root of the problem. Approved by: re (gjb) MFC after: 1 week	2016-06-23 19:27:29 +00:00
Bjoern A. Zeeb	361279a4fb	In VNET TCP teardown Do not sleep unconditionally but only if we have any TCP connections left. Submitted by: zec Approved by: re (hrs) MFC after: 13 days	2016-06-23 11:55:15 +00:00
Michael Tuexen	55b8cd93ef	Don't consider the socket when processing an incoming ICMP/ICMP6 packet, which was triggered by an SCTP packet. Whether a socket exists, is just not relevant. Approved by: re (kib) MFC after: 1 week	2016-06-23 09:13:15 +00:00
Bjoern A. Zeeb	b54e08e11a	Check the V_tcbinfo.ipi_count to hit 0 before doing the full TCP cleanup. That way timers can finish cleanly and we do not gamble with a DELAY(). Reviewed by: gnn, jtl Approved by: re (gjb) Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6923	2016-06-23 00:34:03 +00:00
Bjoern A. Zeeb	252a46f324	No longer mark TCP TW zone NO_FREE. Timewait code does a proper cleanup after itself. Reviewed by: gnn Approved by: re (gjb) Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6922	2016-06-23 00:32:58 +00:00
Bjoern A. Zeeb	89856f7e2d	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather large and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along with (or before) the higher layer protocol when it is shut down. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enhance helper functions to do proper cleanup at a protocol rather than at an interface level. 
Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
Andrey V. Elsukov	4c10540274	Cleanup unneeded include "opt_ipfw.h". It was used for conditional build IPFIREWALL_FORWARD support. But IPFIREWALL_FORWARD option was removed a long time ago.	2016-06-09 05:48:34 +00:00
Michael Tuexen	63d5b56815	Use a separate MID counter for ordered and unordered messages for each outgoing stream. Thanks to Jens Hoelscher for reporting the issue. MFC after: 1 week	2016-06-08 17:57:42 +00:00
Sepherosa Ziehau	36ad8372d4	net: Use M_HASHTYPE_OPAQUE_HASH if the mbuf flowid has hash properties Reviewed by: hps, erj, tuexen Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6688	2016-06-07 04:51:50 +00:00
Bjoern A. Zeeb	b941cb8d60	Add a `show igi_list` command to DDB to debug IGMP state. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 22:26:18 +00:00
Bjoern A. Zeeb	a6c96fc2f0	Destroy the mutex last. In this case it should not matter, but generally cleanup code might still acquire it thus try to be consistent destroying locks late. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 13:04:22 +00:00
George V. Neville-Neil	8b2b91eee0	Add missing constants from RFCs 4443 and 6550	2016-06-06 00:35:45 +00:00
Bjoern A. Zeeb	484149def8	Introduce a per-VNET flag to enable/disable netisr processing on that VNET. Add accessor functions to toggle the state per VNET. The base system (vnet0) will always enable itself with the normal registration. We will share the registered protocol handlers in all VNETs minimising duplication and management. Upon disabling netisr processing for a VNET drain the netisr queue from packets for that VNET. Update netisr consumers to (de)register on a per-VNET start/teardown using VNET_SYS(UN)INIT functionality. The change should be transparent for non-VIMAGE kernels. Reviewed by: gnn (, hiren) Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6691	2016-06-03 13:57:10 +00:00
Hans Petter Selasky	ec6689059d	Use insertion sort instead of bubble sort in TCP LRO. Replacing the bubble sort with insertion sort gives an 80% reduction in runtime on average, with randomized keys, for small partitions. If the keys are pre-sorted, insertion sort runs in linear time, and even if the keys are reversed, insertion sort is faster than bubble sort, although not by much. Update comment describing "tcp_lro_sort()" while at it. Differential Revision: https://reviews.freebsd.org/D6619 Sponsored by: Mellanox Technologies Tested by: Netflix Suggested by: Pieter de Goeje <pieter@degoeje.nl> Reviewed by: ed, gallatin, gnn, transport	2016-06-03 08:35:07 +00:00
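The insertion sort named in the commit above can be sketched as follows. This is an illustrative userland version over plain 64-bit keys, not the kernel's `tcp_lro_sort()`; the function name `insertion_sort_u64()` is hypothetical. On pre-sorted input the inner loop never runs, which is why that case takes linear time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sort n 64-bit keys in ascending order, in place. */
static void
insertion_sort_u64(uint64_t *keys, size_t n)
{
	assert(keys != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		uint64_t k = keys[i];
		size_t j = i;

		/* Shift larger elements one slot right until k fits. */
		while (j > 0 && keys[j - 1] > k) {
			keys[j] = keys[j - 1];
			j--;
		}
		keys[j] = k;
	}
}
```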
Michael Tuexen	273e31638f	Get struct sctp_net_route in-sync with struct route again.	2016-06-03 07:43:04 +00:00
Michael Tuexen	565cccce37	Store the peer's vtag in host byte order in the cookie, since all consumers expect it that way. This fixes the vtag when sending an ERROR chunk. MFC after: 1 week	2016-06-03 07:24:41 +00:00
George V. Neville-Neil	6d76822688	This change re-adds L2 caching for TCP and UDP, as originally added in D4306 but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate. Submitted by: Mike Karels Differential Revision: https://reviews.freebsd.org/D6262	2016-06-02 17:51:29 +00:00
Bjoern A. Zeeb	3f58662dd9	The pr_destroy field does not allow us to run the teardown code in a specific order. VNET_SYSUNINITs however are doing exactly that. Thus remove the VIMAGE conditional field from the domain(9) protosw structure and replace it with VNET_SYSUNINITs. This also allows us to change some order and to make the teardown functions file local static. Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use internally. Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g., for pfil consumers (firewalls), partially for this commit and for others to come. Reviewed by: gnn, tuexen (sctp), jhb (kernel.h) Obtained from: projects/vnet MFC after: 2 weeks X-MFC: do not remove pr_destroy Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6652	2016-06-01 10:14:04 +00:00
Michael Tuexen	3d48d25be7	Add PR_CONNREQUIRED for SOCK_STREAM sockets using SCTP. This is required to signal connection setup on non-blocking sockets via becoming writable. This still allows for implicit connection setup. MFC after: 1 week	2016-05-30 18:24:23 +00:00
Michael Tuexen	7b7f31e6cf	Fix a byte order issue for the scope stored in the SCTP cookie. MFC after: 1 week	2016-05-30 11:18:39 +00:00
Sepherosa Ziehau	425b763928	tcp: Don't prematurely drop receiving-only connections If the connection was persistent and receiving-only, several (12) sporadic device insufficient buffers would cause the connection to be dropped prematurely: Upon ENOBUFS in tcp_output() for an ACK, retransmission timer is started. No one will stop this retransmission timer for receiving-only connection, so the retransmission timer promises to expire and t_rxtshift is promised to be increased. And t_rxtshift will not be reset to 0, since no RTT measurement will be done for receiving-only connection. If this receiving-only connection lived long enough (e.g. >350sec, given the RTO starts from 200ms), and it suffered 12 sporadic device insufficient buffers, i.e. t_rxtshift >= 12, this receiving-only connection would be dropped prematurely by the retransmission timer. We now assert that for data segments, SYNs or FINs either rexmit or persist timer was wired upon ENOBUFS. And don't set rexmit timer for other cases, i.e. ENOBUFS upon ACKs. Discussed with: lstewart, hiren, jtl, Mike Karels MFC after: 3 weeks Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5872	2016-05-30 03:31:37 +00:00
Gleb Smirnoff	6351b3857b	Plug route reference underleak that happens with FLOWTABLE after r297225. Submitted by: Mike Karels <mike karels.net>	2016-05-27 17:31:02 +00:00
Don Lewis	91336b403a	Import Dummynet AQM version 0.2.1 (CoDel, FQ-CoDel, PIE and FQ-PIE). Centre for Advanced Internet Architectures Implementing AQM in FreeBSD * Overview <http://caia.swin.edu.au/freebsd/aqm/index.html> * Articles, Papers and Presentations <http://caia.swin.edu.au/freebsd/aqm/papers.html> * Patches and Tools <http://caia.swin.edu.au/freebsd/aqm/downloads.html> Overview Recent years have seen a resurgence of interest in better managing the depth of bottleneck queues in routers, switches and other places that get congested. Solutions include transport protocol enhancements at the end-hosts (such as delay-based or hybrid congestion control schemes) and active queue management (AQM) schemes applied within bottleneck queues. The notion of AQM has been around since at least the late 1990s (e.g. RFC 2309). In recent years the proliferation of oversized buffers in all sorts of network devices (aka bufferbloat) has stimulated keen community interest in four new AQM schemes -- CoDel, FQ-CoDel, PIE and FQ-PIE. The IETF AQM working group is looking to document these schemes, and independent implementations are a corner-stone of the IETF's process for confirming the clarity of publicly available protocol descriptions. While significant development work on all three schemes has occurred in the Linux kernel, there is very little in FreeBSD. Project Goals This project began in late 2015, and aims to design and implement functionally-correct versions of CoDel, FQ-CoDel, PIE and FQ-PIE in FreeBSD (with code BSD-licensed as much as practical). We have chosen to do this as extensions to FreeBSD's ipfw/dummynet firewall and traffic shaper. 
Implementation of these AQM schemes in FreeBSD will: * Demonstrate whether the publicly available documentation is sufficient to enable independent, functionally equivalent implementations * Provide a broader suite of AQM options for sections of the networking community that rely on FreeBSD platforms Program Members: * Rasool Al Saadi (developer) * Grenville Armitage (project lead) Acknowledgements: This project has been made possible in part by a gift from the Comcast Innovation Fund. Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> X-No objection: core MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6388	2016-05-26 21:40:13 +00:00
John Baldwin	052a5418e8	Don't reuse the source mbuf in tcp_respond() if it is not writable. Not all mbufs passed up from device drivers are M_WRITABLE(). In particular, the Chelsio T4/T5 driver uses a feature called "buffer packing" to receive multiple frames in a single receive buffer. The mbufs for these frames all share the same external storage so are treated as read-only by the rest of the stack when multiple frames are in flight. Previously tcp_respond() would blindly overwrite read-only mbufs when INVARIANTS was disabled or panic with an assertion failure if INVARIANTS was enabled. Note that the new case is a bit of a mix of the two other cases in tcp_respond(). The TCP and IP headers must be copied explicitly into the new mbuf instead of being inherited (similar to the m == NULL case), but the addresses and ports must be swapped in the reply (similar to the m != NULL case). Reviewed by: glebius	2016-05-26 18:35:37 +00:00
Michael Tuexen	4f3b84b524	Make struct sctp_paddrthlds compliant to RFC 7829.	2016-05-26 11:38:26 +00:00
Hans Petter Selasky	fc271df341	Use optimised complexity safe sorting routine instead of the kernel's "qsort()". The kernel's "qsort()" routine can in worst case spend O(N*N) amount of comparisons before the input array is sorted. It can also recurse a significant amount of times using up the kernel's interrupt thread stack. The custom sorting routine takes advantage of that the sorting key is only 64 bits. Based on set and cleared bits in the sorting key it partitions the array until it is sorted. This process has a recursion limit of 64 times, due to the number of set and cleared bits which can occur. Compiled with -O2 the sorting routine was measured to use 64-bytes of stack. Multiplying this by 64 gives a maximum stack consumption of 4096 bytes for AMD64. The same applies to the execution time, that the array to be sorted will not be traversed more than 64 times. When serving roughly 80Gb/s with 80K TCP connections, the old method consisting of "qsort()" and "tcp_lro_mbuf_compare_header()" used 1.4% CPU, while the new "tcp_lro_sort()" used 1.1% for LRO related sorting as measured by Intel Vtune. The testing was done using a sysctl to toggle between "qsort()" and "tcp_lro_sort()". Differential Revision: https://reviews.freebsd.org/D6472 Sponsored by: Mellanox Technologies Tested by: Netflix Reviewed by: gallatin, rrs, sephe, transport	2016-05-26 11:10:31 +00:00
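The bit-partitioning idea in the commit above can be sketched as a binary most-significant-bit-first partition sort. This is an assumption-laden userland illustration, not the committed `tcp_lro_sort()`: the name `bit_partition_sort()` and its signature are hypothetical, and the real routine operates on LRO mbuf entries rather than bare keys. Each call splits the array on one key bit and recurses on the next lower bit, so the recursion depth cannot exceed 64 for 64-bit keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: sort n 64-bit keys ascending by partitioning on
 * one bit at a time, most significant first. Call with bit = 63.
 */
static void
bit_partition_sort(uint64_t *keys, size_t n, int bit)
{
	assert(bit < 64);
	if (n < 2 || bit < 0)
		return;

	uint64_t mask = (uint64_t)1 << bit;
	size_t lo = 0;	/* keys[0..lo) have the bit cleared */
	size_t hi = n;	/* keys[hi..n) have the bit set */

	while (lo < hi) {
		if (keys[lo] & mask) {
			/* Move a bit-set key to the upper partition. */
			uint64_t tmp = keys[lo];

			hi--;
			keys[lo] = keys[hi];
			keys[hi] = tmp;
		} else {
			lo++;
		}
	}

	/* Each partition is ordered by the remaining lower bits. */
	bit_partition_sort(keys, lo, bit - 1);
	bit_partition_sort(keys + lo, n - lo, bit - 1);
}
```

Because every level consumes one key bit, both the stack depth and the number of passes over any element are bounded by the key width, independent of the input order. That is the property the commit contrasts with the worst case of `qsort()`.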
Michael Tuexen	f88d0cfe7a	When sending an ICMP response to an SCTP packet, * include the SCTP common header, if possible * include the first 8 bytes of the INIT chunk, if possible This provides the necessary information for the receiver of the ICMP packet to process it. MFC after: 1 week	2016-05-25 22:16:11 +00:00
Michael Tuexen	6d7270a580	Send an ICMP packet indicating destination unreachable/protocol unreachable if we don't handle the packet in the kernel and not in userspace. MFC after: 1 week	2016-05-25 15:54:21 +00:00
Michael Tuexen	ad2cbb09ef	Count packets as not being delivered only if they are neither processed by a kernel handler nor by a raw socket. MFC after: 1 week	2016-05-25 13:48:26 +00:00
Don Lewis	883054b4c3	Change net.inet.tcp.ecn.enable sysctl mib from a binary off/on control to a three way setting. 0 - Totally disable ECN. (no change) 1 - Enable ECN if incoming connections request it. Outgoing connections will request ECN. (no change from present != 0 setting) 2 - Enable ECN if incoming connections request it. Outgoing connections will not request ECN. Change the default value of net.inet.tcp.ecn.enable from 0 to 2. Linux version 2.4.20 and newer, Solaris, and Mac OS X 10.5 and newer have similar capabilities. The actual values above match Linux, and the default matches the current Linux default. Reviewed by: eadler MFC after: 1 month MFH: yes Sponsored by: https://reviews.freebsd.org/D6386	2016-05-19 22:20:35 +00:00
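The three-way setting described in the commit above amounts to two independent decisions: whether to request ECN on outgoing connections and whether to accept it on incoming ones. This is a minimal sketch of that truth table only; the enum and function names are hypothetical and do not come from the FreeBSD TCP code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three net.inet.tcp.ecn.enable values. */
enum ecn_setting {
	ECN_DISABLED = 0,	/* never negotiate ECN */
	ECN_ACTIVE = 1,		/* accept incoming requests, request on outgoing */
	ECN_PASSIVE = 2		/* accept incoming requests only */
};

/* Should an outgoing connection request ECN? */
static bool
ecn_request_on_connect(int setting)
{
	assert(setting >= ECN_DISABLED && setting <= ECN_PASSIVE);
	return (setting == ECN_ACTIVE);
}

/* Should ECN be enabled when an incoming connection requests it? */
static bool
ecn_accept_on_listen(int setting)
{
	assert(setting >= ECN_DISABLED && setting <= ECN_PASSIVE);
	return (setting != ECN_DISABLED);
}
```

With the new default of 2, a host answers peers that ask for ECN but never initiates the negotiation itself.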
Gleb Smirnoff	f59d975e10	Tiny refactor of r294869/r296881: use defines to mask the VNET() macro. Suggested by: bz	2016-05-17 23:14:17 +00:00
Randall Stewart	5105a92c49	This small change adopts the excellent suggestion for using named structures in the add of a new tcp-stack that came in late to me via email after the last commit. It also makes it so that a new stack may optionally get a callback during a retransmit timeout. This allows the new stack to clear specific state (think sack scoreboards or other such structures). Sponsored by: Netflix Inc. Differential Revision: http://reviews.freebsd.org/D6303	2016-05-17 09:53:22 +00:00
Andrey V. Elsukov	2685841b38	Make named objects set-aware. Now it is possible to create named objects with the same name in different sets. Add optional manage_sets() callback to objects rewriting framework. It is intended to implement handler for moving and swapping named object's sets. Add ipfw_obj_manage_sets() function that implements generic sets handler. Use new callback to implement sets support for lookup tables. External actions objects are global and they don't support sets. Modify eaction_findbyname() to reflect this. ipfw(8) now may fail to move rules or sets, because some named objects in target set may have conflicting names. Note that ipfw_obj_ntlv type was changed, but since lookup tables actually didn't support sets, this change is harmless. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-05-17 07:47:23 +00:00
Mark Johnston	565e7fd3bc	opt_kdtrace.h is not needed for SDT probes as of r258541.	2016-05-15 20:04:43 +00:00
Mark Johnston	e8800f3c2f	Fix a few style issues in the ICMP sysctl descriptions. MFC after: 1 week	2016-05-15 03:19:53 +00:00
Michael Tuexen	574679afe9	Fix a locking bug which only shows up on Mac OS X. MFC after: 1 week	2016-05-14 13:44:49 +00:00
Michael Tuexen	5f05199c19	Fix a bug introduced by the implementation of I-DATA support. There was the requirement that two structures are in sync, which is not valid anymore. Therefore don't rely on this in the code anymore. Thanks to Radek Malcic for reporting the issue. He found this when using the userland stack. MFC after: 1 week	2016-05-13 09:11:41 +00:00
Michael Tuexen	fd60718d17	Retire net.inet.sctp.strict_sacks and net.inet.sctp.strict_data_order sysctls, since they were only there to interop with non-conformant implementations. This should not be a problem anymore.	2016-05-12 16:34:59 +00:00

5572 Commits