freebsd-skq

Author	SHA1	Message	Date
Richard Scheffenegger	cce999b38f	Fix style and comment around concave/convex regions in TCP cubic. In cubic, the concave region is when snd_cwnd starts growing slower towards max_cwnd (cwnd at the time of the congestion event), and the convex region is when snd_cwnd starts to grow faster and eventually resembles slow-start growth. PR: 238478 Reviewed by: tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D24657	2020-07-21 16:21:52 +00:00
Richard Scheffenegger	66ba9aafcf	Add MODULE_VERSION to TCP loadable congestion control modules. Without versioning information, using preexisting loader / linker code is not easily possible when another module may have dependencies on pre-loaded modules, and also doesn't allow the automatic loading of dependent modules. No functional change of the actual modules. Reviewed by: tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25744	2020-07-20 23:47:27 +00:00
Michael Tuexen	8745f898c4	Add reference counts for inp/stcb/net when timers are running. This avoids a use-after-free reported for the userland stack. Thanks to Taylor Brandstetter for suggesting a patch for the userland stack. MFC after: 1 week	2020-07-19 12:34:19 +00:00
Michael Tuexen	05bceec68e	Remove code which is not needed. MFC after: 1 week	2020-07-18 13:10:02 +00:00
Michael Tuexen	7f0ad2274b	Improve the locking of address lists by adding some asserts and rearranging the addition of address such that the lock is not given up during checking and adding. MFC after: 1 week	2020-07-17 15:09:49 +00:00
Michael Tuexen	f903a308a1	(Re)-allow 0.0.0.0 to be used as an address in connect() for TCP In r361752 an error handling was introduced for using 0.0.0.0 or 255.255.255.255 as the address in connect() for TCP, since both addresses can't be used. However, the stack maps 0.0.0.0 implicitly to a local address and at least two regressions were reported. Therefore, re-allow the usage of 0.0.0.0. While there, change the error indicated when using 255.255.255.255 from EAFNOSUPPORT to EACCES as mentioned in the man-page of connect(). Reviewed by: rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25401	2020-07-16 16:46:24 +00:00
Michael Tuexen	504ee6a001	Improve the error handling in generating ASCONF chunks. In case of errors, the cleanup was not consistent. Thanks to Felix Weinrank for fuzzing the userland stack and making me aware of the issue. MFC after: 1 week	2020-07-14 20:32:50 +00:00
Michael Tuexen	cd7518203d	Cleanup, no functional change intended. This file is only compiled if INET or INET6 is defined. So there is no need for checking that. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D25635	2020-07-12 18:34:09 +00:00
Michael Tuexen	83b8204f61	(Re)activate SCTP system calls when compiling SCTP support into the kernel r363079 introduced the possibility of loading the SCTP stack as a module in addition to compiling it into the kernel. As part of this, the registration of the system calls was removed and put into the loading of the module. Therefore, the system calls are not registered anymore when compiling the SCTP into the kernel. This patch addresses that. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D25632	2020-07-12 14:50:12 +00:00
Michael Tuexen	4ef7c2f28f	Whitespace changes due to upstreaming r363079.	2020-07-10 16:59:06 +00:00
Mark Johnston	052c5ec4d0	Provide support for building SCTP as a loadable module. With this change, a kernel compiled with "options SCTP_SUPPORT" and without "options SCTP" supports dynamic loading of the SCTP stack. Currently sctp.ko cannot be unloaded since some prerequisite teardown logic is not yet implemented. Attempts to unload the module will return EOPNOTSUPP. Discussed with: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21997	2020-07-10 14:56:05 +00:00
Michael Tuexen	6ddc843832	Fix a use-after-free bug for the userland stack. The kernel stack is not affected. Thanks to Mark Wodrich from Google for finding and reporting the bug. MFC after: 1 week	2020-07-10 11:15:10 +00:00
Michael Tuexen	b6734d8f4a	Optimize flushing of receive queues. This addresses an issue found and reported for the userland stack in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21243 MFC after: 1 week	2020-07-09 16:18:42 +00:00
Michael Tuexen	fcbfdc0ab6	Improve consistency. MFC after: 1 week	2020-07-08 16:23:40 +00:00
Michael Tuexen	ef9095c72a	Fix error description. MFC after: 1 week	2020-07-08 16:04:06 +00:00
Michael Tuexen	c96d7c373e	Don't accept FORWARD-TSN chunks when I-FORWARD-TSN was negotiated and vice versa. MFC after: 1 week	2020-07-08 15:49:30 +00:00
Michael Tuexen	32df1c9ebb	Improve handling of PKTDROP chunks. This includes the input validation to address two issues found by ossfuzz testing the userland stack: * https://oss-fuzz.com/testcase-detail/5387560242380800 * https://oss-fuzz.com/testcase-detail/4887954068865024 and adding support for I-DATA chunks in addition to DATA chunks.	2020-07-08 12:25:19 +00:00
Richard Scheffenegger	c201ce0b4a	Fix KASSERT during tcp_newtcpcb when low on memory While testing with system default cc set to cubic, and running a memory exhaustion validation, FreeBSD panics for a missing inpcb reference / lock. Reviewed by: rgrimes (mentor), tuexen (mentor) Approved by: rgrimes (mentor), tuexen (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25583	2020-07-07 12:10:59 +00:00
Alexander V. Chernikov	6ad7446c6f	Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup(). fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. With no callers remaining, remove fib[46]_lookup_nh_ functions. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25445	2020-07-02 21:04:08 +00:00
Michael Tuexen	e54b7cd007	Fix the cleanup handling in a error path for TCP BBR. Reported by: syzbot+df7899c55c4cc52f5447@syzkaller.appspotmail.com Reviewed by: rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25486	2020-07-01 17:17:06 +00:00
Mark Johnston	d16a2e4784	Fix a possible next-hop refcount leak when handling IPSec traffic. It may be possible to fix this by deferring the lookup, but let's keep the initial change simple to make MFCs easier. PR: 246951 Reviewed by: melifaro MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25519	2020-07-01 15:42:48 +00:00
Michael Tuexen	7a3f60e7f5	Fix a bug introduced in https://svnweb.freebsd.org/changeset/base/362173 Reported by: syzbot+f3a6fccfa6ae9d3ded29@syzkaller.appspotmail.com MFC after: 1 week	2020-06-30 21:50:05 +00:00
Michael Tuexen	e99ce3eac5	Don't send packets containing ERROR chunks in response to unknown chunks when being in a state where the verification tag to be used is not known yet. MFC after: 1 week	2020-06-28 14:11:36 +00:00
Michael Tuexen	f2f66ef6d2	Don't check ch for not being NULL, since that is true. MFC after: 1 week	2020-06-28 11:12:03 +00:00
John Baldwin	4a711b8d04	Use zfree() instead of explicit_bzero() and free(). In addition to reducing lines of code, this also ensures that the full allocation is always zeroed avoiding possible bugs with incorrect lengths passed to explicit_bzero(). Suggested by: cem Reviewed by: cem, delphij Approved by: csprng (cem) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25435	2020-06-25 20:17:34 +00:00
Michael Tuexen	132c073866	Fix the accounting for fragmented unordered messages when using interleaving. This was reported for the userland stack in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=19321 MFC after: 1 week	2020-06-24 14:47:51 +00:00
Richard Scheffenegger	6e26dd0dbe	TCP: fix cubic RTO reaction. Proper TCP Cubic operation requires the knowledge of the maximum congestion window prior to the last congestion event. This restores and improves a bugfix previously added by jtl@ but subsequently removed due to a revert. Reported by: chengc_netapp.com Reviewed by: chengc_netapp.com, tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25133	2020-06-24 13:52:53 +00:00
Richard Scheffenegger	9dc7d8a246	TCP: make after-idle work for transactional sessions. The use of t_rcvtime as proxy for the last transmission fails for transactional IO, where the client requests data before the server can respond with a bulk transfer. Set aside a dedicated variable to actually track the last locally sent segment going forward. Reported by: rrs Reviewed by: rrs, tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25016	2020-06-24 13:42:42 +00:00
Michael Tuexen	87c0bf77d9	Fix alignment issue manifesting in the userland stack. MFC after: 1 week	2020-06-23 23:05:05 +00:00
Michael Tuexen	b88082dd39	No need to include netinet/sctp_crc32.h twice.	2020-06-22 14:36:14 +00:00
Mark Johnston	e6db509d10	Move the definition of SCTP's system_base_info into sctp_crc32.c. This file is the only SCTP source file compiled into the kernel when SCTP_SUPPORT is configured. sctp_delayed_checksum() references a couple of counters defined in system_base_info, so the change allows these counters to be referenced in a kernel compiled without "options SCTP". Submitted by: tuexen MFC with: r362338	2020-06-22 14:01:31 +00:00
Michael Tuexen	c5d9e5c99e	Cleanup the definition of struct sctp_getaddresses. This structure is used by the IPPROTO_SCTP level socket options SCTP_GET_PEER_ADDRESSES and SCTP_GET_LOCAL_ADDRESSES, which are used by libc to implement sctp_getladdrs() and sctp_getpaddrs(). These changes allow an old libc to work on a newer kernel.	2020-06-21 23:12:56 +00:00
Bjoern A. Zeeb	e387af1fa8	Rather than zeroing MAXVIFS times size of pointer [r362289] (still better than sizeof pointer before [r354857]), we need to zero MAXVIFS times the size of the struct. All good things come in threes; I hope this is it on this one. PR: 246629, 206583 Reported by: kib MFC after: ASAP	2020-06-21 22:09:30 +00:00
Michael Tuexen	171edd2110	Fix the build for an INET6 only configuration. The fix from the last commit is actually needed twice... MFC after: 1 week	2020-06-21 09:56:09 +00:00
Michael Tuexen	5087b6e732	Set a variable also in the case of an INET6 only kernel MFC after: 1 week	2020-06-20 23:48:57 +00:00
Michael Tuexen	ed82c2edd6	Use a struct sockaddr_in or struct sockaddr_in6 as the option value for the IPPROTO_SCTP level socket options SCTP_BINDX_ADD_ADDR and SCTP_BINDX_REM_ADDR. These socket options are intended for internal use only to implement sctp_bindx(). This is one user of struct sctp_getaddresses less. struct sctp_getaddresses is strange and will be changed shortly.	2020-06-20 21:06:02 +00:00
Michael Tuexen	7621bd5ead	Cleanup the adding and deleting of addresses via sctp_bindx(). There is no need to use the association identifier, so remove it. While there, cleanup the code a bit. MFC after: 1 week	2020-06-20 20:20:16 +00:00
Michael Tuexen	7a9dbc33f9	Remove last argument of sctp_addr_mgmt_ep_sa(), since it is not used. MFC after: 1 week	2020-06-19 12:35:29 +00:00
Mark Johnston	95033af923	Add the SCTP_SUPPORT kernel option. This is in preparation for enabling a loadable SCTP stack. Analogous to IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured in order to support a loadable SCTP implementation. Discussed with: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-06-18 19:32:34 +00:00
Bjoern A. Zeeb	ce19cceb8d	When converting the static arrays to mallocarray() in r356621 I missed one place where we now need to multiply the size of the struct with the number of entries. This led to problems when restarting user space daemons, as the cleanup was never properly done, resulting in MRT_ADD_VIF EADDRINUSE. Properly zero all array elements to avoid this problem. PR: 246629, 206583 Reported by: (many) MFC after: 4 days Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")	2020-06-17 21:04:38 +00:00
Bjoern A. Zeeb	b7b3d237e7	The call into ifa_ifwithaddr() needs to be epoch protected; otherwise we'll panic on an assertion. While here, leave a comment that the ifp was never protected and stable (as glebius pointed out) and this needs to be fixed properly. Discovered while working on: PR 246629 Reviewed by: glebius MFC after: 4 days Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")	2020-06-17 20:58:37 +00:00
Michael Tuexen	2d87bacde4	Allow the self reference to be NULL in case the timer was stopped. Submitted by: Timo Voelker MFC after: 1 week	2020-06-17 15:27:45 +00:00
Tom Jones	d88fe3d964	Add header definition for RFC4340, Datagram Congestion Control Protocol Add a header definition for DCCP as defined in RFC4340. This header definition is required to perform validation when receiving and forwarding DCCP packets. We do not currently support DCCP. Reviewed by: gallatin, bz Approved by: bz (co-mentor) MFC after: 1 week MFC with: 350749 Differential Revision: https://reviews.freebsd.org/D21179	2020-06-17 13:27:13 +00:00
Randall Stewart	95ef69c63c	So in doing final checks on OCA firmware with all the latest tweaks the dup-ack checking packet drill script was failing with a number of unexpected acks. So it turns out if you have the default recvwin set up to 1Meg (like OCA's do) and you have no window scaling (like the dupack checking code) then we have another case where we are always trying to update the rwnd and sending an ack when we should not. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25298	2020-06-16 18:16:45 +00:00
Randall Stewart	4d418f8da8	So it turns out rack has a shortcoming in dup-ack counting. It counts the dupacks but then does not properly respond to them. This is because a few bits are missing. BBR actually does properly respond (though it also sends a TLP which is interesting and maybe something to fix). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25294	2020-06-16 12:26:23 +00:00
Michael Tuexen	b231bff8b2	Allocate the mbuf for the signature in the COOKIE of the correct size. While there, also do some cleanups. MFC after: 1 week	2020-06-14 16:05:08 +00:00
Michael Tuexen	4471043177	Cleanups, no functional change. MFC after: 1 week	2020-06-14 09:50:00 +00:00
Michael Tuexen	d60bdf8569	Remove usage of empty macro. MFC after: 1 week	2020-06-13 21:23:26 +00:00
Michael Tuexen	64c8fc5de8	Simplify a condition, no functional change. MFC after: 1 week	2020-06-13 18:38:59 +00:00
Randall Stewart	f092a3c71c	So it turns out with the right window scaling you can get the code in all stacks to always want to do a window update, even when no data can be sent. Now in cases where you are not pacing that's probably ok, you just send an extra window update or two. However with bbr (and rack if it's paced) every time the pacer goes off it's going to send a "window update". Also in testing bbr I have found that if we are not responding to data right away we end up staying in startup but incorrectly holding a pacing gain of 192 (a loss). This is because the idle window code does not restrict itself to only work with PROBE_BW. In all other states you don't want it doing a PROBE_BW state change. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25247	2020-06-12 19:56:19 +00:00
Michael Tuexen	3ee11586b2	Whitespace change due to upstream cleanup. MFC after: 1 week	2020-06-12 16:40:10 +00:00
Michael Tuexen	2f9e6db0be	More cleanups due to ifdef cleanup done upstream MFC after: 1 week	2020-06-12 16:31:13 +00:00
Michael Tuexen	306c2ba375	Small cleanup due to upstream ifdef cleanups. MFC after: 1 week	2020-06-12 10:13:23 +00:00
Michael Tuexen	28397ac1ed	Non-functional changes due to upstream cleanup. MFC after: 1 week	2020-06-11 13:34:09 +00:00
Richard Scheffenegger	2fda0a6f3a	Prevent TCP Cubic to abruptly increase cwnd after app-limited Cubic calculates the new cwnd based on absolute time elapsed since the start of an epoch. A cubic epoch is started on congestion events, or once the congestion avoidance phase is started, after slow-start has completed. When a sender is application limited for an extended amount of time and subsequently a larger volume of data becomes ready for sending, Cubic recalculates cwnd with a lingering cubic epoch. This recalculation of the cwnd can induce a massive increase in cwnd, causing a burst of data to be sent at line rate by the sender. This adds a flag to reset the cubic epoch once a session transitions from app-limited to cwnd-limited to prevent the above effect. Reviewed by: chengc_netapp.com, tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25065	2020-06-10 07:32:02 +00:00
Richard Scheffenegger	6907bbae18	Prevent TCP Cubic to abruptly increase cwnd after slow-start Introducing flags to track the initial Wmax dragging and exit from slow-start in TCP Cubic. This prevents sudden jumps in the calculated cwnd by cubic, especially when the flow is application limited during slow start (cwnd can not grow as fast as expected). The downside is that cubic may remain slightly longer in the concave region before starting the convex region beyond Wmax again. Reviewed by: chengc_netapp.com, tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor, blanket) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23655	2020-06-09 21:07:58 +00:00
Michael Tuexen	5fb132abbb	Whitespace cleanups and removal of a stale comment. MFC after: 1 week	2020-06-08 20:23:20 +00:00
Randall Stewart	e854dd38ac	An important statistic in determining if a server process (or client) is being delayed is to know the time to first byte in and time to first byte out. Currently we have no way to know these; all we have is t_starttime. That (t_starttime) tells us what time the 3 way handshake completed. We don't know when the first request came in or how quickly we responded. Nor from a client perspective do we know how long from when we sent out the first byte before the server responded. This small change adds the ability to track the TTFBs. This will show up in BB logging which then can be pulled for later analysis. Note that currently the tracking is via the ticks variable of all three variables. This provides a very rough estimate (at hz=1000 it's 1ms). A follow-on set of work will be to change all three of these values into something with a much finer resolution (either microseconds or nanoseconds), though we may want to make the resolution configurable so that on lower powered machines we could still use the much cheaper ticks variable. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24902	2020-06-08 11:48:07 +00:00
Michael Tuexen	70486b27ae	Retire SCTP_SO_LOCK_TESTING. This was intended to test the locking used in the MacOS X kernel on a FreeBSD system, to make use of WITNESS and other debugging infrastructure. This hasn't been used for ages, so take it out to reduce the #ifdef complexity. MFC after: 1 week	2020-06-07 14:39:20 +00:00
Michael Tuexen	3f53d62236	Fix typo in comment. Submitted by Orgad Shaneh for the userland stack. MFC after: 1 week	2020-06-06 21:26:34 +00:00
Michael Tuexen	2cf3347109	Non-functional changes due to cleanup (upstream removing of Panda support) of the code MFC after: 1 week	2020-06-06 18:20:09 +00:00
Randall Stewart	2cf21ae559	We should never allow either the broadcast or IN_ADDR_ANY to be connected to or sent to. This was found when working with Michael Tuexen and syzkaller. Syzkaller seems to want to use either of these two addresses to connect to at times. And it really is an error to do so, so let's not allow that behavior. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24852	2020-06-03 14:16:40 +00:00
Randall Stewart	f1ea4e4120	This fixes a couple of syzkaller crashes. Most of them have to do with TFO. Even the default stack had one of the issues: 1) We need to make sure for rack that we don't advance snd_nxt beyond iss when we are not doing fast open. We otherwise can get a bunch of SYN's sent out incorrectly with the seq number advancing. 2) When we complete the 3-way handshake we should not ever append to reassembly if the tlen is 0; if TFO is enabled, prior to this fix we could still call the reassembly. Note this affects all three stacks. 3) Rack like its cousin BBR should track if a SYN is on a send map entry. 4) Both bbr and rack need to only consider len incremented on a SYN if the starting seq is iss, otherwise we don't increment len which may mean we return without adding a sendmap entry. This work was done in collaboration with Michael Tuexen, thanks for all the testing! Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D25000	2020-06-03 14:07:31 +00:00
Michael Tuexen	d442a65733	Restrict enabling TCP-FASTOPEN to end-points in CLOSED or LISTEN state Enabling TCP-FASTOPEN on an end-point which is in a state other than CLOSED or LISTEN, is a bug in the application. So it should not work. Also the TCP code does not (and needs not to) handle this. While there, also simplify the setting of the TF_FASTOPEN flag. This issue was found by running syzkaller. Reviewed by: rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25115	2020-06-03 13:51:53 +00:00
Alexander V. Chernikov	da187ddb3d	* Add rib_<add|del|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this approach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:49:42 +00:00
Alexander V. Chernikov	e7403d0230	Revert r361704, it accidentally committed merged D25067 and D25070.	2020-06-01 20:40:40 +00:00
Alexander V. Chernikov	79674562b8	* Add rib_<add|del|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this approach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:32:02 +00:00
Alexander V. Chernikov	a37a5246ca	Use fib[46]_lookup() in mtu calculations. fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Conversion is straightforward, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code. Differential Revision: https://reviews.freebsd.org/D24974	2020-05-28 08:00:08 +00:00
Alexander V. Chernikov	3553b3007f	Switch ip_output/icmp_reflect rt lookup calls with fib4_lookup. fib4_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Conversion is straightforward, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24976	2020-05-28 07:31:53 +00:00
Alexander V. Chernikov	7bfc98af12	Switch gif(4) path verification to fib[46]_check_urpf(). fibX_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Use specialized fib[46]_check_urpf() from newer KPI instead, to allow removal of older KPI. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24978	2020-05-28 07:26:18 +00:00
Emmanuel Vadot	77c68315f6	bbr: Use arc4random_uniform from libkern. This unbreaks the LINT build. Reported by: jenkins, melifaro	2020-05-23 19:52:20 +00:00
Alexander V. Chernikov	4d2c2509f2	Move <add|del|change>_route() functions to route_ctl.c in preparation of multipath control plane changes described in D24141. Currently route.c contains core routing init/teardown functions, route table manipulation functions and various helper functions, resulting in >2KLOC file in total. This change moves most of the route table manipulation parts to a dedicated file, simplifying planned multipath changes and making route.c more manageable. Differential Revision: https://reviews.freebsd.org/D24870	2020-05-23 19:06:57 +00:00
Richard Scheffenegger	e68cde59c3	DCTCP: update alpha only once after loss recovery. In mixed ECN marking and loss scenarios it was found that the alpha value of DCTCP is updated two times. The second update happens with freshly initialized counters indicating to ECN loss. Overall this leads to alpha not adjusting as quickly as expected to ECN markings, and therefore leads to excessive loss. Reported by: Cheng Cui Reviewed by: chengc_netapp.com, rrs, tuexen (mentor) Approved by: tuexen (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D24817	2020-05-21 21:42:49 +00:00
Richard Scheffenegger	af2fb894c9	With RFC3168 ECN, CWR SHOULD only be sent with new data Overly conservative data receivers may ignore the CWR flag on other packets, and keep ECE latched. This can result in continuous reduction of the congestion window, and very poor performance when ECN is enabled. Reviewed by: rgrimes (mentor), rrs Approved by: rgrimes (mentor), tuexen (mentor) MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23364	2020-05-21 21:33:15 +00:00
Richard Scheffenegger	8e0511652b	Retain only mutually supported TCP options after simultaneous SYN When receiving a parallel SYN in SYN-SENT state, remove all the options that only we supported locally before sending the SYN,ACK. This addresses a consistency issue on parallel opens. Also, on such a parallel open, the stack could be coaxed into running with timestamps enabled, even if administratively disabled. Reviewed by: tuexen (mentor) Approved by: tuexen (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23371	2020-05-21 21:26:21 +00:00
Richard Scheffenegger	6e16d87751	Handle ECN handshake in simultaneous open While testing simultaneous open TCP with ECN, found that negotiation fails to arrive at the expected final state. Reviewed by: tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23373	2020-05-21 21:15:25 +00:00
Mark Johnston	591b09b486	Define a module version for accept filter modules. Otherwise accept filters compiled into the kernel do not preempt preloaded accept filter modules. Then, the preloaded file registers its accept filter module before the kernel, and the kernel's attempt fails since duplicate accept filter list entries are not permitted. This causes the preloaded file's module to be released, since module_register_init() does a lookup by name, so the preloaded file is unloaded, and the accept filter's callback points to random memory since preload_delete_name() unmaps the file on x86 as of r336505. Add a new ACCEPT_FILTER_DEFINE macro which wraps the accept filter and module definitions, and ensures that a module version is defined. PR: 245870 Reported by: Thomas von Dein <freebsd@daemon.de> MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-05-19 18:35:08 +00:00
Michael Tuexen	999f86d67d	Replace snprintf() by SCTP_SNPRINTF() and let SCTP_SNPRINTF() map to snprintf() on FreeBSD. This allows to check for failures of snprintf() on platforms other than FreeBSD kernel.	2020-05-19 07:23:35 +00:00
Michael Tuexen	821bae7cf3	Revert r361209: cem noted that on FreeBSD snprintf() can not fail and code should not check for that. A followup commit will replace the usage of snprintf() in the SCTP sources with a variadic macro SCTP_SNPRINTF, which will simply map to snprintf() on FreeBSD and do a checking similar to r361209 on other platforms.	2020-05-19 07:21:11 +00:00
Mike Karels	2fdbcbea76	Fix NULL-pointer bug from r361228. Note that in_pcb_lport and in_pcb_lport_dest can be called with a NULL local address for IPv6 sockets; handle it. Found by syzkaller. Reported by: cem MFC after: 1 month	2020-05-19 01:05:13 +00:00
Mike Karels	2510235150	Allow TCP to reuse local port with different destinations Previously, tcp_connect() would bind a local port before connecting, forcing the local port to be unique across all outgoing TCP connections for the address family. Instead, choose a local port after selecting the destination and the local address, requiring only that the tuple is unique and does not match a wildcard binding. Reviewed by: tuexen (rscheff, rrs previous version) MFC after: 1 month Sponsored by: Forcepoint LLC Differential Revision: https://reviews.freebsd.org/D24781	2020-05-18 22:53:12 +00:00
Michael Tuexen	6863ab0b8b	Remove assignment without effect. MFC after: 3 days	2020-05-18 19:48:38 +00:00
Michael Tuexen	9e8c9c9ef6	Don't check an unsigned variable for being negative. MFC after: 3 days.	2020-05-18 19:35:46 +00:00
Michael Tuexen	04ce5df9c9	Remove redundant assignment. MFC after: 3 days	2020-05-18 19:23:01 +00:00
Michael Tuexen	bca1802890	Cleanup, no functional change intended. MFC after: 3 days	2020-05-18 18:42:43 +00:00
Michael Tuexen	6395219ae3	Avoid an integer underflow. MFC after: 3 days	2020-05-18 18:32:58 +00:00
Michael Tuexen	60017c8e88	Remove redundant check. MFC after: 3 days	2020-05-18 18:27:10 +00:00
Michael Tuexen	88116b7e4d	Fix logical condition by looking at usecs. This issue was found by cpp-check running on the userland stack. MFC after: 3 days	2020-05-18 15:02:15 +00:00
Michael Tuexen	00023d8a87	Whitespace change. MFC after: 3 days	2020-05-18 15:00:18 +00:00
Michael Tuexen	e708e2a4f4	Handle failures of snprintf(). MFC after: 3 days	2020-05-18 10:07:01 +00:00
Michael Tuexen	da8c34c382	Non-functional changes, cleanups. MFC after: 3 days	2020-05-17 22:31:38 +00:00
Alexander V. Chernikov	174fb9dbb1	Remove redundant checks for nhop validity. Currently NH_IS_VALID() simply aliases to RT_LINK_IS_UP(), so we're checking the same thing twice. In the near future the implementation of this check will be simpler, as there are plans to introduce control-plane interface status monitoring similar to ipfw interface tracker.	2020-05-17 15:32:36 +00:00
Michael Tuexen	daf143413a	Ensure that an stcb is not dereferenced when it is about to be freed. This issue was found by SYZKALLER. MFC after: 3 days	2020-05-16 19:26:39 +00:00
Ed Maste	65a1d63665	libalias: retire cuseeme support The CU-SeeMe videoconferencing client and associated protocol is at this point a historical artifact; there is no need to retain support for this protocol today. Reviewed by: philip, markj, allanjude Relnotes: Yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24790	2020-05-16 02:29:10 +00:00
Michael Tuexen	e240ce42bf	Allow only IPv4 addresses in sendto() for TCP on AF_INET sockets. This problem was found by looking at syzkaller reproducers for some other problems. Reviewed by: rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24831	2020-05-15 14:06:37 +00:00
Randall Stewart	777b88d60f	This fixes several syzkaller issues found with the help of Michael Tuexen. There were some accounting errors with TCPFO for bbr and also for both rack and bbr there was a FO case where we should be jumping to the just_return_nolock label to exit instead of returning 0. This of course caused no timer to be running and thus the stuck sessions. Reported by: Michael Tuexen and syzkaller Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24852	2020-05-15 14:00:12 +00:00
Ed Maste	46701f31be	libalias: fix potential memory disclosure from ftp module admbugs: 956 Submitted by: markj Reported by: Vishnu Dev TJ working with Trend Micro Zero Day Initiative Security: FreeBSD-SA-20:13.libalias Security: CVE-2020-7455 Security: ZDI-CAN-10849	2020-05-12 16:38:28 +00:00
Ed Maste	6461c83e09	libalias: validate packet lengths before accessing headers admbugs: 956 Submitted by: ae Reported by: Lucas Leong (@_wmliang_) of Trend Micro Zero Day Initiative Reported by: Vishnu working with Trend Micro Zero Day Initiative Security: FreeBSD-SA-20:12.libalias	2020-05-12 16:33:04 +00:00
Michael Tuexen	86fd36c502	Fix a copy and paste error introduced in r360878. Reported-by: syzbot+a0863e972771f2f0d4b3@syzkaller.appspotmail.com Reported-by: syzbot+4481757e967ba83c445a@syzkaller.appspotmail.com MFC after: 3 days	2020-05-11 22:47:20 +00:00
Alexander V. Chernikov	1d1a743e9f	Fix NOINET[6] build by using af-independent route lookup function. Reported by: rpokala	2020-05-11 20:41:03 +00:00
Andrew Gallatin	6043ac201a	Ktls: never skip stamping tags for NIC TLS The newer RACK and BBR TCP stacks have added a mechanism to disable hardware packet pacing for TCP retransmits. This mechanism works by skipping the send-tag stamp on rate-limited connections when the TCP stack calls ip_output() with the IP_NO_SND_TAG_RL flag set. When doing NIC TLS, we must ignore this flag, as NIC TLS packets must always be stamped. Failure to stamp a NIC TLS packet will result in crypto issues. Reviewed by: hselasky, rrs Sponsored by: Netflix, Mellanox	2020-05-11 19:17:33 +00:00
Michael Tuexen	83ed508055	Ensure that the SCTP iterator runs with an stcb and inp, which belong to each other. Reported by: syzbot+82d39d14f2f765e38db0@syzkaller.appspotmail.com MFC after: 3 days	2020-05-10 22:54:30 +00:00
Michael Tuexen	9d176904ae	Remove trailing whitespace.	2020-05-10 17:43:42 +00:00
Michael Tuexen	efd5e69291	Ensure that we have a path when starting the T3 RXT timer. Reported by: syzbot+f2321629047f89486fa3@syzkaller.appspotmail.com MFC after: 3 days	2020-05-10 17:19:19 +00:00
Michael Tuexen	8123bbf186	Only drop DATA chunk with lower priorities as specified in RFC 7496. This issue was found by looking at a reproducer generated by syzkaller. MFC after: 3 days	2020-05-10 10:03:10 +00:00
Randall Stewart	b1ddcbc62c	When in the SYN-SENT state bbr and rack will not properly send an ACK but instead start the D-ACK timer. This causes so_reuseport_lb_test to fail since it slows down how quickly the program runs until the timeout occurs and fails the test Sponsored by: Netflix inc. Differential Revision: https://reviews.freebsd.org/D24747	2020-05-07 20:29:38 +00:00
Randall Stewart	8717b8f1bb	NF has an internal option that changes the tcp_mcopy_m routine slightly (has a few extra arguments). Recently that changed to only have one arg extra so that two ifdefs around the call are no longer needed. Let's take out the extra ifdef and arg. Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D24736	2020-05-07 10:46:02 +00:00
Michael Tuexen	cb9fb7b2cb	Avoid underflowing a variable, which would result in taking more data from the stream queues than needed. Thanks to Timo Voelker for finding this bug and providing a fix. MFC after: 3 days	2020-05-05 19:54:30 +00:00
Michael Tuexen	d3c3d6f99c	Fix the computation of the number of entries of the mapping array to look at when generating a SACK. This was wrong in case of sequence number wrap-arounds. Thanks to Gwenael FOURRE for reporting the issue for the userland stack: https://github.com/sctplab/usrsctp/issues/462 MFC after: 3 days	2020-05-05 17:52:44 +00:00
Michael Tuexen	51a5392297	Add net epoch support back, which was taken out by accident in https://svnweb.freebsd.org/changeset/base/360639 Reviewed by: rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24694	2020-05-04 23:05:11 +00:00
Randall Stewart	570045a0fc	This fixes two issues found by ankitraheja09@gmail.com 1) When BBR retransmits the syn it was messing up the snd_max 2) When we need to send a RST we might not send it when we should Reported by: ankitraheja09@gmail.com Sponsored by: Netflix.com Differential Revision: https://reviews.freebsd.org/D24693	2020-05-04 23:02:58 +00:00
Michael Tuexen	7985fd7e76	Enter the net epoch before calling the output routine in TCP BBR. This was only triggered when setting the IPPROTO_TCP level socket option TCP_DELACK. This issue was found by running an instance of SYZKALLER. Reviewed by: rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24690	2020-05-04 22:02:49 +00:00
Randall Stewart	963fb2ad94	This commit brings things into sync with the advancements that have been made in rack and adds a few fixes in BBR. This also removes any possibility of incorrectly doing OOB data, since the stacks do not support it. Should fix the syzkaller crashes seen in the past. Still to fix is the BBR issue just reported this weekend with the SYN and on sending a RST. Note that this version of rack can now do pacing as well. Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D24576	2020-05-04 20:28:53 +00:00
Randall Stewart	d3b6c96b7d	Adjust the fb to have a way to ask the underlying stack if it can support the PRUS option (OOB). And then have the new function call that to validate and give the correct error response if needed to the user (rack and bbr do not support obsoleted OOB data). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24574	2020-05-04 20:19:57 +00:00
Alexander V. Chernikov	9e02229580	Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields. After converting routing subsystem customers to use nexthop objects defined in r359823, some fields in struct rtentry became unused. This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry along with the code initializing and updating these fields. Cleanup of the remaining fields will be addressed by D24669. This commit also changes the implementation of the RTM_CHANGE handling. Old implementation tried to perform the whole operation under radix WLOCK, resulting in slow performance and hacks like using RTF_RNH_LOCKED flag. New implementation looks up the route nexthop under radix RLOCK, creates new nexthop and tries to update rte nhop pointer. Only last part is done under WLOCK. In the hypothetical scenarios where multiple rtsock clients repeatedly issue RTM_CHANGE requests for the same route, route may get updated between read and update operation. This is addressed by retrying the operation multiple (3) times before returning failure back to the caller. Differential Revision: https://reviews.freebsd.org/D24666	2020-05-04 14:31:45 +00:00
Gleb Smirnoff	61664ee700	Step 4.2: start divorce of M_EXT and M_EXTPG They have more differences than similarities. For now there is lots of code that would check for M_EXT only and work correctly on M_EXTPG buffers, so still carry M_EXT bit together with M_EXTPG. However, prepare some code for explicit check for M_EXTPG. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:37:16 +00:00
Gleb Smirnoff	6edfd179c8	Step 4.1: mechanically rename M_NOMAP to M_EXTPG Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:21:11 +00:00
Gleb Smirnoff	7b6c99d08d	Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed: s/->m_ext_pgs.nrdy/->m_epg_nrdy/g s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g s/->m_ext_pgs.trail_len/->m_epg_trllen/g s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g s/->m_ext_pgs.flags/->m_epg_flags/g s/->m_ext_pgs.record_type/->m_epg_record_type/g s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g s/->m_ext_pgs.tls/->m_epg_tls/g s/->m_ext_pgs.so/->m_epg_so/g s/->m_ext_pgs.seqno/->m_epg_seqno/g s/->m_ext_pgs.stailq/->m_epg_stailq/g Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:12:56 +00:00
Richard Scheffenegger	14558b9953	Introduce a lower bound of 2 MSS to TCP Cubic. Running TCP Cubic together with ECN could end up reducing cwnd down to 1 byte, if the receiver continuously sets the ECE flag, resulting in very poor transmission speeds. In line with RFC6582 App. B, a lower bound of 2 MSS is introduced, as well as a typecast to prevent any potential integer overflows during intermediate calculation steps of the adjusted cwnd. Reported by: Cheng Cui Reviewed by: tuexen (mentor) Approved by: tuexen (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23353	2020-04-30 11:11:28 +00:00
Richard Scheffenegger	9028b6e0d9	Prevent premature shrinking of the scaled receive window which can cause a TCP client to use invalid or stale TCP sequence numbers for ACK packets. Packets with old sequence numbers are ignored and not used to update the send window size. This might cause the TCP session to hang indefinitely under some circumstances. Reported by: Cui Cheng Reviewed by: tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D24515	2020-04-29 22:01:33 +00:00
Richard Scheffenegger	b2ade6b166	Correctly set up the initial TCP congestion window in all cases, by not including the SYN bit sequence space in cwnd related calculations. snd_una is adjusted explicitly in all cases, outside the cwnd update, instead. This fixes an off-by-one conformance issue with regular TCP sessions not using Appropriate Byte Counting (RFC3465), sending one more packet during the initial window than expected. PR: 235256 Reviewed by: tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D19000	2020-04-29 21:48:52 +00:00
Alexander V. Chernikov	e7d8af4f65	Move route_temporal.c and route_var.h to net/route. Nexthop objects implementation, defined in r359823, introduced sys/net/route directory intended to hold all routing-related code. Move recently-introduced route_temporal.c and private route_var.h header there. Differential Revision: https://reviews.freebsd.org/D24597	2020-04-28 19:14:09 +00:00
Alexander V. Chernikov	4043ee3cd7	Convert rtalloc_mpath_fib() users to the new KPI. New fib[46]_lookup() functions support multipath transparently. Given that, switch the last rtalloc_mpath_fib() calls to fib4_lookup() and eliminate the function itself. Note: proper flowid generation (especially for the outbound traffic) is a bigger topic and will be handled in a separate review. This change leaves flowid generation intact. Differential Revision: https://reviews.freebsd.org/D24595	2020-04-28 08:06:56 +00:00
Alexander V. Chernikov	1b0051bada	Eliminate now-unused parts of old routing KPI. r360292 switched most of the remaining routing customers to a new KPI, leaving a bunch of wrappers for old routing lookup functions unused. Remove them from the tree as a part of routing cleanup. Differential Revision: https://reviews.freebsd.org/D24569	2020-04-28 07:25:34 +00:00
John Baldwin	f1f9347546	Initial support for kernel offload of TLS receive. - Add a new TCP_RXTLS_ENABLE socket option to set the encryption and authentication algorithms and keys as well as the initial sequence number. - When reading from a socket using KTLS receive, applications must use recvmsg(). Each successful call to recvmsg() will return a single TLS record. A new TCP control message, TLS_GET_RECORD, will contain the TLS record header of the decrypted record. The regular message buffer passed to recvmsg() will receive the decrypted payload. This is similar to the interface used by Linux's KTLS RX except that Linux does not return the full TLS header in the control message. - Add plumbing to the TOE KTLS interface to request either transmit or receive KTLS sessions. - When a socket is using receive KTLS, redirect reads from soreceive_stream() into soreceive_generic(). - Note that this interface is currently only defined for TLS 1.1 and 1.2, though I believe we will be able to reuse the same interface and structures for 1.3.	2020-04-27 23:17:19 +00:00
John Baldwin	ec1db6e13d	Add the initial sequence number to the TLS enable socket option. This will be needed for KTLS RX. Reviewed by: gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24451	2020-04-27 22:31:42 +00:00
Randall Stewart	e570d231f4	This change does a small preparatory step in getting the latest rack and bbr in from the NF repo. When those come in the OOB data handling will be fixed where syzkaller crashes. Differential Revision: https://reviews.freebsd.org/D24575	2020-04-27 16:30:29 +00:00
Alexander V. Chernikov	55f57ca9ac	Convert debugnet to the new routing KPI. Introduce new fib[46]_lookup_debugnet() functions serving as a special interface for the crash-time operations. Underlying implementation will try to return lookup result if datastructures are not corrupted, avoiding locking. Convert debugnet to use fib4_lookup_debugnet() and switch it to use nexthops instead of rtentries. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D24555	2020-04-26 18:42:38 +00:00
Alexander V. Chernikov	17cb6ddba8	Fix order of arguments in fib[46]_lookup calls in SCTP. r360292 introduced the wrong order, resulting in returned nhops not being referenced, despite the fact that references were requested. That led to random GPF after using SCTP sockets. Special defined macro like IPV[46]_SCOPE_GLOBAL will be introduced soon to reduce the chance of putting arguments in wrong order. Reported-by: syzbot+5c813c01096363174684@syzkaller.appspotmail.com	2020-04-26 13:02:42 +00:00
Alexander V. Chernikov	454d389645	Fix LINT build #2 after r360292. Pointyhat to: melifaro	2020-04-25 11:35:38 +00:00
Alexander V. Chernikov	ac99fd86d4	Fix LINT build broken by r360292.	2020-04-25 10:31:56 +00:00
Alexander V. Chernikov	983066f05b	Convert route caching to nexthop caching. This change is built on top of nexthop objects introduced in r359823. Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with other parts of the stack and allows to transparently introduce faster lookup algorithms. Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn. Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nexthop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remain the same. Differential Revision: https://reviews.freebsd.org/D24340	2020-04-25 09:06:11 +00:00
Alexander V. Chernikov	9e88f47c8f	Unbreak LINT-NOINET[6] builds broken in r360191. Reported by: np	2020-04-23 06:55:33 +00:00
Michael Tuexen	8262311cbe	Improve input validation when processing AUTH chunks. Thanks to Natalie Silvanovich from Google for finding and reporting the issue found by her in the SCTP userland stack. MFC after: 3 days X-MFC with: https://svnweb.freebsd.org/changeset/base/360193	2020-04-22 21:22:33 +00:00
Michael Tuexen	97feba891d	Improve input validation when processing AUTH chunks. Thanks to Natalie Silvanovich from Google for finding and reporting the issue found by her in the SCTP userland stack. MFC after: 3 days	2020-04-22 12:47:46 +00:00
Alexander V. Chernikov	8d6708ba80	Convert TOE routing lookups to the new routing KPI. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D24388	2020-04-22 07:53:43 +00:00
Richard Scheffenegger	bb410f9ff2	revert rS360143 - Correctly set up initial cwnd due to syzkaller panics found Reported by: tuexen Approved by: tuexen (mentor) Sponsored by: NetApp, Inc.	2020-04-22 00:16:42 +00:00
Richard Scheffenegger	73b7696693	Correctly set up the initial TCP congestion window in all cases, by adjust snd_una right after the connection initialization, to include the one byte in sequence space occupied by the SYN bit. This does not change the regular ACK processing, while making the BYTES_THIS_ACK macro to work properly. PR: 235256 Reviewed by: tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D19000	2020-04-21 13:05:44 +00:00
Jonathan T. Looney	5d6e356cb0	Avoid calling protocol drain routines more than once per reclamation event. mb_reclaim() calls the protocol drain routines for each protocol in each domain. Some protocols exist in more than one domain and share drain routines. In the case of SCTP, it also uses the same drain routine for its SOCK_SEQPACKET and SOCK_STREAM entries in the same domain. On systems with INET, INET6, and SCTP all defined, mb_reclaim() calls sctp_drain() four times. On systems with INET and INET6 defined, mb_reclaim() calls tcp_drain() twice. mb_reclaim() is the only in-tree caller of the pr_drain protocol entry. Eliminate this duplication by ensuring that each pr_drain routine is only specified for one protocol entry in one domain. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24418	2020-04-16 20:17:24 +00:00
Alexander V. Chernikov	539642a29d	Add nhop parameter to rti_filter callback. One of the goals of the new routing KPI defined in r359823 is to entirely hide `struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver more features much faster. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. Additionally, with the followup multipath changes, single rtentry can point to multiple nexthops. With that in mind, convert rti_filter callback used when traversing the routing table to accept pair (rt, nhop) instead of nexthop. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24440	2020-04-16 17:20:18 +00:00
Richard Scheffenegger	d7ca3f780d	Reduce default TCP delayed ACK timeout to 40ms. Reviewed by: kbowling, tuexen Approved by: tuexen (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23281	2020-04-16 15:59:23 +00:00
Alexander V. Chernikov	9ac7c6cfed	Convert IP/IPv6 forwarding, ICMP processing and IP PCB laddr selection to the new routing KPI. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24245	2020-04-14 23:06:25 +00:00
Michael Tuexen	b89af8e16d	Improve the TCP blackhole detection. The principle is to reduce the MSS in two steps and try each candidate two times. However, if two candidates are the same (which is the case in TCP/IPv6), this candidate was tested four times. This patch ensures that each candidate actually reduced the MSS and is only tested 2 times. This reduces the time window of misclassifying a temporary outage as an MTU issue. Reviewed by: jtl MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24308	2020-04-14 16:35:05 +00:00
Andrew Gallatin	23feb56348	KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache. This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext. In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs. This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone. This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons. In a follow-on commit, I plan to remove some hacks to avoid accessing ext_pgs fields of mbufs, since they will now be in cache. Many thanks to glebius for helping to make this better in the Netflix tree. Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213	2020-04-14 14:46:06 +00:00
Alexander V. Chernikov	6722086045	Plug netmask NULL check during route addition causing kernel panic. This bug was introduced by the r359823. Reported by: hselasky	2020-04-14 13:12:22 +00:00
Kristof Provost	1d126e9b94	carp: Widen epoch coverage Fix panics related to calling code which expects to be running inside the NET_EPOCH from outside that epoch. This leads to panics (with INVARIANTS) such as this one: panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/if_ether.c:373 cpuid = 7 time = 1586095719 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0090819700 vpanic() at vpanic+0x182/frame 0xfffffe0090819750 panic() at panic+0x43/frame 0xfffffe00908197b0 arprequest_internal() at arprequest_internal+0x59e/frame 0xfffffe00908198c0 arp_announce_ifaddr() at arp_announce_ifaddr+0x20/frame 0xfffffe00908198e0 carp_master_down_locked() at carp_master_down_locked+0x10d/frame 0xfffffe0090819910 carp_master_down() at carp_master_down+0x79/frame 0xfffffe0090819940 softclock_call_cc() at softclock_call_cc+0x13f/frame 0xfffffe00908199f0 softclock() at softclock+0x7c/frame 0xfffffe0090819a20 ithread_loop() at ithread_loop+0x279/frame 0xfffffe0090819ab0 fork_exit() at fork_exit+0x80/frame 0xfffffe0090819af0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0090819af0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Widen the NET_EPOCH to cover the relevant (callback / task) code. Differential Revision: https://reviews.freebsd.org/D24302	2020-04-12 16:09:21 +00:00
Alexander V. Chernikov	a666325282	Introduce nexthop objects and new routing KPI. This is the foundational change for the routing subsystem rearchitecture. More details and goals are available in https://reviews.freebsd.org/D24141 . This patch introduces concept of nexthop objects and new nexthop-based routing KPI. Nexthops are objects, containing all necessary information for performing the packet output decision. Output interface, mtu, flags, gw address goes there. For most of the cases, these objects will serve the same role as the struct rtentry is currently serving. Typically there will be low tens of such objects for the router even with multiple BGP full-views, as these objects will be shared between routing entries. This allows to store more information in the nexthop. New KPI: struct nhop_object fib4_lookup(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, uint32_t flowid); struct nhop_object fib6_lookup(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, uint32_t flowid); These 2 functions are intended to replace all flavours of <in_\|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous fib[46]-generation functions. Upon successful lookup, they return nexthop object which is guaranteed to exist within current NET_EPOCH. If longer lifetime is desired, one can specify NHR_REF as a flag and get a referenced version of the nexthop. Reference semantic closely resembles rtentry one, allowing sed-style conversion. Additionally, another 2 functions are introduced to support uRPF functionality inside variety of our firewalls. 
Their primary goal is to hide the multipath implementation details inside the routing subsystem, greatly simplifying firewalls implementation: int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); All functions have a separate scopeid argument, paving way to eliminating IPv6 scope embedding and allowing to support IPv4 link-locals in the future. Structure changes: * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size. * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz. Old KPI: During the transition state old and new KPI will coexist. As there are another 4-5 decent-sized conversion patches, it will probably take a couple of weeks. To support both KPIs, fields not required by the new KPI (most of rtentry) have to be kept, resulting in the temporary size increase. Once conversion is finished, rtentry will notably shrink. More details: * architectural overview: https://reviews.freebsd.org/D24141 * list of the next changes: https://reviews.freebsd.org/D24232 Reviewed by: ae,glebius(initial version) Differential Revision: https://reviews.freebsd.org/D24232	2020-04-12 14:30:00 +00:00
Michael Tuexen	07ddae2822	Revert https://svnweb.freebsd.org/changeset/base/359809 The intended change was sp->next.tqe_next = NULL; sp->next.tqe_prev = NULL; which doesn't fix the issue I'm seeing and the committed fix is not the intended fix due to copy-and-paste. Thanks a lot to Conrad Meyer for making me aware of the problem. Reported by: cem	2020-04-12 09:31:36 +00:00
Michael Tuexen	9803dbb3ea	Zero out pointers for consistency. This was found by running syzkaller on an INVARIANTS kernel. MFC after: 3 days	2020-04-11 20:36:54 +00:00
Alexander V. Chernikov	4684d3cbcb	Remove per-AF radix_mpath initialization functions. Split their functionality by moving random seed allocation to SYSINIT and calling (new) generic multipath function from standard IPv4/IPv6 RIB init handlers. Differential Revision: https://reviews.freebsd.org/D24356	2020-04-11 07:37:08 +00:00
Warner Losh	28540ab153	Fix copyright year and eliminate the obsolete all rights reserved line. Reviewed by: rrs@	2020-04-08 17:55:45 +00:00
Michael Tuexen	f4cb790a35	Do more argument validation under INVARIANTS when starting/stopping an SCTP timer. MFC after: 1 week	2020-04-06 13:58:13 +00:00
Alexander V. Chernikov	66bc03d415	Use interface fib for proxyarp checks. Before the change, proxyarp checks for src and dst addresses were performed using default fib, breaking multi-fib scenario. PR: 245181 Submitted by: Scott Aitken (original version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24244	2020-04-02 20:06:37 +00:00
Michael Tuexen	413c3db101	Allow the TCP blackhole detection to be disabled entirely, enabled only for IPv4, enabled only for IPv6, and enabled for IPv4 and IPv6. The current blackhole detection might classify a temporary outage as an MTU issue and permanently reduce the MSS. Since the consequences of such a reduction due to a misclassification are much more drastic for IPv4 than for IPv6, allow the administrator to enable it for IPv6 only. Reviewed by: bcr@ (man page), Richard Scheffenegger Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24219	2020-03-31 15:54:54 +00:00
Mark Johnston	9b1d850be8	Remove the "config" taskqgroup and its KPIs. Equivalent functionality is already provided by taskqueue(9), just use that instead. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-30 14:24:03 +00:00
Michael Tuexen	9aca687811	Small cleanup by using a variable just assigned. MFC after: 1 week	2020-03-28 22:35:04 +00:00
Michael Tuexen	25ec355353	Handle integer overflows correctly when converting msecs and secs to ticks and vice versa. These issues were caught by recently added panic() calls on INVARIANTS systems. Reported by: syzbot+b44787b4be7096cd1590@syzkaller.appspotmail.com Reported by: syzbot+35f82d22805c1e899685@syzkaller.appspotmail.com MFC after: 1 week	2020-03-28 20:25:45 +00:00
Ed Maste	c012cfe68a	sys/netinet: remove spurious doubled ;s	2020-03-27 23:10:18 +00:00
Michael Tuexen	d5d190f2f9	Some more uint32_t cleanups, no functional change. MFC after: 1 week	2020-03-27 21:48:52 +00:00
Michael Tuexen	239e5865df	Use uint32_t where it is expected to be used. No functional change. MFC after: 1 week	2020-03-27 11:08:11 +00:00
Michael Tuexen	7c63520c42	Remove an optimization that was incorrect a couple of times and therefore doesn't seem worth keeping. In this case, COOKIEs were no longer retransmitted when the socket was already closed. MFC after: 1 week	2020-03-25 18:20:37 +00:00
Michael Tuexen	37686ccf08	Improve consistency in debug output. MFC after: 1 week	2020-03-25 18:14:12 +00:00
Michael Tuexen	24187cfe72	Revert https://svnweb.freebsd.org/changeset/base/357829 This introduces a regression reported by koobs@ when running a Python test suite on a loaded system. This patch resulted in a failing accept() call when the association was set up and gracefully shut down by the peer before accept was called. So the following packetdrill script would fail: +0.0 socket(..., SOCK_STREAM, IPPROTO_SCTP) = 3 +0.0 bind(3, ..., ...) = 0 +0.0 listen(3, 1) = 0 +0.0 < sctp: INIT[flgs=0, tag=1, a_rwnd=15000, os=1, is=1, tsn=1] +0.0 > sctp: INIT_ACK[flgs=0, tag=2, a_rwnd=..., os=..., is=..., tsn=1, ...] +0.1 < sctp: COOKIE_ECHO[flgs=0, len=..., val=...] +0.0 > sctp: COOKIE_ACK[flgs=0] +0.0 < sctp: DATA[flgs=BE, len=116, tsn=1, sid=0, ssn=0, ppid=0] +0.0 > sctp: SACK[flgs=0, cum_tsn=1, a_rwnd=..., gaps=[], dups=[]] +0.0 < sctp: SHUTDOWN[flgs=0, cum_tsn=0] +0.0 > sctp: SHUTDOWN_ACK[flgs=0] +0.0 < sctp: SHUTDOWN_COMPLETE[flgs=0] +0.0 accept(3, ..., ...) = 4 +0.0 close(3) = 0 +0.0 recv(4, ..., 4096, 0) = 100 +0.0 recv(4, ..., 4096, 0) = 0 +0.0 close(4) = 0 Reported by: koobs@	2020-03-25 15:29:01 +00:00
Michael Tuexen	23e3c0880d	Use consistent debug output. MFC after: 1 week	2020-03-25 13:19:41 +00:00
Michael Tuexen	e056fafd92	Don't restore the vnet too early in error cases. MFC after: 1 week	2020-03-25 13:18:37 +00:00
Michael Tuexen	7522682e5e	Only call panic when building with INVARIANTS. MFC after: 1 week	2020-03-24 23:04:07 +00:00
Michael Tuexen	a412576e36	Another cleanup of the timer code. Also be more pedantic about the parameters of the timer start and stop routines. Several inconsistencies have been fixed in earlier commits. Now they will be caught when running an INVARIANTS system. MFC after: 1 week	2020-03-24 22:44:36 +00:00
Michael Tuexen	d084818d9d	Cleanup the file and add two ASSERT variants for locks, which will be used shortly. MFC after: 1 week	2020-03-23 12:17:13 +00:00
Michael Tuexen	a57fb68b92	More timer cleanups, no functional change. MFC after: 1 week	2020-03-21 16:12:19 +00:00
Michael Tuexen	fa8ceba9ca	Remove a set, but unused variable. MFC after: 1 week	2020-03-20 14:49:44 +00:00
Michael Tuexen	2bdebd0ce3	Add a missing NET_EPOCH_ENTER/NET_EPOCH_EXIT pair. This was affecting implicit connection setups via sendmsg(). Reported by: syzbot+febbe3383a0e9b700c1b@syzkaller.appspotmail.com Reported by: syzbot+dca98631455d790223ca@syzkaller.appspotmail.com Reported by: syzbot+5a71a7760d6bcf11b8cd@syzkaller.appspotmail.com Reported by: syzbot+da64217e140444c49f00@syzkaller.appspotmail.com	2020-03-19 23:07:52 +00:00
Michael Tuexen	6fb7b4fbdb	Consistently provide arguments for timer start and stop routines. This is another step in cleaning up timer handling. MFC after: 1 week	2020-03-19 21:01:16 +00:00
Michael Tuexen	e95b3d7faf	Cleanup the stream reset and asconf timer. MFC after: 1 week	2020-03-19 18:55:54 +00:00
Michael Tuexen	42078d5ada	The MTU candidates MUST be a multiple of 4, so make them so. MFC after: 1 week	2020-03-19 14:37:28 +00:00
Michael Tuexen	0554e01d8b	Handle the timers in a consistent sequence according to the definition of the timer type. Just a cleanup, no functional change intended. MFC after: 1 week	2020-03-17 19:20:12 +00:00
Andrew Gallatin	ee7a9e506e	Avoid a cache miss accessing an mbuf ext_pgs pointer when doing SW kTLS. For a Netflix 90Gb/s 100% TLS software kTLS workload, this reduces the CPI of tcp_m_copym() from ~3.5 to ~2.5 as reported by vtune. Reviewed by: jtl, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23998	2020-03-16 14:03:27 +00:00
Michael Tuexen	7ca6e2963f	Use KMOD_TCPSTAT_INC instead of TCPSTAT_INC for RACK and BBR, since these are kernel modules. Also add a KMOD_TCPSTAT_ADD and use that instead of TCPSTAT_ADD. Reviewed by: jtl@, rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D23904	2020-03-12 15:37:41 +00:00
Andrew Gallatin	98085bae8c	make lacp's use_numa hashing aware of send tags When I did the use_numa support, I missed the fact that there is a separate hash function for send tag nic selection. So when use_numa is enabled, ktls offload does not work properly, as it does not reliably allocate a send tag on the proper egress nic since different egress nics are selected for send-tag allocation and packet transmit. To fix this, this change: - refactors lacp_select_tx_port_by_hash() and lacp_select_tx_port() to make lacp_select_tx_port_by_hash() always called by lacp_select_tx_port() - pre-shifts flowids to convert them to hashes when calling lacp_select_tx_port_by_hash() - adds a numa_domain field to if_snd_tag_alloc_params - plumbs the numa domain into places where we allocate send tags In testing with NIC TLS setup on a NUMA machine, I see thousands of output errors before the change when enabling kern.ipc.tls.ifnet.permitted=1. After the change, I see no errors, and I see the NIC sysctl counters showing active TLS offload sessions. Reviewed by: rrs, hselasky, jhb Sponsored by: Netflix	2020-03-09 13:44:51 +00:00
Hiroki Sato	d726e6331b	Fix an issue of net.inet.igmp.stats handler. The header of (struct igmpstat) could be cleared by sysctl(3). This can be reproduced by "netstat -s -z -p igmp". PR: 244584 MFC after: 1 week	2020-03-07 08:41:10 +00:00
Michael Tuexen	9c04fdfd34	When using automatically generated flow labels and using TCP SYN cookies, use the same flow label for the segments sent during the handshake and after the handshake. This fixes a bug by making sure that sc_flowlabel is always stored in network byte order. Reviewed by: bz@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D23957	2020-03-04 16:41:25 +00:00
Bjoern A. Zeeb	d2b8fd0da1	Add new ICMPv6 counters for Anti-DoS limits. Add four new counters for ND6 related Anti-DoS measures. We split these out into a separate upfront commit so that we only change the struct size one time. Implementations using them will follow. PR: 157410 Reviewed by: melifaro MFC after: 2 weeks X-MFC: cannot really MFC this without breaking netstat Sponsored by: Netflix (initially) Differential Revision: https://reviews.freebsd.org/D22711	2020-03-04 16:20:59 +00:00
Michael Tuexen	6605e5791f	Don't send an uninitialised traffic class in the IPv6 header when sending a TCP segment from the TCP SYN cache (like a SYN-ACK). This fix initialises it to zero. This is correct for the ECN bits, but it does not honor the DSCP that an application might have set via the IPPROTO_IPV6 level socket option IPV6_TCLASS. That will be fixed separately. Reviewed by: Richard Scheffenegger MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D23900	2020-03-04 12:22:53 +00:00
Bjoern A. Zeeb	4e1a3ff884	tcp_hpts: make RSS kernel compile again. Add proper #includes, and #ifdefs and some style fixes to make RSS kernels compile again. There are still possible issues with uint16_t vs. uint_t cpuid which I am not going near. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D23726	2020-03-03 14:15:30 +00:00
Michael Tuexen	7e1e491f60	Remove stale definitions. The removed definitions are not used right now and are incompatible with the correct ones in RFC 3168. Submitted by: Richard Scheffenegger Differential Revision: https://reviews.freebsd.org/D23903	2020-03-01 12:34:27 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Randall Stewart	d7313dc6f5	This commit expands tcp_ratelimit to be able to handle cards like the mlx-c5 and c6 that require a "setup" routine before the tcp_ratelimit code can declare and use a rate. I add the setup routine to if_var as well as fix tcp_ratelimit to call it. I also revisit the rates so that in the case of a mlx card of type c5/6 we will use about 100 rates concentrated in the range where the most gain can be had (1-200Mbps). Note that I have tested these on a c5 and they work and perform well. In fact in an unloaded system they pace right to the correct rate (great job mlx!). There will be a further commit here from Hans that will add the respective changes to the mlx driver to support this work (which I was testing with). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D23647	2020-02-26 13:48:33 +00:00
Pawel Biernacki	295a18d184	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (14 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Approved by: kib (mentor, blanket) Differential Revision: https://reviews.freebsd.org/D23639	2020-02-24 10:47:18 +00:00
Pawel Biernacki	10b49b2302	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. Mark all nodes in pf, pfsync and carp as MPSAFE. Reviewed by: kp Approved by: kib (mentor, blanket) Differential Revision: https://reviews.freebsd.org/D23634	2020-02-21 16:23:00 +00:00
Michael Tuexen	64f29eb1df	Remove an unused timer type. MFC after: 1 week	2020-02-20 15:37:44 +00:00
Michael Tuexen	868b51f234	Epochify SCTP.	2020-02-18 21:25:17 +00:00
Michael Tuexen	ba0d525006	Remove unused function.	2020-02-18 19:41:55 +00:00
Michael Tuexen	a610bb2120	Fix the non-default stream schedulers such that they do not interleave user messages when it is not allowed. Thanks to Christian Wright for reporting the issue for the userland stack and providing a fix for the priority scheduler. MFC after: 1 week	2020-02-17 18:05:03 +00:00
Michael Tuexen	6b8fba3c5c	Don't use uninitialised stack memory if the sysctl variable net.inet.tcp.hostcache.enable is set to 0. The bug could result in using an MSS value that is too small or wrong initial retransmission timer settings. Possibly the value used for ssthresh was also wrong. Submitted by: Richard Scheffenegger Reviewed by: Cheng Cui, rgrimes@, tuexen@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23687	2020-02-17 14:54:21 +00:00
Hans Petter Selasky	bacb11c9ed	Fix kernel panic while trying to read multicast stream. When VIMAGE is enabled make sure the "m_pkthdr.rcvif" pointer is set for all mbufs being input by the IGMP/MLD6 code. Else there will be a NULL-pointer dereference in the netisr code when trying to set the VNET based on the incoming mbuf. Add an assert to catch this when queueing mbufs on a netisr to make debugging of similar cases easier. Found by: Vladislav V. Prodan PR: 244002 Reviewed by: bz@ MFC after: 1 week Sponsored by: Mellanox Technologies	2020-02-17 09:46:32 +00:00
Mateusz Guzik	6b25673f3f	sctp: use new capsicum helpers	2020-02-15 01:29:40 +00:00
Michael Tuexen	a357466592	sack_newdata and snd_recover hold the same value. Therefore, use only a single instance: use snd_recover also where sack_newdata was used. Submitted by: Richard Scheffenegger Differential Revision: https://reviews.freebsd.org/D18811	2020-02-13 15:14:46 +00:00
Michael Tuexen	33f8cfdfe4	Whitespace cleanup. No functional change. Sponsored by: Netflix, Inc.	2020-02-13 13:58:34 +00:00
Michael Tuexen	56ccb48fd6	Don't panic under INVARIANTS when we can't allocate memory for storing a vtag in time wait. This issue was found by running syzkaller. MFC after: 1 week	2020-02-12 17:05:10 +00:00
Michael Tuexen	ca3de626ec	Mark the socket as disconnected when freeing the association the first time. This issue was found by running syzkaller. MFC after: 1 week	2020-02-12 17:02:15 +00:00
Randall Stewart	348404bce1	Lets get the real correct version.. gessh. I need more coffee evidently. Sponsored by: Netflix	2020-02-12 15:26:56 +00:00

6832 Commits