freebsd-dev

Author	SHA1	Message	Date
Alex Richardson	bc596e5632	libalias: Fix -Wcast-align compiler warnings This fixes -Wcast-align warnings caused by the underaligned `struct ip`. This also silences them in the public functions by changing the function signature from char * to void *. This is source and binary compatible and avoids the -Wcast-align warning. Reviewed By: ae, gbe (manpages) Differential Revision: https://reviews.freebsd.org/D27882	2021-01-19 21:23:24 +00:00
Alexander V. Chernikov	f879876721	Fix IPv4 fib bsearch4() lookup array construction. Current code didn't properly handle the case with nested prefixes like 10.0.0.0/24 && 10.0.0.0/25.	2021-01-17 20:32:26 +00:00
Alexander V. Chernikov	81728a538d	Split rtinit() into multiple functions. rtinit[1]() is a function used to add or remove interface address prefix routes, similar to ifa_maintain_loopback_route(). It was intended to be family-agnostic. There is a problem with this approach in reality. 1) IPv6 code does not use it for the ifa routes. There is a separate layer, nd6_prelist_(), providing interface for maintaining interface routes. Its part, responsible for the actual route table interaction, mimics rtenty() code. 2) rtinit tries to combine multiple actions in the same function: constructing proper route attributes and handling iterations over multiple fibs, for the non-zero net.add_addr_allfibs use case. It notably increases the code complexity. 3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag for p2p connections, host routes and p2p routes are handled in the same way. Additionally, mapping IFA flags to RTF flags makes the interface pretty messy. It makes rtinit() clash with ifa_maintain_loopback_route() for IPv4 interface aliases. 4) rtinit() is the last customer passing non-masked prefixes to rib_action(), complicating rib_action() implementation. 5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive" ifa messages in certain corner cases. To address all these points, the following has been done: * rtinit() has been split into multiple functions: - Route attribute construction was moved to the per-address-family functions, dealing with (2), (3) and (4). - The function providing net.add_addr_allfibs handling and route rtsock notifications is the new routing table interface. - rtsock ifa notification has been moved out as well. The resulting set of functions is only responsible for the actual route notifications.
Side effects: * /32 alias does not result in interface routes (/32 route and "host" route) * RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses Differential revision: https://reviews.freebsd.org/D28186	2021-01-16 22:42:41 +00:00
Michael Tuexen	d2b3ceddcc	tcp: add sysctl to tolerate TCP segments missing timestamps When timestamp support has been negotiated, TCP segments received without a timestamp should be discarded. However, there are broken TCP implementations (for example, stacks used by Omniswitch 63xx and 64xx models), which send TCP segments without timestamps although they negotiated timestamp support. This patch adds a sysctl variable which tolerates such TCP segments and allows interoperation with broken stacks. Reviewed by: jtl@, rscheff@ Differential Revision: https://reviews.freebsd.org/D28142 Sponsored by: Netflix, Inc. PR: 252449 MFC after: 1 week	2021-01-14 19:28:25 +01:00
Michael Tuexen	cc3c34859e	tcp: fix handling of TCP RST segments missing timestamps A TCP RST segment should be processed even if it is missing TCP timestamps. Reported by: dmgk@, kevans@ Reviewed by: rscheff@, dmgk@ Sponsored by: Netflix, Inc. MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28143	2021-01-14 14:39:35 +01:00
Mateusz Guzik	6b3a9a0f3d	Convert remaining cap_rights_init users to cap_rights_init_one semantic patch: @@ expression rights, r; @@ - cap_rights_init(&rights, r) + cap_rights_init_one(&rights, r)	2021-01-12 13:16:10 +00:00
Alexander V. Chernikov	2defbe9f0e	Use rn_match instead of doing indirect calls in fib_algo. Relevant inet/inet6 code has the control over deciding what the RIB lookup function currently is. With that in mind, explicitly set it to the current value (rn_match) in the datapath lookups. This avoids the cost of an indirect call. Differential Revision: https://reviews.freebsd.org/D28066	2021-01-11 23:30:35 +00:00
Alexander V. Chernikov	0da3f8c98d	Bump amount of queued packets for unresolved ARP/NDP entries to 16. Currently default behaviour is to keep only 1 packet per unresolved entry. Ability to queue more than one packet was added 10 years ago, in r215207, though the default value was kept intact. Things have changed since that time. Systems tend to initiate multiple connections at once for a variety of reasons. For example, recent kern/252278 bug report describes happy-eyeball DNS behaviour sending multiple requests to the DNS server. The primary driver for upper value for the queue length determination is memory consumption. Remote actors should not be able to easily exhaust local memory by sending packets to unresolved arp/ND entries. For now, bump value to 16 packets, to match Darwin implementation. The proper approach would be to switch the limit to calculate memory consumption instead of packet count and limit based on memory. We should MFC this with a variation of D22447. Reviewers: #manpages, #network, bz, emaste Reviewed By: emaste, gbe(doc), jilles(doc) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D28068	2021-01-11 19:51:11 +00:00
Mark Johnston	501159696c	igmp: Avoid leaking mbuf when source validation fails PR: 252504 Submitted by: Panagiotis Tsolakos <panagiotis.tsolakos@gmail.com> MFC after: 3 days	2021-01-08 13:32:04 -05:00
Alexander V. Chernikov	d68cf57b7f	Refactor rt_addrmsg() and rt_routemsg(). Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function. Reviewers: #network Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D28011	2021-01-07 19:38:19 +00:00
Michael Tuexen	a7aa5eea4f	sctp: improve handling of aborted associations Don't clear a flag, when the structure already has been freed. Reported by: syzbot+07667d16c96779c737b4@syzkaller.appspotmail.com	2021-01-01 15:59:10 +01:00
Alexander V. Chernikov	f733d9701b	Fix default route handling in radix4_lockless algo. Improve nexthop debugging. Reported by: Florian Smeets <flo at smeets.xyz>	2020-12-26 22:51:02 +00:00
Alexander V. Chernikov	f5baf8bb12	Add modular fib lookup framework. This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookups. Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table. Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks. The following algorithms are provided by default: IPv4: * bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes) * radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix4 (base system radix backend) * dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless data structure, optimized for large-fib (D27412) IPv6: * radix6_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix6 (base system radix backend) * dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless data structure, optimized for large-fib (D27412) Performance changes: Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604): IPv4: 8 routes: radix4: ~20mpps radix4_lockless: ~24.8mpps bsearch4: ~69mpps dpdk_lpm4: ~67 mpps 700k routes: radix4_lockless: 3.3mpps dpdk_lpm4: 46mpps IPv6: 8 routes: radix6_lockless: ~20mpps dpdk_lpm6: ~70mpps 100k routes: radix6_lockless: 13.9mpps dpdk_lpm6: 57mpps Forwarding benchmarks: + 10-15% IPv4 forwarding performance (small-fib, bsearch4) + 25% IPv4 forwarding performance (full-view, dpdk_lpm4) + 20% IPv6 forwarding performance (full-view, dpdk_lpm6) Control: Framework adds the following runtime sysctls: List
algos * net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4 * net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6 Debug level (7=LOG_DEBUG, per-route) net.route.algo.debug_level: 5 Algo selection (currently only for fib 0): net.route.algo.inet.algo: bsearch4 net.route.algo.inet6.algo: radix6_lockless Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future. Differential Revision: https://reviews.freebsd.org/D27401	2020-12-25 11:33:17 +00:00
Michael Tuexen	0ec2ce0d32	Improve input validation for parameters in ASCONF and ASCONF-ACK chunks Thanks to Tolya Korniltsev for drawing my attention to this part of the code by reporting an issue for the userland stack.	2020-12-23 18:03:47 +01:00
Andrew Gallatin	a034518ac8	Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain In order to efficiently serve web traffic on a NUMA machine, one must avoid as many NUMA domain crossings as possible. With SO_REUSEPORT_LB, a number of workers can share a listen socket. However, even if a worker sets affinity to a core or set of cores on a NUMA domain, it will receive connections associated with all NUMA domains in the system. This will lead to cross-domain traffic when the server writes to the socket or calls sendfile(), and memory is allocated on the server's local NUMA node, but transmitted on the NUMA node associated with the TCP connection. Similarly, when the server reads from the socket, it will likely be reading memory allocated on the NUMA domain associated with the TCP connection. This change provides a new socket ioctl, TCP_REUSPORT_LB_NUMA. A server can now tell the kernel to filter traffic so that only incoming connections associated with the desired NUMA domain are given to the server. (Of course, in the case where there are no servers sharing the listen socket on some domain, then as a fallback, traffic will be hashed as normal to all servers sharing the listen socket regardless of domain). This allows a server to deal only with traffic that is local to its NUMA domain, and avoids cross-domain traffic in most cases. This patch, and a corresponding small patch to nginx to use TCP_REUSPORT_LB_NUMA allows us to serve 190Gb/s of kTLS encrypted https media content from dual-socket Xeons with only 13% (as measured by pcm.x) cross domain traffic on the memory controller. Reviewed by: jhb, bz (earlier version), bcr (man page) Tested by: gonzo Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21636	2020-12-19 22:04:46 +00:00
Michael Tuexen	0066de1c4b	Harden the handling of outgoing streams in case of a restart or INIT collision. This avoids an out-of-bounds access in case the peer can break the cookie signature. Thanks to Felix Wilhelm from Google for reporting the issue. MFC after: 1 week	2020-12-13 23:51:51 +00:00
Michael Tuexen	aa6db9a045	Clean up more resources of an existing SCTP association in case of a restart. This fixes a use-after-free scenario, which was reported by Felix Wilhelm from Google in case a peer is able to modify the cookie. However, this can also be triggered by an association restart under some specific conditions. MFC after: 1 week	2020-12-12 22:23:45 +00:00
Richard Scheffenegger	0e1d7c25c5	Add TCP feature Proportional Rate Reduction (PRR) - RFC6937 PRR improves loss recovery and avoids RTOs in a wide range of scenarios (ACK thinning) over regular SACK loss recovery. PRR is disabled by default, enable by net.inet.tcp.do_prr = 1. Performance may be impeded by token bucket rate policers at the bottleneck, where net.inet.tcp.do_prr_conservate = 1 should be enabled in addition. Submitted by: Aris Angelogiannopoulos Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D18892	2020-12-04 11:29:27 +00:00
Alexander V. Chernikov	d1d941c5b9	Remove RADIX_MPATH config option. ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D27244	2020-11-29 19:43:33 +00:00
Alexander V. Chernikov	b712e3e343	Refactor fib4/fib6 functions. No functional changes. * Make lookup part of fib<4|6>_lookup_debugnet() separate functions (fib<4|6>_lookup_rt()). These will be used in the control plane code requiring unlocked radix operations and actual prefix pointer. * Make lookup part of fib<4|6>_check_urpf() separate functions. This change simplifies the switch to alternative lookup implementations, which helps algorithmic lookups introduction. * While here, use static initializers for IPv4/IPv6 keys Differential Revision: https://reviews.freebsd.org/D27405	2020-11-29 13:41:49 +00:00
Michael Tuexen	75fcd27ac2	Fix two occurrences of a typo in a comment introduced in r367530. Reported by: lstewart@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27148	2020-11-23 10:13:56 +00:00
Alexander V. Chernikov	7511a63825	Refactor rib iterator functions. * Make rib_walk() order of arguments consistent with the rest of RIB api * Add rib_walk_ext() allowing to exec callback before/after iteration. * Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del * Rename rt_foreach_fib_walk -> rib_foreach_table_walk * Move rib_foreach_table_walk{_del} to route/route_helpers.c * Slightly refactor rib_foreach_table_walk{_del} to make the implementation consistent and prepare for upcoming iterator optimizations. Differential Revision: https://reviews.freebsd.org/D27219	2020-11-22 20:21:10 +00:00
Michael Tuexen	47384244f9	Fix an issue I introduced in r367530: tcp_twcheck() can be called with to == NULL for SYN segments. So don't assume to != NULL. Thanks to jhb@ for reporting and suggesting a fix. PR: 250499 MFC after: 1 week XMFC-with: r367530 Sponsored by: Netflix, Inc.	2020-11-20 13:00:28 +00:00
Ed Maste	360d1232ab	ip_fastfwd: style(9) tidy for r367628 Discussed with: gnn MFC with: r367628	2020-11-13 18:25:07 +00:00
George V. Neville-Neil	d65d6d5aa9	Followup pointed out by ae@	2020-11-13 13:07:44 +00:00
George V. Neville-Neil	8ad114c082	An earlier commit effectively turned off the fast forwarding path due to its lack of support for ICMP redirects. The following commit adds redirects to the fastforward path, again allowing for decent forwarding performance in the kernel. Reviewed by: ae, melifaro Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")	2020-11-12 21:58:47 +00:00
Michael Tuexen	283c76c7c3	RFC 7323 specifies that: * TCP segments without timestamps should be dropped when support for the timestamp option has been negotiated. * TCP segments with timestamps should be processed normally if support for the timestamp option has not been negotiated. This patch enforces the above. PR: 250499 Reviewed by: gnn, rrs MFC after: 1 week Sponsored by: Netflix, Inc Differential Revision: https://reviews.freebsd.org/D27148	2020-11-09 21:49:40 +00:00
Michael Tuexen	e597bae4ee	Fix a potential use-after-free bug introduced in https://svnweb.freebsd.org/changeset/base/363046 Thanks to Taylor Brandstetter for finding this issue using fuzz testing and reporting it in https://github.com/sctplab/usrsctp/issues/547	2020-11-09 13:12:07 +00:00
Mitchell Horne	b02c4e5c78	igmp: convert igmpstat to use PCPU counters Currently there is no locking done to protect this structure. It is likely okay due to the low-volume nature of IGMP, but allows for the possibility of underflow. This appears to be one of the only holdouts of the conversion to counter(9) which was done for most protocol stat structures around 2013. This also updates the visibility of this stats structure so that it can be consumed from elsewhere in the kernel, consistent with the vast majority of VNET_PCPUSTAT structures. Reviewed by: kp Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27023	2020-11-08 18:49:23 +00:00
Richard Scheffenegger	4d0770f172	Prevent premature SACK block transmission during loss recovery Under specific conditions, a window update can be sent with outdated SACK information. Some clients react to this by subsequently delaying loss recovery, making TCP perform very poorly. Reported by: chengc_netapp.com Reviewed by: rrs, jtl MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D24237	2020-11-08 18:47:05 +00:00
John Baldwin	36e0a362ac	Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc(). This gives a more uniform API for send tag life cycle management. Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27000	2020-10-29 23:28:39 +00:00
John Baldwin	98d7a8d9cd	Call m_snd_tag_rele() to free send tags. Send tags are refcounted and if_snd_tag_free() is called by m_snd_tag_rele() when the last reference is dropped on a send tag. Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26995	2020-10-29 22:18:56 +00:00
John Baldwin	7552deb2a0	Remove an extra if_ref(). In r348254, if_snd_tag_alloc() routines were changed to bump the ifp refcount via m_snd_tag_init(). This function wasn't in the tree at the time and wasn't updated for the new semantics, so was still doing a separate bump after if_snd_tag_alloc() returned. Reviewed by: gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26999	2020-10-29 22:16:59 +00:00
John Baldwin	aebfdc1fec	Store the new send tag in the right place. r350501 added the 'st' parameter, but did not pass it down to if_snd_tag_alloc(). Reviewed by: gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26997	2020-10-29 22:14:34 +00:00
John Baldwin	521eac97f3	Support hardware rate limiting (pacing) with TLS offload. - Add a new send tag type for a send tag that supports both rate limiting (packet pacing) and TLS offload (mostly similar to D22669 but adds a separate structure when allocating the new tag type). - When allocating a send tag for TLS offload, check to see if the connection already has a pacing rate. If so, allocate a tag that supports both rate limiting and TLS offload rather than a plain TLS offload tag. - When setting an initial rate on an existing ifnet KTLS connection, set the rate in the TCP control block inp and then reset the TLS send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit send tag. This allocates the TLS send tag asynchronously from a task queue, so the TLS rate limit tag alloc is always sleepable. - When modifying a rate on a connection using KTLS, look for a TLS send tag. If the send tag is only a plain TLS send tag, assume we failed to allocate a TLS ratelimit tag (either during the TCP_TXTLS_ENABLE socket option, or during the send tag reset triggered by ktls_output_eagain) and ignore the new rate. If the send tag is a ratelimit TLS send tag, change the rate on the TLS tag and leave the inp tag alone. - Lock the inp lock when setting sb_tls_info for a socket send buffer so that the routines in tcp_ratelimit can safely dereference the pointer without needing to grab the socket buffer lock. - Add an IFCAP_TXTLS_RTLMT capability flag and associated administrative controls in ifconfig(8). TLS rate limit tags are only allocated if this capability is enabled. Note that TLS offload (whether unlimited or rate limited) always requires IFCAP_TXTLS[46]. Reviewed by: gallatin, hselasky Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26691	2020-10-29 00:23:16 +00:00
John Baldwin	ce39811544	Save the current TCP pacing rate in t_pacing_rate. Reviewed by: gallatin, gnn Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26875	2020-10-29 00:03:19 +00:00
Richard Scheffenegger	3767427354	TCP Cubic: improve reaction to (and rollback from) RTO 1. fix compliancy issue of CUBIC RTO handling according to RFC8312 section 4.7 2. add CUBIC CC_RTO_ERR handling Submitted by: chengc_netapp.com Reviewed by: rrs, tuexen, rscheff MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26808	2020-10-24 16:11:46 +00:00
Richard Scheffenegger	39a12f0178	tcp: move cwnd and ssthresh updates into cc modules This will pave the way of setting ssthresh differently in TCP CUBIC, according to RFC8312 section 4.7. No functional change, only code movement. Submitted by: chengc_netapp.com Reviewed by: rrs, tuexen, rscheff MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26807	2020-10-24 16:09:18 +00:00
Mark Johnston	4caea9b169	icmp6: Count packets dropped due to an invalid hop limit Pad the icmp6stat structure so that we can add more counters in the future without breaking compatibility again, last done in r358620. Annotate the rarely executed error paths with __predict_false while here. Reviewed by: bz, melifaro Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26578	2020-10-19 17:07:19 +00:00
Alexander V. Chernikov	0c325f53f1	Implement flowid calculation for outbound connections to balance connections over multiple paths. Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections. This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system. Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523	2020-10-18 17:15:47 +00:00
Alexander V. Chernikov	fa8b3fcb4c	Simplify NET_EPOCH_EXIT in inp_join_group(). Suggested by: kib	2020-10-18 12:03:36 +00:00
Alexander V. Chernikov	337418adf1	Fix sleepq_add panic happening with too wide net epoch in mcast control. PR: 250413 Reported by: Christopher Hall <hsw at bitmark.com> Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D26827	2020-10-17 20:33:09 +00:00
Michael Tuexen	a92d501617	Improve the handling of cookie life times. The staleness reported in an error cause is in us, not ms. Enforce limits on the life time via sysctl and socket options consistently. Update the description of the sysctl variable to use the right unit. Also do some minor cleanups. This also fixes an integer overflow issue if the peer can modify the cookie. This was reported by Felix Weinrank by fuzz testing the userland stack and in https://oss-fuzz.com/testcase-detail/4800394024452096 MFC after: 3 days	2020-10-16 10:44:48 +00:00
Andrey V. Elsukov	6952c3e1ac	Implement SIOCGIFALIAS. It is lightweight way to check if an IPv4 address exists. Submitted by: Roy Marples Reviewed by: gnn, melifaro MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26636	2020-10-14 09:22:54 +00:00
Andrey V. Elsukov	3f740d4393	Join the AllHosts multicast group again when adding an existing IPv4 address. When the SIOCAIFADDR ioctl configures an IPv4 address that already exists, it removes the old ifaddr. When this IPv4 address is the only one configured on the interface, this also leads to leaving the AllHosts multicast group. Then the address is added again, but due to the bug, this doesn't lead to joining the AllHosts multicast group. Submitted by: yannis.planus_alstomgroup.com Reviewed by: gnn MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D26757	2020-10-13 19:34:36 +00:00
Bjoern A. Zeeb	506512b170	ip_mroute: fix the viftable export sysctl It seems that in r354857 I got more than one thing wrong. Convert the SYSCTL_OPAQUE to a SYSCTL_PROC to properly export the per-vnet viftable array, which these days is allocated and no longer static. This fixes a problem with netstat -g which would show bogus information for the IPv4 Virtual Interface Table. PR: 246626 Reported by: Ozkan KIRIK (ozkan.kirik gmail.com) MFC after: 3 days	2020-10-11 00:01:00 +00:00
Richard Scheffenegger	4b72ae16ed	Stop sending tiny new data segments during SACK recovery Consider the currently in-use TCP options when calculating the amount of new data to be injected during SACK loss recovery. That addresses the effect that very small (new) segments could be injected on partial ACKs while still performing a SACK loss recovery. Reported by: Liang Tian Reviewed by: tuexen, chengc_netapp.com MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26446	2020-10-09 12:44:56 +00:00
Richard Scheffenegger	868aabb470	Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow. This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7. Note that for untagged traffic, explicitly adding a priority will insert a special 802.1Q vlan header with vlan ID = 0 to carry the priority setting. Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26409	2020-10-09 12:06:43 +00:00
Richard Scheffenegger	5432120028	Extend netstat to display TCP stack and detailed congestion state (2) Extend netstat to display TCP stack and detailed congestion state Adding the "-c" option used to show detailed per-connection congestion control state for TCP sessions. This is one summary patch, which adds the relevant variables into xtcpcb. As previous "spare" space is used, these changes are ABI compatible. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26518	2020-10-09 10:55:19 +00:00
Michael Tuexen	e7a39b856a	Minor cleanups. MFC after: 3 days	2020-10-07 15:22:48 +00:00
John Baldwin	9aed26b906	Check if_capenable, not if_capabilities when enabling rate limiting. if_capabilities is a read-only mask of supported capabilities. if_capenable is a mask under administrative control via ifconfig(8). Reviewed by: gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26690	2020-10-06 18:02:33 +00:00
Michael Tuexen	6f155d690b	Reset delayed SACK state when restarting an SCTP association. MFC after: 3 days	2020-10-06 14:26:05 +00:00
Michael Tuexen	b954d81662	Ensure variables are initialized before used. MFC after: 3 days	2020-10-06 11:29:08 +00:00
Michael Tuexen	6176f9d6df	Remove dead stores reported by clang static code analysis MFC after: 3 days	2020-10-06 11:08:52 +00:00
Michael Tuexen	11daa73adc	Cleanup, no functional change intended. MFC after: 3 days	2020-10-06 10:41:04 +00:00
Michael Tuexen	c8e55b3c0c	Whitespace changes. MFC after: 3 days	2020-10-06 09:51:40 +00:00
Michael Tuexen	9f2d6263bb	Use __func__ instead of __FUNCTION__ for consistency. MFC after: 3 days	2020-10-04 15:37:34 +00:00
Michael Tuexen	d0ed75b3b1	Cleanup, no functional change intended. MFC after: 3 days	2020-10-04 15:22:14 +00:00
Alexander V. Chernikov	fedeb08b6a	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Similar to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presence of NHF_MULTIPATH flag. All dataplane lookup functions return pointer to the nexthop object, leaving nexthop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now come with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449	2020-10-03 10:47:17 +00:00
Michael Tuexen	b15f541113	Improve the input validation and processing of cookies. This avoids setting the association in an inconsistent state, which could result in a use-after-free situation. This can be triggered by a malicious peer, if the peer can modify the cookie without the local endpoint recognizing it. Thanks to Ned Williamson for reporting the issue. MFC after: 3 days	2020-09-29 09:36:06 +00:00
Michael Tuexen	fbc6840bae	Minor cleanup. MFC after: 3 days	2020-09-28 14:11:53 +00:00
Michael Tuexen	1d1b4bce53	Cleanup, no functional change intended. MFC after: 3 days	2020-09-27 13:32:02 +00:00
Michael Tuexen	8f269b8242	Improve the handling of receiving unordered and unreliable user messages using DATA chunks. Don't use fsn_included when not being sure that it is set to an appropriate value. If the default is used, which is -1, this can result in SCTP associations not making any user visible progress. Thanks to Yutaka Takeda for reporting this issue for the userland stack in https://github.com/pion/sctp/issues/138. MFC after: 3 days	2020-09-27 13:24:01 +00:00
Richard Scheffenegger	e399566123	TCP: send full initial window when timestamps are in use The fastpath in tcp_output tries to send out full segments, and avoid sending partial segments by comparing against the static t_maxseg variable. That value does not consider tcp options like timestamps, while the initial window calculation is using the correct dynamic tcp_maxseg() function. Due to this interaction, the last, full size segment is considered too short and not sent out immediately. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26478	2020-09-25 10:38:19 +00:00
Richard Scheffenegger	1567c937e2	TCP newreno: improve after_idle ssthresh Adjust ssthresh in after_idle to the maximum of the prior ssthresh, or 3/4 of the prior cwnd. See RFC2861 section 2 for an in depth explanation for the rationale around this. As newreno is the default "fall-through" reaction, most tcp variants will benefit from this. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D22438	2020-09-25 10:23:14 +00:00
Michael Tuexen	b6db274d1e	Whitespace changes. MFC after: 3 days	2020-09-24 12:26:06 +00:00
Alexander V. Chernikov	2259a03020	Rework part of routing code to reduce difference to D26449. * Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times. * Start filling route weight in route change notifications * Pass flowid to UDP/raw IP route lookups * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops. Differential Revision: https://reviews.freebsd.org/D26497	2020-09-21 20:02:26 +00:00
Alexander V. Chernikov	1440f62266	Remove unused nhop_ref_any() function. Remove "opt_mpath.h" header where not needed. No functional changes.	2020-09-20 21:32:52 +00:00
Mitchell Horne	374ce2488a	Initialize some local variables earlier Move the initialization of these variables to the beginning of their respective functions. On our end this creates a small amount of unneeded churn, as these variables are properly initialized before their first use in all cases. However, changing this benefits at least one downstream consumer (NetApp) by allowing local and future modifications to these functions to be made without worrying about where the initialization occurs. Reviewed by: melifaro, rscheff Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26454	2020-09-18 14:01:10 +00:00
Navdeep Parhar	b092fd6c97	if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS. This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable of performing these operations on inner VXLAN traffic. A VXLAN interface inherits the capabilities of its vxlandev interface if one is specified or of the interface that hosts the vxlanlocal address. If other interfaces will carry traffic for that VXLAN then they must have the same hardware capabilities. On transmit, if_vxlan verifies that the outbound interface has the required capabilities and then translates the CSUM_ flags to their inner equivalents. This tells the hardware ifnet that it needs to operate on the inner frame and not the outer VXLAN headers. An event is generated when a VXLAN ifnet starts. This allows hardware drivers to configure their devices to expect VXLAN traffic on the specified incoming port. On receive, the hardware does RSS and checksum verification on the inner frame. if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is not very clear why it didn't do this already. Future work: Rx: it should be possible to avoid the first trip up the protocol stack to get the frame to if_vxlan just so it can decapsulate and requeue for a second trip up the stack. The hardware NIC driver could directly call an if_vxlan receive routine for VXLAN traffic instead. Rx: LRO. depends on what happens with the previous item. There will have to be a mechanism to indicate that it's time for if_vxlan to flush its LRO state. Reviewed by: kib@ Relnotes: Yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:37:57 +00:00
Navdeep Parhar	72cc43df17	Add a knob to allow zero UDP checksums for UDP/IPv6 traffic on the given UDP port. This will be used by some upcoming changes to if_vxlan(4). RFC 7348 (VXLAN) says that the UDP checksum "SHOULD be transmitted as zero. When a packet is received with a UDP checksum of zero, it MUST be accepted for decapsulation." But the original IPv6 RFCs did not allow zero UDP checksum. RFC 6935 attempts to resolve this. Reviewed by: kib@ Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:21:15 +00:00
Michael Tuexen	42d7560796	Export the name of the congestion control. This will be used by sockstat and netstat. Reviewed by: rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D26412	2020-09-13 09:06:50 +00:00
Richard Scheffenegger	e74e64a191	cc_mod: remove unused CCF_DELACK definition During the DCTCP improvements, use of CCF_DELACK was removed. This change is just to rename the unused flag bit to prevent use of it, without also re-implementing the tcp_input and tcp_output interfaces. No functional change. Reviewed by: chengc_netapp.com, tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26181	2020-09-10 00:46:38 +00:00
Randall Stewart	285385ba56	So it turns out that syzkaller hit another crash. It has to do with switching stacks with a SENT_FIN outstanding. Both rack and bbr will only send a FIN if all data is ack'd so this must be enforced. Also if the previous stack sent the FIN we need to make sure in rack that when we manufacture the "unknown" sends that we include the proper HAS_FIN bits. Note for BBR we take a simpler approach and just refuse to switch. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D26269	2020-09-09 11:11:50 +00:00
Bjoern A. Zeeb	67d224ef43	bbr: remove unused static function bbr_log_type_hrdwtso() is a file local static unused function. Remove it to avoid warnings on kernel compiles. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D26331	2020-09-05 00:20:32 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Alexander V. Chernikov	a624ca3dff	Move net/route/shared.h definitions to net/route/route_var.h. No functional changes. net/route/shared.h was created in the initial phases of nexthop conversion. It was intended to serve the same purpose as route_var.h - share definitions of functions and structures between the routing subsystem components. At that time route_var.h was included by many files external to the routing subsystem, which largely defeats its purpose. As currently this is not the case anymore and amount of route_var.h includes is roughly the same as shared.h, retire the latter in favour of the former.	2020-08-28 22:50:20 +00:00
Michael Tuexen	404ff76bda	Fix a regression with the explicit EOR mode I introduced in r364268. A short MFC time as discussed with the secteam. Reported by: Taylor Brandstetter MFC after: 1 day	2020-08-28 20:05:18 +00:00
Michael Tuexen	1951fa791e	RFC 3465 defines a limit L used in TCP slow start for limiting the number of acked bytes as described in Section 2.2 of that document. This patch ensures that this limit is not also applied in congestion avoidance. Applying this limit also in congestion avoidance can result in using less bandwidth than allowed. Reported by: l.tian.email@gmail.com Reviewed by: rrs, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D26120	2020-08-25 09:42:03 +00:00
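The distinction drawn in 1951fa791e (cap the counted bytes at L segments in slow start, but not in congestion avoidance) can be illustrated with a short C sketch. It follows the RFC 3465 rule as the commit message states it; the function name `abc_counted_bytes` and its parameters are hypothetical and do not mirror the FreeBSD implementation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return how many newly acked bytes may be credited
 * towards cwnd growth. In slow start the credit is capped at
 * limit_l * smss (the limit L of RFC 3465, Section 2.2); in congestion
 * avoidance the full acked byte count is credited.
 * Assumes limit_l * smss fits in 32 bits.
 */
static uint32_t
abc_counted_bytes(int in_slow_start, uint32_t bytes_acked, uint32_t smss,
    uint32_t limit_l)
{
	uint32_t cap = limit_l * smss;

	if (in_slow_start && bytes_acked > cap)
		return (cap);
	return (bytes_acked);
}
```

For an ACK covering 10000 bytes with an SMSS of 1000 and L = 2, the sketch credits 2000 bytes in slow start and all 10000 bytes in congestion avoidance.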
Warner Losh	773e541e8d	Use devctl.h instead of bus.h to reduce newbus pollution. There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*. Suggested by: kib@	2020-08-21 00:03:24 +00:00
Andrew Gallatin	b99781834f	TCP: remove special treatment for hardware (ifnet) TLS Remove most special treatment for ifnet TLS in the TCP stack, except for code to avoid mixing handshakes and bulk data. This code made heroic efforts to send down entire TLS records to NICs. It was added to improve the PCIe bus efficiency of older TLS offload NICs which did not keep state per-session, and so would need to re-DMA the first part(s) of a TLS record if a TLS record was sent in multiple TCP packets or TSOs. Newer TLS offload NICs do not need this feature. At Netflix, we've run extensive QoE tests which show that this feature reduces client quality metrics, presumably because the effort to send TLS records atomically causes the server to both wait too long to send data (leading to buffers running dry), and to send too much data at once (leading to packet loss). Reviewed by: hselasky, jhb, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26103	2020-08-19 17:59:06 +00:00
Richard Scheffenegger	ad7a0eb189	TCP Cubic: recalculate cwnd for every ACK. Since cubic calculates cwnd based on absolute time, retaining RFC3465 (ABC) once-per-window updates can lead to dramatic changes of cwnd in the convex region. Updating cwnd for each incoming ack minimizes this delta, preventing unintentional line-rate bursts. Reviewed by: chengc_netapp.com, tuexen (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26060	2020-08-18 19:34:31 +00:00
Michael Tuexen	d7351394da	Fix two bugs I introduced in r362563. Found by running syzkaller. MFC after: 3 days	2020-08-18 19:25:03 +00:00
Michael Tuexen	d59f3890c3	Remove a line which is needed and was added in https://svnweb.freebsd.org/changeset/base/364268 MFC after: 3 days	2020-08-16 13:31:14 +00:00
Michael Tuexen	f5d30f7f76	Improve the handling of concurrent send() calls for SCTP sockets, especially when having the explicit EOR mode enabled. Reported by: Megan2013678@protonmail.com Reported by: syzbot+bc02585076c3cc977f9b@syzkaller.appspotmail.com MFC after: 3 days	2020-08-16 11:50:37 +00:00
Michael Tuexen	04996cb74b	Enter epoch earlier. This is needed because we are exiting it also in error cases. MFC after: 1 week	2020-08-15 11:22:07 +00:00
Alexander V. Chernikov	2f23f45b20	Simplify dom_<rtattach\|rtdetach>. Remove unused arguments from dom_rtattach/dom_rtdetach functions and make them return/accept 'struct rib_head' instead of 'void **'. Declare inet/inet6 implementations in the relevant _var.h headers similar to domifattach / domifdetach. Add rib_subscribe_internal() function to accept subscriptions to the rnh directly. Differential Revision: https://reviews.freebsd.org/D26053	2020-08-14 21:29:56 +00:00
Richard Scheffenegger	a459638fc4	TCP Cubic: Have Fast Convergence Heuristic work for ECN, and align concave region The Cubic concave region was not aligned nicely for the very first exit from slow start, where a 50% cwnd reduction is done instead of the normal 30%. This addresses an issue, where a short line-rate burst could result from that sudden jump of cwnd. In addition, the Fast Convergence Heuristic has been expanded to work also with ECN induced congestion response. Submitted by: chengc_netapp.com Reported by: chengc_netapp.com Reviewed by: tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25976	2020-08-13 16:45:55 +00:00
Richard Scheffenegger	2bb6dfabbe	TCP Cubic: After leaving slowstart fix unintended cwnd jump. Initializing K to zero in D23655 introduced a miscalculation, where cwnd would suddenly jump to cwnd_max instead of gradually increasing, after leaving slow-start. Properly calculating K instead of resetting it to zero resolves this issue. Also making sure, that cwnd is recalculated at the earliest opportunity once slow-start is over. Reported by: chengc_netapp.com Reviewed by: chengc_netapp.com, tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25746	2020-08-13 16:38:51 +00:00
Richard Scheffenegger	f359d6ebbc	Improve SACK support code for RFC6675 and PRR Adding proper accounting of sacked_bytes and (per-ACK) delivered data to the SACK scoreboard. This will allow more aspects of RFC6675 to be implemented as well as Proportional Rate Reduction (RFC6937). Prior to this change, the pipe calculation controlled with net.inet.tcp.rfc6675_pipe was also susceptible to incorrect results when more than 3 (or 4) holes in the sequence space were present, which can no longer all fit into a single ACK's SACK option. Reviewed by: kbowling, rgrimes (mentor) Approved by: rgrimes (mentor, blanket) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D18624	2020-08-13 16:30:09 +00:00
Hans Petter Selasky	b453d3d239	Use a static initializer for the multicast free tasks. This makes the SYSINIT() function updated in r364072 superfluous. Suggested by: glebius@ MFC after: 1 week Sponsored by: Mellanox Technologies	2020-08-11 08:31:40 +00:00
Michael Tuexen	cf8a49ab6e	Fix the following issues related to the TCP SYN-cache: * Let the accepted TCP/IPv4 socket inherit the configured TTL and TOS value. * Let the accepted TCP/IPv6 socket inherit the configured Hop Limit. * Use the configured Hop Limit and Traffic Class when sending IPv6 packets. Reviewed by: rrs, lutz_donnerhacke.de MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25909	2020-08-10 20:24:48 +00:00
Bjoern A. Zeeb	f9461246a2	MC: add a note with reference to the discussion and history as-to why we are where we are now. The main thing is to try to get rid of the delayed freeing to avoid blocking on the taskq when shutting down vnets. X-Timeout: if you still see this before 14-RELEASE remove it.	2020-08-10 10:58:43 +00:00
Hans Petter Selasky	3689652c65	Make sure the multicast release tasks are properly drained when destroying a VNET or a network interface. Else the inm release tasks, both IPv4 and IPv6 may cause a panic accessing a freed VNET or network interface. Reviewed by: jmg@ Discussed with: bz@ Differential Revision: https://reviews.freebsd.org/D24914 MFC after: 1 week Sponsored by: Mellanox Technologies	2020-08-10 10:46:08 +00:00
Hans Petter Selasky	a95ef9d38d	Use proper prototype for SYSINIT() functions. Mark the unused argument using the __unused macro. Discussed with: kib@ MFC after: 1 week Sponsored by: Mellanox Technologies	2020-08-10 10:40:19 +00:00
Michael Tuexen	1bea15e601	Improve the ECN negotiation when the TCP SYN-cache is used by making sure that * ECN is disabled if the client sends a non-ECN-setup SYN segment. * ECN is disabled if the ECN-setup SYN-ACK segment is retransmitted more than net.inet.tcp.ecn.maxretries times. Reviewed by: rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D26008	2020-08-08 19:39:38 +00:00
Bjoern A. Zeeb	a9839c4aee	IPV6_PKTINFO support for v4-mapped IPv6 sockets When using v4-mapped IPv6 sockets with IPV6_PKTINFO we do not respect the given v4-mapped src address on the IPv4 socket. Implement the needed functionality. This allows single-socket UDP applications (such as OpenVPN) to work better on FreeBSD. Requested by: Gert Doering (gert greenie.net), pfsense Tested by: Gert Doering (gert greenie.net) Reviewed by: melifaro MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24135	2020-08-07 15:13:53 +00:00
Randall Stewart	8315f1ea26	The recent changes to move the ref count increment back from the end of the function created an issue. If one of the routines returns NULL during setup we have inp's with extra references (which is why the increment was at the end). Also the stack switch return code was being ignored and actually has meaning: if the stack cannot take over it should return NULL. Fix both of these situations by being sure to test the return code and of course in any case of return NULL (there are 3) make sure we properly reduce the ref count. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25903	2020-07-31 10:03:32 +00:00
Michael Tuexen	205f3e1597	Clear the pointer to the socket when closing it also in case of an ungraceful operation. This fixes a use-after-free bug found and reported by Taylor Brandstetter of Google by testing the userland stack. MFC after: 1 week	2020-07-23 19:43:49 +00:00
Michael Tuexen	91e04f9e7a	Detect and handle an invalid reassembly constellation, which results in a memory leak. Thanks to Felix Weinrank for finding this issue using fuzz testing the userland stack. MFC after: 1 week	2020-07-23 01:35:24 +00:00
Richard Scheffenegger	cce999b38f	Fix style and comment around concave/convex regions in TCP cubic. In cubic, the concave region is when snd_cwnd starts growing slower towards max_cwnd (cwnd at the time of the congestion event), and the convex region is when snd_cwnd starts to grow faster and eventually appearing like slow-start like growth. PR: 238478 Reviewed by: tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D24657	2020-07-21 16:21:52 +00:00
Richard Scheffenegger	66ba9aafcf	Add MODULE_VERSION to TCP loadable congestion control modules. Without versioning information, using preexisting loader / linker code is not easily possible when another module may have dependencies on pre-loaded modules, and also doesn't allow the automatic loading of dependent modules. No functional change of the actual modules. Reviewed by: tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25744	2020-07-20 23:47:27 +00:00
Michael Tuexen	8745f898c4	Add reference counts for inp/stcb/net when timers are running. This avoids a use-after-free reported for the userland stack. Thanks to Taylor Brandstetter for suggesting a patch for the userland stack. MFC after: 1 week	2020-07-19 12:34:19 +00:00
Michael Tuexen	05bceec68e	Remove code which is not needed. MFC after: 1 week	2020-07-18 13:10:02 +00:00
Michael Tuexen	7f0ad2274b	Improve the locking of address lists by adding some asserts and rearranging the addition of address such that the lock is not given up during checking and adding. MFC after: 1 week	2020-07-17 15:09:49 +00:00
Michael Tuexen	f903a308a1	(Re)-allow 0.0.0.0 to be used as an address in connect() for TCP In r361752 an error handling was introduced for using 0.0.0.0 or 255.255.255.255 as the address in connect() for TCP, since both addresses can't be used. However, the stack maps 0.0.0.0 implicitly to a local address and at least two regressions were reported. Therefore, re-allow the usage of 0.0.0.0. While there, change the error indicated when using 255.255.255.255 from EAFNOSUPPORT to EACCES as mentioned in the man-page of connect(). Reviewed by: rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25401	2020-07-16 16:46:24 +00:00
Michael Tuexen	504ee6a001	Improve the error handling in generating ASCONF chunks. In case of errors, the cleanup was not consistent. Thanks to Felix Weinrank for fuzzing the userland stack and making me aware of the issue. MFC after: 1 week	2020-07-14 20:32:50 +00:00
Michael Tuexen	cd7518203d	Cleanup, no functional change intended. This file is only compiled if INET or INET6 is defined. So there is no need for checking that. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D25635	2020-07-12 18:34:09 +00:00
Michael Tuexen	83b8204f61	(Re)activate SCTP system calls when compiling SCTP support into the kernel r363079 introduced the possibility of loading the SCTP stack as a module in addition to compiling it into the kernel. As part of this, the registration of the system calls was removed and put into the loading of the module. Therefore, the system calls are not registered anymore when compiling the SCTP into the kernel. This patch addresses that. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D25632	2020-07-12 14:50:12 +00:00
Michael Tuexen	4ef7c2f28f	Whitespace changes due to upstreaming r363079.	2020-07-10 16:59:06 +00:00
Mark Johnston	052c5ec4d0	Provide support for building SCTP as a loadable module. With this change, a kernel compiled with "options SCTP_SUPPORT" and without "options SCTP" supports dynamic loading of the SCTP stack. Currently sctp.ko cannot be unloaded since some prerequisite teardown logic is not yet implemented. Attempts to unload the module will return EOPNOTSUPP. Discussed with: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21997	2020-07-10 14:56:05 +00:00
Michael Tuexen	6ddc843832	Fix a use-after-free bug for the userland stack. The kernel stack is not affected. Thanks to Mark Wodrich from Google for finding and reporting the bug. MFC after: 1 week	2020-07-10 11:15:10 +00:00
Michael Tuexen	b6734d8f4a	Optimize flushing of receive queues. This addresses an issue found and reported for the userland stack in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21243 MFC after: 1 week	2020-07-09 16:18:42 +00:00
Michael Tuexen	fcbfdc0ab6	Improve consistency. MFC after: 1 week	2020-07-08 16:23:40 +00:00
Michael Tuexen	ef9095c72a	Fix error description. MFC after: 1 week	2020-07-08 16:04:06 +00:00
Michael Tuexen	c96d7c373e	Don't accept FORWARD-TSN chunks when I-FORWARD-TSN was negotiated and vice versa. MFC after: 1 week	2020-07-08 15:49:30 +00:00
Michael Tuexen	32df1c9ebb	Improve handling of PKTDROP chunks. This includes the input validation to address two issues found by ossfuzz testing the userland stack: * https://oss-fuzz.com/testcase-detail/5387560242380800 * https://oss-fuzz.com/testcase-detail/4887954068865024 and adding support for I-DATA chunks in addition to DATA chunks.	2020-07-08 12:25:19 +00:00
Richard Scheffenegger	c201ce0b4a	Fix KASSERT during tcp_newtcpcb when low on memory While testing with system default cc set to cubic, and running a memory exhaustion validation, FreeBSD panics for a missing inpcb reference / lock. Reviewed by: rgrimes (mentor), tuexen (mentor) Approved by: rgrimes (mentor), tuexen (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25583	2020-07-07 12:10:59 +00:00
Alexander V. Chernikov	6ad7446c6f	Complete conversions from fib<4\|6>_lookup_nh_<basic\|ext> to fib<4\|6>_lookup(). fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. With no callers remaining, remove fib[46]_lookup_nh_ functions. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25445	2020-07-02 21:04:08 +00:00
Michael Tuexen	e54b7cd007	Fix the cleanup handling in an error path for TCP BBR. Reported by: syzbot+df7899c55c4cc52f5447@syzkaller.appspotmail.com Reviewed by: rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25486	2020-07-01 17:17:06 +00:00
Mark Johnston	d16a2e4784	Fix a possible next-hop refcount leak when handling IPSec traffic. It may be possible to fix this by deferring the lookup, but let's keep the initial change simple to make MFCs easier. PR: 246951 Reviewed by: melifaro MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25519	2020-07-01 15:42:48 +00:00
Michael Tuexen	7a3f60e7f5	Fix a bug introduced in https://svnweb.freebsd.org/changeset/base/362173 Reported by: syzbot+f3a6fccfa6ae9d3ded29@syzkaller.appspotmail.com MFC after: 1 week	2020-06-30 21:50:05 +00:00
Michael Tuexen	e99ce3eac5	Don't send packets containing ERROR chunks in response to unknown chunks when being in a state where the verification tag to be used is not known yet. MFC after: 1 week	2020-06-28 14:11:36 +00:00
Michael Tuexen	f2f66ef6d2	Don't check ch for not being NULL, since that is true. MFC after: 1 week	2020-06-28 11:12:03 +00:00
John Baldwin	4a711b8d04	Use zfree() instead of explicit_bzero() and free(). In addition to reducing lines of code, this also ensures that the full allocation is always zeroed avoiding possible bugs with incorrect lengths passed to explicit_bzero(). Suggested by: cem Reviewed by: cem, delphij Approved by: csprng (cem) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25435	2020-06-25 20:17:34 +00:00
Michael Tuexen	132c073866	Fix the accounting for fragmented unordered messages when using interleaving. This was reported for the userland stack in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=19321 MFC after: 1 week	2020-06-24 14:47:51 +00:00
Richard Scheffenegger	6e26dd0dbe	TCP: fix cubic RTO reaction. Proper TCP Cubic operation requires the knowledge of the maximum congestion window prior to the last congestion event. This restores and improves a bugfix previously added by jtl@ but subsequently removed due to a revert. Reported by: chengc_netapp.com Reviewed by: chengc_netapp.com, tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25133	2020-06-24 13:52:53 +00:00
Richard Scheffenegger	9dc7d8a246	TCP: make after-idle work for transactional sessions. The use of t_rcvtime as proxy for the last transmission fails for transactional IO, where the client requests data before the server can respond with a bulk transfer. Set aside a dedicated variable to actually track the last locally sent segment going forward. Reported by: rrs Reviewed by: rrs, tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25016	2020-06-24 13:42:42 +00:00
Michael Tuexen	87c0bf77d9	Fix alignment issue manifesting in the userland stack. MFC after: 1 week	2020-06-23 23:05:05 +00:00
Michael Tuexen	b88082dd39	No need to include netinet/sctp_crc32.h twice.	2020-06-22 14:36:14 +00:00
Mark Johnston	e6db509d10	Move the definition of SCTP's system_base_info into sctp_crc32.c. This file is the only SCTP source file compiled into the kernel when SCTP_SUPPORT is configured. sctp_delayed_checksum() references a couple of counters defined in system_base_info, so the change allows these counters to be referenced in a kernel compiled without "options SCTP". Submitted by: tuexen MFC with: r362338	2020-06-22 14:01:31 +00:00
Michael Tuexen	c5d9e5c99e	Cleanup the definition of struct sctp_getaddresses. This structure is used by the IPPROTO_SCTP level socket options SCTP_GET_PEER_ADDRESSES and SCTP_GET_LOCAL_ADDRESSES, which are used by libc to implement sctp_getladdrs() and sctp_getpaddrs(). These changes allow an old libc to work on a newer kernel.	2020-06-21 23:12:56 +00:00
Bjoern A. Zeeb	e387af1fa8	Rather than zeroing MAXVIFS times size of pointer [r362289] (still better than sizeof pointer before [r354857]), we need to zero MAXVIFS times the size of the struct. All good things come in threes; I hope this is it on this one. PR: 246629, 206583 Reported by: kib MFC after: ASAP	2020-06-21 22:09:30 +00:00
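The sizing mistake that e387af1fa8 and ce19cceb8d correct (zeroing by the size of a pointer rather than by element size times element count) can be shown with a small userland C sketch. The names `struct entry`, `NENTRIES`, `clear_entries` and `count_in_use` are hypothetical stand-ins for illustration; they are not the kernel's multicast routing structures.

```c
#include <stddef.h>
#include <string.h>

#define NENTRIES 8	/* hypothetical stand-in for MAXVIFS */

struct entry {		/* hypothetical stand-in for a per-VIF record */
	int in_use;
	char name[16];
};

/*
 * Zero every element of a dynamically sized table. The length is the
 * element count times sizeof(*tab) (the struct), not sizeof(tab) (the
 * pointer), which would clear only the first few bytes.
 */
static void
clear_entries(struct entry *tab, size_t n)
{
	memset(tab, 0, n * sizeof(*tab));
}

/* Count the entries still marked in use. */
static size_t
count_in_use(const struct entry *tab, size_t n)
{
	size_t i, used = 0;

	for (i = 0; i < n; i++)
		if (tab[i].in_use)
			used++;
	return (used);
}
```

After `clear_entries(tab, NENTRIES)` no entry remains marked in use, which is the state the commits aim for so that a restarted daemon does not find stale slots.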
Michael Tuexen	171edd2110	Fix the build for an INET6 only configuration. The fix from the last commit is actually needed twice... MFC after: 1 week	2020-06-21 09:56:09 +00:00
Michael Tuexen	5087b6e732	Set a variable also in the case of an INET6 only kernel MFC after: 1 week	2020-06-20 23:48:57 +00:00
Michael Tuexen	ed82c2edd6	Use a struct sockaddr_in or struct sockaddr_in6 as the option value for the IPPROTO_SCTP level socket options SCTP_BINDX_ADD_ADDR and SCTP_BINDX_REM_ADDR. These socket options are intended for internal use only to implement sctp_bindx(). This is one user of struct sctp_getaddresses less. struct sctp_getaddresses is strange and will be changed shortly.	2020-06-20 21:06:02 +00:00
Michael Tuexen	7621bd5ead	Cleanup the adding and deleting of addresses via sctp_bindx(). There is no need to use the association identifier, so remove it. While there, cleanup the code a bit. MFC after: 1 week	2020-06-20 20:20:16 +00:00
Michael Tuexen	7a9dbc33f9	Remove last argument of sctp_addr_mgmt_ep_sa(), since it is not used. MFC after: 1 week	2020-06-19 12:35:29 +00:00
Mark Johnston	95033af923	Add the SCTP_SUPPORT kernel option. This is in preparation for enabling a loadable SCTP stack. Analogous to IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured in order to support a loadable SCTP implementation. Discussed with: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-06-18 19:32:34 +00:00
Bjoern A. Zeeb	ce19cceb8d	When converting the static arrays to mallocarray() in r356621 I missed one place where we now need to multiply the size of the struct with the number of entries. This led to problems when restarting user space daemons, as the cleanup was never properly done, resulting in MRT_ADD_VIF EADDRINUSE. Properly zero all array elements to avoid this problem. PR: 246629, 206583 Reported by: (many) MFC after: 4 days Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")	2020-06-17 21:04:38 +00:00
Bjoern A. Zeeb	b7b3d237e7	The call into ifa_ifwithaddr() needs to be epoch protected; otherwise we'll panic on an assertion. While here, leave a comment that the ifp was never protected and stable (as glebius pointed out) and this needs to be fixed properly. Discovered while working on: PR 246629 Reviewed by: glebius MFC after: 4 days Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")	2020-06-17 20:58:37 +00:00
Michael Tuexen	2d87bacde4	Allow the self reference to be NULL in case the timer was stopped. Submitted by: Timo Voelker MFC after: 1 week	2020-06-17 15:27:45 +00:00
Tom Jones	d88fe3d964	Add header definition for RFC4340, Datagram Congestion Control Protocol Add a header definition for DCCP as defined in RFC4340. This header definition is required to perform validation when receiving and forwarding DCCP packets. We do not currently support DCCP. Reviewed by: gallatin, bz Approved by: bz (co-mentor) MFC after: 1 week MFC with: 350749 Differential Revision: https://reviews.freebsd.org/D21179	2020-06-17 13:27:13 +00:00
Randall Stewart	95ef69c63c	So in doing final checks on OCA firmware with all the latest tweaks the dup-ack checking packet drill script was failing with a number of unexpected acks. So it turns out if you have the default recvwin set up to 1Meg (like OCA's do) and you have no window scaling (like the dupack checking code) then we have another case where we are always trying to update the rwnd and sending an ack when we should not. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25298	2020-06-16 18:16:45 +00:00
Randall Stewart	4d418f8da8	So it turns out rack has a shortcoming in dup-ack counting. It counts the dupacks but then does not properly respond to them. This is because a few missing bits are not present. BBR actually does properly respond (though it also sends a TLP which is interesting and maybe something to fix).. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25294	2020-06-16 12:26:23 +00:00
Michael Tuexen	b231bff8b2	Allocate the mbuf for the signature in the COOKIE of the correct size. While there, also do some cleanups. MFC after: 1 week	2020-06-14 16:05:08 +00:00
Michael Tuexen	4471043177	Cleanups, no functional change. MFC after: 1 week	2020-06-14 09:50:00 +00:00
Michael Tuexen	d60bdf8569	Remove usage of empty macro. MFC after: 1 week	2020-06-13 21:23:26 +00:00
Michael Tuexen	64c8fc5de8	Simplify a condition, no functional change. MFC after: 1 week	2020-06-13 18:38:59 +00:00
Randall Stewart	f092a3c71c	So it turns out with the right window scaling you can get the code in all stacks to always want to do a window update, even when no data can be sent. Now in cases where you are not pacing that's probably ok, you just send an extra window update or two. However with bbr (and rack if it's paced) every time the pacer goes off it's going to send a "window update". Also in testing bbr I have found that if we are not responding to data right away we end up staying in startup but incorrectly holding a pacing gain of 192 (a loss). This is because the idle window code does not restrict itself to only work with PROBE_BW. In all other states you don't want it doing a PROBE_BW state change. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25247	2020-06-12 19:56:19 +00:00
Michael Tuexen	3ee11586b2	Whitespace change due to upstream cleanup. MFC after: 1 week	2020-06-12 16:40:10 +00:00
Michael Tuexen	2f9e6db0be	More cleanups due to ifdef cleanup done upstream MFC after: 1 week	2020-06-12 16:31:13 +00:00
Michael Tuexen	306c2ba375	Small cleanup due to upstream ifdef cleanups. MFC after: 1 week	2020-06-12 10:13:23 +00:00
Michael Tuexen	28397ac1ed	Non-functional changes due to upstream cleanup. MFC after: 1 week	2020-06-11 13:34:09 +00:00
Richard Scheffenegger	2fda0a6f3a	Prevent TCP Cubic to abruptly increase cwnd after app-limited Cubic calculates the new cwnd based on absolute time elapsed since the start of an epoch. A cubic epoch is started on congestion events, or once the congestion avoidance phase is started, after slow-start has completed. When a sender is application limited for an extended amount of time and subsequently a larger volume of data becomes ready for sending, Cubic recalculates cwnd with a lingering cubic epoch. This recalculation of the cwnd can induce a massive increase in cwnd, causing a burst of data to be sent at line rate by the sender. This adds a flag to reset the cubic epoch once a session transitions from app-limited to cwnd-limited to prevent the above effect. Reviewed by: chengc_netapp.com, tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25065	2020-06-10 07:32:02 +00:00
Richard Scheffenegger	6907bbae18	Prevent TCP Cubic to abruptly increase cwnd after slow-start Introducing flags to track the initial Wmax dragging and exit from slow-start in TCP Cubic. This prevents sudden jumps in the calculated cwnd by cubic, especially when the flow is application limited during slow start (cwnd can not grow as fast as expected). The downside is that cubic may remain slightly longer in the concave region before starting the convex region beyond Wmax again. Reviewed by: chengc_netapp.com, tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor, blanket) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23655	2020-06-09 21:07:58 +00:00
Michael Tuexen	5fb132abbb	Whitespace cleanups and removal of a stale comment. MFC after: 1 week	2020-06-08 20:23:20 +00:00
Randall Stewart	e854dd38ac	An important statistic in determining if a server process (or client) is being delayed is to know the time to first byte in and time to first byte out. Currently we have no way to know these; all we have is t_starttime. That (t_starttime) tells us what time the 3 way handshake completed. We don't know when the first request came in or how quickly we responded. Nor from a client perspective do we know how long from when we sent out the first byte before the server responded. This small change adds the ability to track the TTFBs. This will show up in BB logging which then can be pulled for later analysis. Note that currently the tracking is via the ticks variable of all three variables. This provides a very rough estimate (with hz=1000 it's 1ms). A follow-on set of work will be to change all three of these values into something with a much finer resolution (either microseconds or nanoseconds), though we may want to make the resolution configurable so that on lower powered machines we could still use the much cheaper ticks variable. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24902	2020-06-08 11:48:07 +00:00
Michael Tuexen	70486b27ae	Retire SCTP_SO_LOCK_TESTING. This was intended to test the locking used in the MacOS X kernel on a FreeBSD system, to make use of WITNESS and other debugging infrastructure. This hasn't been used for ages, so take it out to reduce the #ifdef complexity. MFC after: 1 week	2020-06-07 14:39:20 +00:00
Michael Tuexen	3f53d62236	Fix typo in comment. Submitted by Orgad Shaneh for the userland stack. MFC after: 1 week	2020-06-06 21:26:34 +00:00
Michael Tuexen	2cf3347109	Non-functional changes due to cleanup (upstream removing of Panda support) of the code MFC after: 1 week	2020-06-06 18:20:09 +00:00
Randall Stewart	2cf21ae559	We should never allow either the broadcast or IN_ADDR_ANY to be connected to or sent to. This was found when working with Michael Tuexen and syzkaller. Syzkaller seems to want to use either of these two addresses to connect to at times. And it really is an error to do so, so let's not allow that behavior. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24852	2020-06-03 14:16:40 +00:00
Randall Stewart	f1ea4e4120	This fixes a couple of syzkaller crashes. Most of them have to do with TFO. Even the default stack had one of the issues: 1) We need to make sure for rack that we don't advance snd_nxt beyond iss when we are not doing fast open. We otherwise can get a bunch of SYNs sent out incorrectly with the seq number advancing. 2) When we complete the 3-way handshake we should not ever append to reassembly if the tlen is 0; if TFO is enabled, prior to this fix we could still call the reassembly. Note this affects all three stacks. 3) Rack like its cousin BBR should track if a SYN is on a send map entry. 4) Both bbr and rack need to only consider len incremented on a SYN if the starting seq is iss, otherwise we don't increment len which may mean we return without adding a sendmap entry. This work was done in collaboration with Michael Tuexen, thanks for all the testing! Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D25000	2020-06-03 14:07:31 +00:00
Michael Tuexen	d442a65733	Restrict enabling TCP-FASTOPEN to end-points in CLOSED or LISTEN state Enabling TCP-FASTOPEN on an end-point which is in a state other than CLOSED or LISTEN, is a bug in the application. So it should not work. Also the TCP code does not (and needs not to) handle this. While there, also simplify the setting of the TF_FASTOPEN flag. This issue was found by running syzkaller. Reviewed by: rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25115	2020-06-03 13:51:53 +00:00
Alexander V. Chernikov	da187ddb3d	* Add rib_<add|del|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this approach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:49:42 +00:00
Alexander V. Chernikov	e7403d0230	Revert r361704, it accidentally committed merged D25067 and D25070.	2020-06-01 20:40:40 +00:00
Alexander V. Chernikov	79674562b8	* Add rib_<add|del|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this approach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:32:02 +00:00
Alexander V. Chernikov	a37a5246ca	Use fib[46]_lookup() in mtu calculations. fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing fewer guarantees over pointer validity and requiring on-stack data copying. Conversion is straightforward, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code. Differential Revision: https://reviews.freebsd.org/D24974	2020-05-28 08:00:08 +00:00
Alexander V. Chernikov	3553b3007f	Switch ip_output/icmp_reflect rt lookup calls with fib4_lookup. fib4_lookup_nh_ represents pre-epoch generation of fib api, providing fewer guarantees over pointer validity and requiring on-stack data copying. Conversion is straightforward, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24976	2020-05-28 07:31:53 +00:00
Alexander V. Chernikov	7bfc98af12	Switch gif(4) path verification to fib[46]_check_urpf(). fibX_lookup_nh_ represents pre-epoch generation of fib api, providing fewer guarantees over pointer validity and requiring on-stack data copying. Use specialized fib[46]_check_urpf() from newer KPI instead, to allow removal of older KPI. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24978	2020-05-28 07:26:18 +00:00
Emmanuel Vadot	77c68315f6	bbr: Use arc4random_uniform from libkern. This unbreaks the LINT build. Reported by: jenkins, melifaro	2020-05-23 19:52:20 +00:00
Alexander V. Chernikov	4d2c2509f2	Move <add|del|change>_route() functions to route_ctl.c in preparation of multipath control plane changes described in D24141. Currently route.c contains core routing init/teardown functions, route table manipulation functions and various helper functions, resulting in >2KLOC file in total. This change moves most of the route table manipulation parts to a dedicated file, simplifying planned multipath changes and making route.c more manageable. Differential Revision: https://reviews.freebsd.org/D24870	2020-05-23 19:06:57 +00:00
Richard Scheffenegger	e68cde59c3	DCTCP: update alpha only once after loss recovery. In mixed ECN marking and loss scenarios it was found that the alpha value of DCTCP is updated two times. The second update happens with freshly initialized counters indicating to ECN loss. Overall this leads to alpha not adjusting as quickly as expected to ECN markings, and therefore leads to excessive loss. Reported by: Cheng Cui Reviewed by: chengc_netapp.com, rrs, tuexen (mentor) Approved by: tuexen (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D24817	2020-05-21 21:42:49 +00:00
Richard Scheffenegger	af2fb894c9	With RFC3168 ECN, CWR SHOULD only be sent with new data Overly conservative data receivers may ignore the CWR flag on other packets, and keep ECE latched. This can result in continuous reduction of the congestion window, and very poor performance when ECN is enabled. Reviewed by: rgrimes (mentor), rrs Approved by: rgrimes (mentor), tuexen (mentor) MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23364	2020-05-21 21:33:15 +00:00
Richard Scheffenegger	8e0511652b	Retain only mutually supported TCP options after simultaneous SYN When receiving a parallel SYN in SYN-SENT state, remove all the options that only we supported locally before sending the SYN,ACK. This addresses a consistency issue on parallel opens. Also, on such a parallel open, the stack could be coaxed into running with timestamps enabled, even if administratively disabled. Reviewed by: tuexen (mentor) Approved by: tuexen (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23371	2020-05-21 21:26:21 +00:00
Richard Scheffenegger	6e16d87751	Handle ECN handshake in simultaneous open While testing simultaneous open TCP with ECN, found that negotiation fails to arrive at the expected final state. Reviewed by: tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23373	2020-05-21 21:15:25 +00:00
Mark Johnston	591b09b486	Define a module version for accept filter modules. Otherwise accept filters compiled into the kernel do not preempt preloaded accept filter modules. Then, the preloaded file registers its accept filter module before the kernel, and the kernel's attempt fails since duplicate accept filter list entries are not permitted. This causes the preloaded file's module to be released, since module_register_init() does a lookup by name, so the preloaded file is unloaded, and the accept filter's callback points to random memory since preload_delete_name() unmaps the file on x86 as of r336505. Add a new ACCEPT_FILTER_DEFINE macro which wraps the accept filter and module definitions, and ensures that a module version is defined. PR: 245870 Reported by: Thomas von Dein <freebsd@daemon.de> MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-05-19 18:35:08 +00:00
Michael Tuexen	999f86d67d	Replace snprintf() by SCTP_SNPRINTF() and let SCTP_SNPRINTF() map to snprintf() on FreeBSD. This allows checking for failures of snprintf() on platforms other than the FreeBSD kernel.	2020-05-19 07:23:35 +00:00
Michael Tuexen	821bae7cf3	Revert r361209: cem noted that on FreeBSD snprintf() can not fail and code should not check for that. A followup commit will replace the usage of snprintf() in the SCTP sources with a variadic macro SCTP_SNPRINTF, which will simply map to snprintf() on FreeBSD and do a checking similar to r361209 on other platforms.	2020-05-19 07:21:11 +00:00
Mike Karels	2fdbcbea76	Fix NULL-pointer bug from r361228. Note that in_pcb_lport and in_pcb_lport_dest can be called with a NULL local address for IPv6 sockets; handle it. Found by syzkaller. Reported by: cem MFC after: 1 month	2020-05-19 01:05:13 +00:00
Mike Karels	2510235150	Allow TCP to reuse local port with different destinations Previously, tcp_connect() would bind a local port before connecting, forcing the local port to be unique across all outgoing TCP connections for the address family. Instead, choose a local port after selecting the destination and the local address, requiring only that the tuple is unique and does not match a wildcard binding. Reviewed by: tuexen (rscheff, rrs previous version) MFC after: 1 month Sponsored by: Forcepoint LLC Differential Revision: https://reviews.freebsd.org/D24781	2020-05-18 22:53:12 +00:00
Michael Tuexen	6863ab0b8b	Remove assignment without effect. MFC after: 3 days	2020-05-18 19:48:38 +00:00
Michael Tuexen	9e8c9c9ef6	Don't check an unsigned variable for being negative. MFC after: 3 days.	2020-05-18 19:35:46 +00:00
Michael Tuexen	04ce5df9c9	Remove redundant assignment. MFC after: 3 days	2020-05-18 19:23:01 +00:00
Michael Tuexen	bca1802890	Cleanup, no functional change intended. MFC after: 3 days	2020-05-18 18:42:43 +00:00
Michael Tuexen	6395219ae3	Avoid an integer underflow. MFC after: 3 days	2020-05-18 18:32:58 +00:00
Michael Tuexen	60017c8e88	Remove redundant check. MFC after: 3 days	2020-05-18 18:27:10 +00:00
Michael Tuexen	88116b7e4d	Fix logical condition by looking at usecs. This issue was found by cpp-check running on the userland stack. MFC after: 3 days	2020-05-18 15:02:15 +00:00
Michael Tuexen	00023d8a87	Whitespace change. MFC after: 3 days	2020-05-18 15:00:18 +00:00
Michael Tuexen	e708e2a4f4	Handle failures of snprintf(). MFC after: 3 days	2020-05-18 10:07:01 +00:00
Michael Tuexen	da8c34c382	Non-functional changes, cleanups. MFC after: 3 days	2020-05-17 22:31:38 +00:00
Alexander V. Chernikov	174fb9dbb1	Remove redundant checks for nhop validity. Currently NH_IS_VALID() simply aliases to RT_LINK_IS_UP(), so we're checking the same thing twice. In the near future the implementation of this check will be simpler, as there are plans to introduce control-plane interface status monitoring similar to ipfw interface tracker.	2020-05-17 15:32:36 +00:00
Michael Tuexen	daf143413a	Ensure that an stcb is not dereferenced when it is about to be freed. This issue was found by SYZKALLER. MFC after: 3 days	2020-05-16 19:26:39 +00:00
Ed Maste	65a1d63665	libalias: retire cuseeme support The CU-SeeMe videoconferencing client and associated protocol is at this point a historical artifact; there is no need to retain support for this protocol today. Reviewed by: philip, markj, allanjude Relnotes: Yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24790	2020-05-16 02:29:10 +00:00
Michael Tuexen	e240ce42bf	Allow only IPv4 addresses in sendto() for TCP on AF_INET sockets. This problem was found by looking at syzkaller reproducers for some other problems. Reviewed by: rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24831	2020-05-15 14:06:37 +00:00
Randall Stewart	777b88d60f	This fixes several syzkaller issues found with the help of Michael Tuexen. There were some accounting errors with TCPFO for bbr and also for both rack and bbr there was a FO case where we should be jumping to the just_return_nolock label to exit instead of returning 0. This of course caused no timer to be running and thus the stuck sessions. Reported by: Michael Tuexen and syzkaller Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24852	2020-05-15 14:00:12 +00:00
Ed Maste	46701f31be	libalias: fix potential memory disclosure from ftp module admbugs: 956 Submitted by: markj Reported by: Vishnu Dev TJ working with Trend Micro Zero Day Initiative Security: FreeBSD-SA-20:13.libalias Security: CVE-2020-7455 Security: ZDI-CAN-10849	2020-05-12 16:38:28 +00:00
Ed Maste	6461c83e09	libalias: validate packet lengths before accessing headers admbugs: 956 Submitted by: ae Reported by: Lucas Leong (@_wmliang_) of Trend Micro Zero Day Initiative Reported by: Vishnu working with Trend Micro Zero Day Initiative Security: FreeBSD-SA-20:12.libalias	2020-05-12 16:33:04 +00:00
Michael Tuexen	86fd36c502	Fix a copy and paste error introduced in r360878. Reported-by: syzbot+a0863e972771f2f0d4b3@syzkaller.appspotmail.com Reported-by: syzbot+4481757e967ba83c445a@syzkaller.appspotmail.com MFC after: 3 days	2020-05-11 22:47:20 +00:00
Alexander V. Chernikov	1d1a743e9f	Fix NOINET[6] build by using af-independent route lookup function. Reported by: rpokala	2020-05-11 20:41:03 +00:00

6932 Commits