freebsd-dev

Author	SHA1	Message	Date
Wojciech Macek	9ce46cbc95	ip_mroute: move ip_mrouter_done outside lock X_ip_mrouter_done might sleep, which triggers INVARIANTS to print additional errors on the screen. Move it outside the lock, but provide some basic synchronization to avoid race condition during module uninit/unload. Obtained from: Semihalf Sponsored by: Stormshield	2022-01-21 06:17:19 +01:00
Wojciech Macek	58630bdd13	Revert "ip_mroute: do not call epoch_wait when lock is taken" This reverts commit `2e72208b6c`.	2022-01-21 06:17:19 +01:00
Randall Stewart	aac52f94ea	tcp: Warning cleanup from new compiler. The clang compiler recently got an update that generates warnings of unused variables where they were set, and then never used. This revision goes through the tcp stack and cleans all of those up. Reviewed by: Michael Tuexen, Gleb Smirnoff Sponsored by: Netflix Inc. Differential Revision:	2022-01-18 07:41:18 -05:00
Marko Zec	e7abe200c2	fib_algo: shift / mask by constants in dxr_lookup() Since trie configuration remains invariant during each DXR instance lifetime, instead of shifting and masking lookup keys by values computed at runtime, compile upfront several dxr_lookup() configurations with hardcoded shift / mask constants, and choose the appropriate lookup function version after each DXR instance rebuild. In synthetic tests this yields small but measurable (5-10%) lookup throughput improvement, depending on FIB size and prefix patterns. MFC after: 3 days	2022-01-17 00:13:47 +01:00
Gleb Smirnoff	1d41a49404	tcp_usr_connect: report actual error code when stack requests drop	2022-01-13 10:32:41 -08:00
Ryan Stone	3284f4925f	LRO: Don't merge ACK and non-ACK packets together LRO was willing to merge ACK and non-ACK packets together. This can cause incorrect th_ack values to be reported up the stack. While non-ACKs are quite unlikely to appear in practice, LRO's behaviour is against the spec. Make LRO unwilling to merge packets with different TH_ACK flag values in order to fix the issue. Found by: Sysunit test Differential Revision: https://reviews.freebsd.org/D33775 Reviewed by: rrs	2022-01-13 11:17:58 -05:00
Ryan Stone	24fe6643da	LRO: Fix lost packets when merging 1 payload with an ACK To check if it needed to regenerate a packet's header before sending it up the stack, LRO was checking if more than one payload had been merged into the packet. This failed in the case where a single payload was merged with one or more pure ACKs. This results in lost ACKs. Fix this by precisely tracking whether header regeneration is required instead of using an incorrect heuristic. Found with: Sysunit test Differential Revision: https://reviews.freebsd.org/D33774 Reviewed by: rrs	2022-01-13 11:17:48 -05:00
Wojciech Macek	776c34f646	ip_mroute: remove unused variables Sponsored by: Stormshield Obtained from: Semihalf	2022-01-11 13:06:22 +01:00
Wojciech Macek	2e72208b6c	ip_mroute: do not call epoch_wait when lock is taken mrouter_done is called with RAW IP lock taken. Some annoying printfs are visible on the console if INVARIANTS option is enabled. Provide atomic-based mechanism which counts enters and exits from/to critical section in ip_input and ip_output. Before de-initialization of function pointers ensure (with busy-wait) that mrouter de-initialization is visible to all readers and that we don't remove pointers (like ip_mforward etc.) in the middle of packet processing.	2022-01-11 11:19:32 +01:00
Wojciech Macek	68f28dd1cc	ip_mroute: do not sleep when lock is taken Kthread initialization calls uma_alloc which can sleep. Modify the code to use deferred work instead.	2022-01-11 11:19:32 +01:00
Robert Wing	eb18708ec8	syncache: accept packet with no SA when TCP_MD5SIG is set When TCP_MD5SIG is set on a socket, all packets are dropped that don't contain an MD5 signature. Relax this behavior to accept a non-signed packet when a security association doesn't exist with the peer. This is useful when a listen socket set with TCP_MD5SIG wants to handle connections protected with and without MD5 signatures. Reviewed by: bz (previous version) Sponsored by: nepustil.net Sponsored by: Klara Inc. Differential Revision: https://reviews.freebsd.org/D33227	2022-01-08 16:32:14 -09:00
Michael Tuexen	f87818eacf	sctp: miror change due to upstreaming	2022-01-03 23:03:06 +01:00
Gleb Smirnoff	afad340a14	inpcb: garbage collect INP_LOCK_INIT(), used only once in sctp Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D33543	2022-01-03 10:20:30 -08:00
Gleb Smirnoff	fec8a8c7cb	inpcb: use global UMA zones for protocols Provide structure inpcbstorage, that holds zones and lock names for a protocol. Initialize it with global protocol init using macro INPCBSTORAGE_DEFINE(). Then, at VNET protocol init supply it as the main argument to the in_pcbinfo_init(). Each VNET pcbinfo uses its private hash, but they all use same zone to allocate and SMR section to synchronize. Note: there is kern.ipc.maxsockets sysctl, which controls UMA limit on the socket zone, which was always global. Historically same maxsockets value is applied also to every PCB zone. Important fact: you can't create a pcb without a socket! A pcb may outlive its socket, however. Given that there are multiple protocols, and only one socket zone, the per pcb zone limits seem to have little value. Under very special conditions it may trigger a little bit earlier than socket zone limit, but in most setups the socket zone limit will be triggered earlier. When VIMAGE was added to the kernel PCB zones became per-VNET. This magnified existing disbalance further: now we have multiple pcb zones in multiple vnets limited to maxsockets, but every pcb requires a socket allocated from the global zone also limited by maxsockets. IMHO, this per pcb zone limit doesn't bring any value, so this patch drops it. If anybody explains value of this limit, it can be restored very easy - just 2 lines change to in_pcbstorage_init(). Differential revision: https://reviews.freebsd.org/D33542	2022-01-03 10:17:46 -08:00
Gleb Smirnoff	644ca0846d	domains: make domain_init() initialize only global state Now that each module handles its global and VNET initialization itself, there is no VNET related stuff left to do in domain_init(). Differential revision: https://reviews.freebsd.org/D33541	2022-01-03 10:15:22 -08:00
Gleb Smirnoff	89128ff3e4	protocols: init with standard SYSINIT(9) or VNET_SYSINIT The historical BSD network stack loop that rolls over domains and over protocols has no advantages over more modern SYSINIT(9). While doing the sweep, split global and per-VNET initializers. Getting rid of pr_init allows to achieve several things: o Get rid of ifdef's that protect against double foo_init() when both INET and INET6 are compiled in. o Isolate initializers statically to the module they init. o Makes code easier to understand and maintain. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33537	2022-01-03 10:15:21 -08:00
Kristof Provost	80871aeb0f	udp_var.h: other headers already include types.h Pointed out by: imp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-01-03 18:35:02 +01:00
Kristof Provost	aa70361d86	headers: make a few more headers self-contained Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-01-03 10:12:30 +01:00
Gordon Bergling	1b90dfa5d2	tcp_bbr(4): Fix a few typos in sysctl descriptions - s/measurment/measurement/ MFC after: 3 days	2022-01-02 18:03:10 +01:00
Michael Tuexen	502d5e8500	sctp: improve counting of incoming chunks MFC after: 3 days	2022-01-01 20:59:47 +01:00
Michael Tuexen	4760956e9a	udp: use appropriate pcbinfo when signalling EHOSTDOWN MFC after: 3 days Sponsored by: Netflix, Inc.	2022-01-01 19:17:17 +01:00
Michael Tuexen	430df2abee	in_pcb: improve inp_next() If there is no inp to check, exit the loop iterating through them. Reported by: syzbot+403406a9cbf082b36ea4@syzkaller.appspotmail.com Reviewed by: glebius Sponsored by: Netflix, Inc.	2022-01-01 19:04:10 +01:00
Michael Tuexen	1adb91e521	sctp: retire sctp_mtu_size_reset() Thanks to Timo Voelker for making me aware that sctp_mtu_size_reset() is very similar to sctp_pathmtu_adjustment(). MFC after: 3 days	2021-12-30 15:30:11 +01:00
Michael Tuexen	2de2ae331b	sctp: improve sctp_pathmtu_adjustment() Allow the resending of DATA chunks to be controlled by the caller, which allows retiring sctp_mtu_size_reset() in a separate commit. Also improve the computation of the overhead and use 32-bit integers consistently. Thanks to Timo Voelker for pointing me to the code. MFC after: 3 days	2021-12-30 15:16:05 +01:00
Alexander V. Chernikov	ff3a85d324	[lltable] Add per-family lltable getters. Introduce a new function, lltable_get(), to retrieve lltable pointer for the specified interface and family. Use it to avoid all-iftable list traversal when adding or deleting ARP/ND records. Differential Revision: https://reviews.freebsd.org/D33660 MFC after: 2 weeks	2021-12-29 20:57:15 +00:00
Gleb Smirnoff	4287aa5619	tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags While here move out one more erroneous condition out of the epoch and common return. The only functional change is that if we send control on a shut down socket we would get EINVAL instead of ECONNRESET. Reviewed by: tuexen Reported by: syzbot+8388cf7f401a7b6bece6@syzkaller.appspotmail.com Fixes: `f64dc2ab5b`	2021-12-28 08:50:02 -08:00
Michael Tuexen	a7ba00a438	sctp: minor improvements in sctp_get_frag_point MFC after: 3 days	2021-12-28 10:23:31 +01:00
Michael Tuexen	ca0dd19f09	sctp: check that the computed frag point is a multiple of 4 Reported by: syzbot+5da189fc1fe80b31f5bd@syzkaller.appspotmail.com MFC after: 3 days	2021-12-28 09:40:52 +01:00
Gleb Smirnoff	0af4ce4547	tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags Fixes: `f64dc2ab5b`	2021-12-27 16:58:09 -08:00
Michael Tuexen	989453da05	sctp: cleanup the SCTP_MAXSEG socket option. This patch makes the handling of the SCTP_MAXSEG socket option compliant with RFC 6458 (SCTP socket API) and fixes an issue found by syzkaller. Reported by: syzbot+a2791b89ab99121e3333@syzkaller.appspotmail.com MFC after: 3 days	2021-12-27 23:40:31 +01:00
Gleb Smirnoff	37a7f55737	tcp_usr_rcvd: don't cast inp_ppcb to tcpcb before checking inp_flags Fixes: `f64dc2ab5b`	2021-12-27 10:41:51 -08:00
Michael Tuexen	34ae6a1a44	sctp: cleanup, no functional change intended. MFC after: 3 days	2021-12-27 18:28:44 +01:00
Michael Tuexen	a859e9f9aa	sctp: apply limit for socket buffers as indicated in comment MFC after: 3 days	2021-12-27 18:15:29 +01:00
Gleb Smirnoff	a057769205	in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address The intent is to provide more entropy than can be provided by just the 32-bits of the IPv6 address which overlaps with 6to4 tunnels. This is needed to mitigate potential algorithmic complexity attacks from attackers who can control large numbers of IPv6 addresses. Together with: gallatin Reviewed by: dwmalone, rscheff Differential revision: https://reviews.freebsd.org/D33254	2021-12-26 10:47:28 -08:00
Gleb Smirnoff	eb8dcdeac2	jail: network epoch protection for IP address lists Now struct prison has two pointers (IPv4 and IPv6) of struct prison_ip type. Each points into epoch context, address count and variable size array of addresses. These structures are freed with network epoch deferred free and are not edited in place, instead a new structure is allocated and set. While here, the change also generalizes a lot (but not enough) of IPv4 and IPv6 processing. E.g. address family agnostic helpers for kern_jail_set() are provided, that reduce v4-v6 copy-paste. The fast-path prison_check_ip[46]_locked() is also generalized into prison_ip_check() that can be executed with network epoch protection only. Reviewed by: jamie Differential revision: https://reviews.freebsd.org/D33339	2021-12-26 10:45:50 -08:00
Gleb Smirnoff	a370832bec	tcp: remove delayed drop KPI No longer needed after tcp_output() can ask caller to drop. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33371	2021-12-26 08:48:24 -08:00
Gleb Smirnoff	f64dc2ab5b	tcp: TCP output method can request tcp_drop The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection when they do output on it. The default stack never does this, thus existing framework expects tcp_output() always to return locked and valid tcpcb. Provide KPI extension to satisfy demands of advanced stacks. If the output method returns negative error code, it means that caller must call tcp_drop(). In tcp_var() provide three inline methods to call tcp_output(): - tcp_output() is a drop-in replacement for the default stack, so that default stack can continue using it internally without modifications. For advanced stacks it would perform tcp_drop() and unlock and report that with negative error code. - tcp_output_unlock() handles the negative code and always converts it to positive and always unlocks. - tcp_output_nodrop() just calls the method and leaves the responsibility to drop on the caller. Sweep over the advanced stacks and use new KPI instead of using HPTS delayed drop queue for that. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33370	2021-12-26 08:48:19 -08:00
Gleb Smirnoff	dbbcc777de	rack: rack_do_compressed_ack_processing() can call tcp_drop() Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33369	2021-12-26 08:48:15 -08:00
Gleb Smirnoff	66aeb0b53b	rack: drop connection synchronously, when we can For all functions that are leaves of tcp_input() call ctf_do_dropwithreset_conn() instead of ctf_do_dropwithreset(), cause we always got tp and we want it to be dropped. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33368	2021-12-26 08:48:10 -08:00
Gleb Smirnoff	17ac6b1c14	bbr: drop packet synchronously in ctf_do_dropwithreset_conn() This function is always called from tcp_do_segment() method, that can drop tcpcb and return unlocked. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33367	2021-12-26 08:48:06 -08:00
Gleb Smirnoff	40fa3e40b5	tcp: mechanically substitute call to tfb_tcp_output to new method. Made with sed(1) execution: sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/) sed: s/tp->t_fb->tfb_tcp_output\(tp\)/tcp_output(tp)/ s/to tfb_tcp_output/to tcp_output()/ Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33366	2021-12-26 08:47:59 -08:00
Gleb Smirnoff	5b08b46a6d	tcp: welcome back tcp_output() as the right way to run output on tcpcb. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33365	2021-12-26 08:47:42 -08:00
Bjoern A. Zeeb	f389439f50	IPv4: fix redirect sending conditions RFC792,1009,1122 state the original conditions for sending a redirect. RFC1812 further refines these. ip_forward() still specifies the checks originally implemented for these (we do slightly more/different than suggested as makes sense). The implementation added in `8ad114c082` to ip_tryforward() however is flawed and may send "multi-hop" redirects (to a host not on the directly connected network). Do proper checks in ip_tryforward() to stop us from sending redirects in situations we may not. Keep as much logic out of ip_tryforward() and in ip_redir_alloc() and only do the mbuf copy once we are sure we will send a redirect. While here enhance and fix comments as to which conditions are handled for sending redirects in various places. Reported by: pi (on net@ 2021-12-04) MFC after: 3 days Sponsored by: Dr.-Ing. Nepustil & Co. GmbH Reviewed by: cy, others (earlier versions) Differential Revision: https://reviews.freebsd.org/D33274	2021-12-26 15:33:48 +00:00
Alexander V. Chernikov	c2c8e360d8	tcp: virtualise net.inet.tcp.msl sysctl. VNET teardown waits 2*MSL (60 seconds by default) before expiring tcp PCBs. These PCBs holds references to nexthops, which, in turn, reference ifnets. This chain results in VNET interfaces being destroyed and moved to default VNET only after 60 seconds. Allow tcp_msl to be set in jail by virtualising net.inet.tcp.msl sysctl, permitting more predictable VNET tests outcomes. MFC after: 1 week Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D33270	2021-12-26 14:56:04 +00:00
Robert Wing	08d157a832	Fix dtrace SDT probe tcp:::debug-input The tcp:::debug-input probe is passed an mbuf pointer, use the correct translator for ipinfo_t when defining tcp:::debug-input. Fixes: `82988b50a1` ("Add an mbuf to ipinfo_t translator to finish ...") Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33066	2021-12-20 17:15:43 -09:00
Robert Wing	2a28b045ca	tcp_twrespond: send signed segment when connection is TCP-MD5 When a connection is established to use TCP-MD5, tcp_twrespond() doesn't respond with a signed segment. This results in the host performing the active close to remain in a TIME_WAIT state and the other host in the LAST_ACK state. Fix this by sending a signed segment when the connection is established to use TCP-MD5. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D33490	2021-12-20 11:38:01 -09:00
Gleb Smirnoff	71d2d5adfe	tcptw: count how many times a tcptw was actually useful This will allow a sysadmin to lower net.inet.tcp.msl and see how long tcptw are actually useful.	2021-12-19 08:22:12 -08:00
Gleb Smirnoff	cb3772639f	tcptw: remove unused fields The structure goes away anyway, but it would be interesting to know how much memory we used to save with it. So for the record, structure size with this revision is 64 bytes.	2021-12-19 08:22:12 -08:00
Gleb Smirnoff	9a8cf950b2	carp: fix send error demotion recovery The problem is that carp(4) would clear the error counter on first successful send, and stop counting successes after that. Fix this logic and document it in human language. PR: 260499 Differential revision: https://reviews.freebsd.org/D33536	2021-12-18 17:19:26 -08:00
Gleb Smirnoff	75add59a8e	tcp: allocate statistics in the main tcp_init() No reason to have a separate SYSINIT.	2021-12-17 10:50:56 -08:00
Gleb Smirnoff	d8b45c8e14	inpcb: don't leak the port zone in in_pcbinfo_destroy()	2021-12-16 15:15:02 -08:00
Mark Johnston	014f98b119	udp: Fix a use-after-free in udp_multi_input() "ip" is a pointer into the input mbuf chain, so we shouldn't access it after the chain is freed. Fix style at the call site while here. Reported by: syzbot+7c8258509722af1b6145@syzkaller.appspotmail.com Reviewed by: tuexen, glebius Fixes: `de2d47842e` ("SMR protection for inpcbs") Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33473	2021-12-16 09:17:05 -05:00
Randall Stewart	9b60296531	tcp: Rack in a rare case we can get stuck sending a very small amount. If a tlp sending new data fails, and then the peer starts talking to us again, we can be in a situation where the tlp_new_data count is set, we are not in recovery and we always send one packet every RTT. The failure has to occur when we send the TLP initially from the ip_output() which is rare. But if it occurs you are basically stuck. This fixes it so we use the new_data count and clear it so we know it will be cleared. If a failure occurs the tlp timer will regenerate a new amount anyway so it is un-needed to carry the value on. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D33325	2021-12-15 09:41:33 -05:00
Gleb Smirnoff	185e659c40	inpcb: use locked variant of prison_check_ip*() The pcb lookup always happens in the network epoch and in SMR section. We can't block on a mutex due to the latter. Right now this patch opens up a race. But soon that will be addressed by D33339. Reviewed by: markj, jamie Differential revision: https://reviews.freebsd.org/D33340 Fixes: `de2d47842e`	2021-12-14 09:38:52 -08:00
Gleb Smirnoff	d74b7baeb0	ifnet_byindex() actually requires network epoch Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment. Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33263	2021-12-06 09:32:31 -08:00
Randall Stewart	dadbc04250	tcp: rack fails to send out a TLP after a MTU change When rack sends out a TLP it sets up various state to make sure it avoids the cwnd (it's been more than 1 RTT since our last send) and it may at times send new data. If an MTU change has occurred and our cwnd has collapsed we can have a situation where must_retran flag is set and we obey the cwnd thus never sending the TLP and then sitting stuck. This one line fix addresses that problem. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D33231	2021-12-06 09:56:09 -05:00
Gleb Smirnoff	eb93b99d69	in_pcb: delay crfree() down into UMA dtor inpcb lookups, which check inp_cred, work with pcbs that potentially went through in_pcbfree(). So inp_cred should stay valid until SMR guarantees its invisibility to lookups. While here, put the whole inpcb destruction sequence of in_pcbfree(), inpcb_dtor() and inpcb_fini() sequentially. Submitted by: markj Differential revision: https://reviews.freebsd.org/D33273	2021-12-05 10:46:37 -08:00
Michael Tuexen	54912d47b6	sctp: unbreak NOINET6 builds. PR: 260119 Reported by: kostikbel MFC after: 1 week	2021-12-04 19:16:18 +01:00
Michael Tuexen	d79676fb13	sctp: inherit IP level socket options from listening socket Ensure that TTL and TOS values set on a listener get inherited by the accepted sockets. PR: 260119 MFC after: 1 week	2021-12-03 22:44:01 +01:00
Gleb Smirnoff	36f42c5ebf	tcp_ccalgounload(): initialize the inpcb iterator when curvnet is set Pointy hat to: glebius Fixes: `de2d47842e`	2021-12-03 12:39:56 -08:00
Peter Lei	4c018b5aed	in_pcb: limit the effect of wraparound in TCP random port allocation check The check to see if TCP port allocation should change from random to sequential port allocation mode may incorrectly cause a false positive due to negative wraparound. Example: V_ipport_tcpallocs = 2147483585 (0x7fffffc1) V_ipport_tcplastcount = 2147483553 (0x7fffffa1) V_ipport_randomcps = 100 The original code would compare (2147483585 <= -2147483643) and thus incorrectly move to sequential allocation mode. Compute the delta first before comparing against the desired limit to limit the wraparound effect (since tcplastcount is always a snapshot of a previous tcpallocs).	2021-12-03 12:38:12 -08:00
Michael Tuexen	f32357be53	sctp: use the correct traffic class when sending SCTP/IPv6 packets When sending packets the stcb was used to access the inp and then access the endpoint specific IPv6 level options. This fails when there exists an inp, but no stcb yet. This is the case for sending an INIT-ACK in response to an INIT when no association already exists. Fix this by just providing the inp instead of the stcb. PR: 260120 MFC after: 1 week	2021-12-03 21:36:44 +01:00
Peter Lei	13e3f3349f	in_pcb: fix TCP local ephemeral port accounting Fix logic error causing UDP(-Lite) local ephemeral port bindings to count against the TCP allocation counter, potentially causing TCP to go from random to sequential port allocation mode prematurely.	2021-12-03 12:30:21 -08:00
Gleb Smirnoff	12ae3476f3	tcp_drain(): initialize the inpcb iterator when curvnet is set Reported by: cy Pointy hat to: glebius Fixes: `de2d47842e`	2021-12-02 21:08:30 -08:00
Gleb Smirnoff	651a545143	udp_detach(): fix set but not used warning	2021-12-02 20:12:40 -08:00
Gleb Smirnoff	bd1d085045	udp_multi_input(): the UDP header is only needed for probes Reported by: kib Fixes: `de2d47842e`	2021-12-02 20:12:40 -08:00
Cy Schubert	db0ac6ded6	Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" This reverts commit `266f97b5e9`, reversing changes made to `a10253cffe`. A mismerge of a merge to catch up to main resulted in files being committed which should not have been.	2021-12-02 14:45:04 -08:00
Cy Schubert	266f97b5e9	wpa: Import wpa_supplicant/hostapd commit 14ab4a816 This is the November update to vendor/wpa committed upstream 2021-11-26. MFC after: 1 month	2021-12-02 13:35:14 -08:00
Gleb Smirnoff	3cce6164ab	ip_input: remove pointless check in INP_RECVIF handling An mbuf rcvif pointer is supposed to be valid and doesn't need extra checks. The code appeared in `d314ad7b73`.	2021-12-02 11:15:04 -08:00
Gleb Smirnoff	2e27230ff9	tcp_hpts: rewrite inpcb synchronization Just trust the pcb database, that if we did in_pcbref(), no way an inpcb can go away. And if we never put a dropped inpcb on our queue, and tcp_discardcb() always removes an inpcb to be dropped from the queue, then any inpcb on the queue is valid. Now, to solve LOR between inpcb lock and HPTS queue lock do the following trick. When we are about to process a certain time slot, take the full queue of the head list into on stack list, drop the HPTS lock and work on our queue. This of course opens a race when an inpcb is being removed from the on stack queue, which was already mentioned in comments. To address this race introduce generation count into queues. If we want to remove an inpcb with generation count mismatch, we can't do that, we can only mark it with desired new time slot or -1 for remove. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33026	2021-12-02 10:48:49 -08:00
Gleb Smirnoff	f971e79139	tcp_hpts: rename input queue to drop queue and trim dead code The HPTS input queue is in reality used only for "delayed drops". When a TCP stack decides to drop a connection on the output path it can't do that due to locking protocol between main tcp_output() and stacks. So, rack/bbr utilize HPTS to drop the connection in a different context. In the past the queue could also process input packets in context of HPTS thread, but now no stack uses this, so remove this functionality. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33025	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	b0a7c008cb	tcp_hpts: make struct tcp_hpts_entry private to the module. Also, make some of the functions also private to the module. Remove unused functions discovered after that. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33024	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	50f081ecb7	tcp_hpts: provide tcp_in_hpts(). It will hide some internal HPTS knowledge from the consumers. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33023	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	de2d47842e	SMR protection for inpcbs With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc). However, inpcb aren't static in nature, they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector. Fairly new feature of uma(9) - Safe Memory Reclamation allows to safely free memory in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts stricter requirement on the access to the protected memory, needing the critical(9) section to access it. Details: - The database is already built on CK lists, thanks to epoch(9). - For write access nothing is changed. - For a lookup in the database SMR section is now required. Once the desired inpcb is found we need to transition from SMR section to r/w lock on the inpcb itself, with a check that inpcb isn't yet freed. This requires some complexity, since SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock(). - For a inpcb list traversal (a pcblist sysctl, or broadcast notification) also a new KPI is provided, that hides internals of the database - inp_next(struct inp_iterator *). Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33022	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	565655f4e3	inpcb: reduce some aliased functions after removal of PCBGROUP. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33021	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	93c67567e0	Remove "options PCBGROUP" With upcoming changes to the inpcb synchronisation it is going to be broken. Even its current status after the move of PCB synchronization to the network epoch is very questionable. This experimental feature was sponsored by Juniper but ended never to be used in Juniper and doesn't exist in their source tree [sjg@, stevek@, jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix [gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@]. I'm up to resurrecting it back if there is any interest from anybody. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33020	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	1cec1c5831	Allow to compile RSS without PCBGROUP. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33019	2021-12-02 10:48:48 -08:00
Randall Stewart	dcf2dfed26	tcp: unloading a module that is set to default should error. I just discovered that the return of the EBUSY error was incorrectly rigged so that you could unload a CC module that was set to default. It's supposed to be an EBUSY error. Make it so. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D33229	2021-12-02 06:12:16 -05:00
Michael Tuexen	13c196a41e	sctp: improve handling of assoc ids in socket options For socket options related to local and remote addresses providing generic association ids does not make sense. Report EINVAL in this case. MFC after: 1 week	2021-12-01 14:54:55 +01:00
Michael Tuexen	a01b8859cb	sctp: cleanup, no functional change intended. MFC after: 1 week	2021-12-01 09:19:40 +01:00
Gordon Bergling	1dadeab367	netinet: Fix a common typo in source code comments - s/segement/segment/ MFC after: 3 days	2021-11-30 10:37:20 +01:00
Gordon Bergling	27c4abc7cd	inet(3): Fix two typos in sysctl descriptions - s/sequental/sequential/ MFC after: 3 days	2021-11-30 10:21:47 +01:00
Gordon Bergling	b4aa9cb217	tcp(4): Fix a typo in a sysctl description - s/entires/entries/ MFC after: 3 days	2021-11-30 07:17:30 +01:00
Michael Tuexen	147bf5e930	tcp: Don't try to upgrade a read lock just for logging Reviewed by: glebius, lstewart, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D33098	2021-11-29 13:48:40 +01:00
Michael Tuexen	3c1ba6f394	sctp: improve consistency, no functional change intended	2021-11-26 12:53:43 +01:00
Michael Tuexen	0906362646	sctp: add some asserts, no functional changes intended This might help in narrowing down https://syzkaller.appspot.com/bug?id=fbd79abaec55f5aede63937182f4247006ea883b	2021-11-26 12:19:33 +01:00
Mark Johnston	44775b163b	netinet: Remove unneeded mb_unmapped_to_ext() calls in_cksum_skip() now handles unmapped mbufs on platforms where they're permitted. Reviewed by: glebius, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33097	2021-11-24 13:31:16 -05:00
Mark Johnston	0d9c3423f5	netinet: Implement in_cksum_skip() using m_apply() This allows it to work with unmapped mbufs. In particular, in_cksum_skip() calls no longer need to be preceded by calls to mb_unmapped_to_ext() to avoid a page fault. PR: 259645 Reviewed by: gallatin, glebius, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33096	2021-11-24 13:31:16 -05:00
Mark Johnston	ecbbe83144	netinet: Deduplicate most in_cksum() implementations in_cksum() and related routines are implemented separately for each platform, but only i386 and arm have optimized versions. Other platforms' copies of in_cksum.c are identical except for style differences and support for big-endian CPUs. Deduplicate the implementations for the rest of the platforms. This will make it easier to implement in_cksum() for unmapped mbufs. On arm and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is not to be compiled. No functional change intended. Reviewed by: kp, glebius MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33095	2021-11-24 13:31:16 -05:00
Mark Johnston	5195bcc212	netinet: Remove in_cksum.c It does not get compiled into the kernel. No functional change intended. Reviewed by: kp, glebius, cy MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33094	2021-11-24 13:31:16 -05:00
Gordon Bergling	b4fbc855a5	cc_newreno(4): Fix a typo in a source code comment - s/conditons/conditions/ MFC after: 3 days	2021-11-19 19:16:02 +01:00
Gleb Smirnoff	ff94500855	Add tcp_freecb() - single place to free tcpcb. Until this change there were two places where we would free tcpcb - tcp_discardcb() in case if all timers are drained and tcp_timer_discard() otherwise. They were pretty much copy-n-paste, except that in the default case we would run tcp_hc_update(). Merge this into single function tcp_freecb() and move new short version of tcp_timer_discard() to tcp_timer.c and make it static. Reviewed by: rrs, hselasky Differential revision: https://reviews.freebsd.org/D32965	2021-11-18 20:27:45 -08:00
Gleb Smirnoff	fb8588d2cb	tcp_timewait: use on stack struct tcptw as last resort In case we failed to uma_zalloc() and also failed to reuse with tcp_tw_2msl_scan(), then just use on stack tcptw. This will allow to run through tcp_twrespond() and standard tcpcb discard routine. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D32965	2021-11-18 20:27:45 -08:00
Randall Stewart	97e28f0f58	tcp: Rack ack war with a mis-behaving firewall or nat with resets. Previously we added ack-war prevention for misbehaving firewalls. This is where the f/w or nat messes up its sequence numbers and causes an ack-war. There is yet another type of ack war that we have found in the wild that is like unto this. Basically the f/w or nat gets an ack (keep-alive probe or such) and instead of turning the ack/seq around and adding a TH_RST it does something real stupid and sends a new packet with seq=0. This of course triggers the challenge ack in the reset processing which then sends in a challenge ack (if the seq=0 is within the range of possible sequence numbers allowed by the challenge) and then we rinse-repeat. This will add the needed tweaks (similar to the last ack-war prevention using the same sysctls and counters) to prevent it and allow say 5 per second by default. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32938	2021-11-17 09:45:51 -05:00
Mark Johnston	756bb50b6a	sctp: Remove now-unneeded mb_unmapped_to_ext() calls sctp_delayed_checksum() now handles unmapped mbufs, thanks to m_apply(). No functional change intended. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32942	2021-11-16 13:38:09 -05:00
Mark Johnston	b4d758a0cc	sctp: Use m_apply() to calculate a checksum for an mbuf chain m_apply() works on unmapped mbufs, so this will let us elide mb_unmapped_to_ext() calls preceding sctp_calculate_cksum() calls in the network stack. Modify sctp_calculate_cksum() to assume it's passed an mbuf header. This assumption appears to be true in practice, and we need to know the full length of the chain. No functional change intended. Reviewed by: tuexen, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32941	2021-11-16 13:36:30 -05:00
Mike Karels	2f35e7d9fa	kernel: partially revert e9efb1125a15, default inet mask When no mask is supplied to the ioctl adding an Internet interface address, revert to using the historical class mask rather than a single default. Similarly for the NFS bootp code. MFC after: 3 weeks Reviewed by: melifaro glebius Differential Revision: https://reviews.freebsd.org/D32951	2021-11-14 14:12:25 -06:00
Michael Tuexen	2f62f92e37	tcp: Fix a locking issue related to logging tcp_respond() is sometimes called with only a read lock. The logging however, requires a write lock. So either try to upgrade the lock if needed, or don't log the packet. Reported by: syzbot+8151ef969c170f76706b@syzkaller.appspotmail.com Reported by: syzbot+eb679adb3304c511c1e4@syzkaller.appspotmail.com Reviewed by: markj, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D32983	2021-11-14 15:04:27 +01:00
Gleb Smirnoff	ef396441ce	tcp_usr_detach: revert debugging piece from `f5cf1e5f5a`. The code was probably useful during the problem being chased down, but for brevity makes sense just to return to the original KASSERT. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D32968	2021-11-13 08:33:32 -08:00
Gleb Smirnoff	9a06a82455	tcp_timers: check for (INP_TIMEWAIT | INP_DROPPED) only once All timers keep inpcb locked through their execution. We need to check these flags only once. Checking for INP_TIMEWAIT earlier is also safer, since such inpcbs point into tcptw rather than tcpcb, and any dereferences of inp_ppcb as tcpcb are erroneous. Reviewed by: rrs, hselasky Differential revision: https://reviews.freebsd.org/D32967	2021-11-13 08:32:06 -08:00

7276 Commits