freebsd-nq

Author	SHA1	Message	Date
John Baldwin	d782385e9b	tcp_ratelimit: Handle some edge cases with TLS + RL send tags. - After a connection has fallen back from NIC TLS to SW TLS, any pacing rate changes should modify the inpcb send tag even though SB_TLS_IFNET is set. - If a connection tries to modify the pacing rate before the send tag has been converted from plain TLS to TLS + RL, don't fail the rate request set but let it fall through to setting the rate on the non-TLS inpcb RL tag. Reviewed by: gallatin, rrs, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34085	2022-01-31 16:40:04 -08:00
Gordon Bergling	4bd030b369	sctp(4): Fix a typo in an INVARIANTS panic message - s/failes/fails/ MFC after: 1 week	2022-01-28 13:20:52 +01:00
Richard Scheffenegger	4531b3450b	tcp: Tidying up the conditionals for unwinding a spurious RTO - Use the semantically correct TSTMP_xx macro when comparing timestamps. (No functional change) - check for bad retransmits only when TSopt is present in ACK (don't assume there will be a valid TSopt in the TCP options struct) - exclude tsecr == 0, since that most likely indicates an invalid ts echo return (tsecr) value. Reviewed By: tuexen, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34062	2022-01-27 18:59:55 +01:00
Richard Scheffenegger	68e623c3f0	tcp: Rewind erroneous RTO only while performing RTO retransmissions Under rare circumstances, a spurious retransmission is incorrectly detected and rewound, messing up various tcpcb values, which can lead to a panic when SACK is in use. Reviewed By: tuexen, chengc_netapp.com, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D33979	2022-01-27 18:49:42 +01:00
Andrew Gallatin	8a7404b2ae	tcp: fix leaks in tcp_chg_pacing_rate error paths tcp_chg_pacing_rate() is expected to release the hw rate limit table, but failed to do so in several error cases, leading to ever-increasing counts of flows using the rate. This patch was mostly done by rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34058 Reviewed by: hselasky, rrs, jhb (initial version, outside of Differential)	2022-01-27 10:35:03 -05:00
Andrew Gallatin	9ba117960e	Fix a memory leak when ip_output_send() returns EAGAIN due to send tag issues When ip_output_send() returns EAGAIN due to issues with send tags (route change, lagg failover, etc), it must free the mbuf. This is because ip_output_send() was written as a wrapper/replacement for a direct call to if_output(), and the contract with if_output() has historically been that it owns the mbufs once called. When ip_output_send() failed to free mbufs, it violated this assumption and led to leaked mbufs. This was noticed when using NIC TLS in combination with hardware rate-limited connections. When lots of NIC output drops triggered ratelimit send tag changes, we noticed we were leaking ktls_sessions, send tags and mbufs. This was due to ip_output_send() leaking mbufs which held references to ktls_sessions, which in turn held references to send tags. Many thanks to jbh, rrs, hselasky and markj for their help in debugging this. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34054 Reviewed by: hselasky, jhb, rrs MFC after: 2 weeks	2022-01-27 10:34:34 -05:00
Gordon Bergling	9e58cca3e8	extra_tcp_stacks: Fix two typos in source code comments - s/differnt/different/ MFC after: 3 days	2022-01-26 18:02:55 +01:00
Gordon Bergling	b3df222eae	extra_tcp_stacks: Fix a few common typos TCP_BBR: - Fix a typo introduced in 1b90dfa5d2b0, which was reported by tuexen@ TCP_RACK: - Correct two sysctl descriptions: s/corret/correct/ tcp_bbr(4): Also fix s/measurment/measurement/ in the man page MFC after: 1 week	2022-01-26 10:35:17 +01:00
Wojciech Macek	0daa28057c	ip_mroute: add unlock in early-exit Add missing unlock if V_ip_mrouter is not set Obtained from: Semihalf	2022-01-22 14:48:47 +01:00
Wojciech Macek	889c60500d	ip_mroute: release epoch lock if mrouter is not configured Add missing "else" branch to release a lock if mrouter is not configured. Obtained from: Semihalf Sponsored by: Stormshield	2022-01-22 11:48:30 +01:00
Wojciech Macek	9ce46cbc95	ip_mroute: move ip_mrouter_done outside lock X_ip_mrouter_done might sleep, which causes INVARIANTS to print additional errors on the screen. Move it outside the lock, but provide some basic synchronization to avoid a race condition during module uninit/unload. Obtained from: Semihalf Sponsored by: Stormshield	2022-01-21 06:17:19 +01:00
Wojciech Macek	58630bdd13	Revert "ip_mroute: do not call epoch_wait when lock is taken" This reverts commit 2e72208b6c622505323ed48dc58830fc307392b1.	2022-01-21 06:17:19 +01:00
Randall Stewart	aac52f94ea	tcp: Warning cleanup from new compiler. The clang compiler recently got an update that generates warnings about variables that are set but never used. This revision goes through the tcp stack and cleans all of those up. Reviewed by: Michael Tuexen, Gleb Smirnoff Sponsored by: Netflix Inc. Differential Revision:	2022-01-18 07:41:18 -05:00
Marko Zec	e7abe200c2	fib_algo: shift / mask by constants in dxr_lookup() Since trie configuration remains invariant during each DXR instance lifetime, instead of shifting and masking lookup keys by values computed at runtime, compile upfront several dxr_lookup() configurations with hardcoded shift / mask constants, and choose the appropriate lookup function version after each DXR instance rebuild. In synthetic tests this yields small but measurable (5-10%) lookup throughput improvement, depending on FIB size and prefix patterns. MFC after: 3 days	2022-01-17 00:13:47 +01:00
Gleb Smirnoff	1d41a49404	tcp_usr_connect: report actual error code when stack requests drop	2022-01-13 10:32:41 -08:00
Ryan Stone	3284f4925f	LRO: Don't merge ACK and non-ACK packets together LRO was willing to merge ACK and non-ACK packets together. This can cause incorrect th_ack values to be reported up the stack. While non-ACKs are quite unlikely to appear in practice, LRO's behaviour is against the spec. Make LRO unwilling to merge packets with different TH_ACK flag values in order to fix the issue. Found by: Sysunit test Differential Revision: https://reviews.freebsd.org/D33775 Reviewed by: rrs	2022-01-13 11:17:58 -05:00
Ryan Stone	24fe6643da	LRO: Fix lost packets when merging 1 payload with an ACK To check if it needed to regenerate a packet's header before sending it up the stack, LRO was checking if more than one payload had been merged into the packet. This failed in the case where a single payload was merged with one or more pure ACKs. This results in lost ACKs. Fix this by precisely tracking whether header regeneration is required instead of using an incorrect heuristic. Found with: Sysunit test Differential Revision: https://reviews.freebsd.org/D33774 Reviewed by: rrs	2022-01-13 11:17:48 -05:00
Wojciech Macek	776c34f646	ip_mroute: remove unused variables Sponsored by: Stormshield Obtained from: Semihalf	2022-01-11 13:06:22 +01:00
Wojciech Macek	2e72208b6c	ip_mroute: do not call epoch_wait when lock is taken mrouter_done is called with the RAW IP lock taken. Some annoying printfs are visible on the console if the INVARIANTS option is enabled. Provide an atomic-based mechanism which counts enters and exits from/to the critical section in ip_input and ip_output. Before de-initialization of function pointers, ensure (with busy-wait) that mrouter de-initialization is visible to all readers and that we don't remove pointers (like ip_mforward etc.) in the middle of packet processing.	2022-01-11 11:19:32 +01:00
Wojciech Macek	68f28dd1cc	ip_mroute: do not sleep when lock is taken Kthread initialization calls uma_alloc which can sleep. Modify the code to use deferred work instead.	2022-01-11 11:19:32 +01:00
Robert Wing	eb18708ec8	syncache: accept packet with no SA when TCP_MD5SIG is set When TCP_MD5SIG is set on a socket, all packets are dropped that don't contain an MD5 signature. Relax this behavior to accept a non-signed packet when a security association doesn't exist with the peer. This is useful when a listen socket set with TCP_MD5SIG wants to handle connections protected with and without MD5 signatures. Reviewed by: bz (previous version) Sponsored by: nepustil.net Sponsored by: Klara Inc. Differential Revision: https://reviews.freebsd.org/D33227	2022-01-08 16:32:14 -09:00
Michael Tuexen	f87818eacf	sctp: minor change due to upstreaming	2022-01-03 23:03:06 +01:00
Gleb Smirnoff	afad340a14	inpcb: garbage collect INP_LOCK_INIT(), used only once in sctp Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D33543	2022-01-03 10:20:30 -08:00
Gleb Smirnoff	fec8a8c7cb	inpcb: use global UMA zones for protocols Provide structure inpcbstorage, that holds zones and lock names for a protocol. Initialize it with global protocol init using macro INPCBSTORAGE_DEFINE(). Then, at VNET protocol init supply it as the main argument to the in_pcbinfo_init(). Each VNET pcbinfo uses its private hash, but they all use the same zone to allocate and the same SMR section to synchronize. Note: there is the kern.ipc.maxsockets sysctl, which controls the UMA limit on the socket zone, which has always been global. Historically the same maxsockets value is also applied to every PCB zone. Important fact: you can't create a pcb without a socket! A pcb may outlive its socket, however. Given that there are multiple protocols, and only one socket zone, the per-pcb zone limits seem to have little value. Under very special conditions a pcb zone limit may trigger a little earlier than the socket zone limit, but in most setups the socket zone limit will be triggered first. When VIMAGE was added to the kernel, PCB zones became per-VNET. This magnified the existing imbalance further: now we have multiple pcb zones in multiple vnets limited to maxsockets, but every pcb requires a socket allocated from the global zone, also limited by maxsockets. IMHO, this per-pcb zone limit doesn't bring any value, so this patch drops it. If anybody explains the value of this limit, it can be restored very easily: just a 2-line change to in_pcbstorage_init(). Differential revision: https://reviews.freebsd.org/D33542	2022-01-03 10:17:46 -08:00
Gleb Smirnoff	644ca0846d	domains: make domain_init() initialize only global state Now that each module handles its global and VNET initialization itself, there is no VNET related stuff left to do in domain_init(). Differential revision: https://reviews.freebsd.org/D33541	2022-01-03 10:15:22 -08:00
Gleb Smirnoff	89128ff3e4	protocols: init with standard SYSINIT(9) or VNET_SYSINIT The historical BSD network stack loop that rolls over domains and over protocols has no advantages over more modern SYSINIT(9). While doing the sweep, split global and per-VNET initializers. Getting rid of pr_init allows us to achieve several things: o Get rid of ifdefs that protect against double foo_init() when both INET and INET6 are compiled in. o Isolate initializers statically to the module they init. o Make the code easier to understand and maintain. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33537	2022-01-03 10:15:21 -08:00
Kristof Provost	80871aeb0f	udp_var.h: other headers already include types.h Pointed out by: imp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-01-03 18:35:02 +01:00
Kristof Provost	aa70361d86	headers: make a few more headers self-contained Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-01-03 10:12:30 +01:00
Gordon Bergling	1b90dfa5d2	tcp_bbr(4): Fix a few typos in sysctl descriptions - s/measurment/measurement/ MFC after: 3 days	2022-01-02 18:03:10 +01:00
Michael Tuexen	502d5e8500	sctp: improve counting of incoming chunks MFC after: 3 days	2022-01-01 20:59:47 +01:00
Michael Tuexen	4760956e9a	udp: use appropriate pcbinfo when signalling EHOSTDOWN MFC after: 3 days Sponsored by: Netflix, Inc.	2022-01-01 19:17:17 +01:00
Michael Tuexen	430df2abee	in_pcb: improve inp_next() If there is no inp to check, exit the loop iterating through them. Reported by: syzbot+403406a9cbf082b36ea4@syzkaller.appspotmail.com Reviewed by: glebius Sponsored by: Netflix, Inc.	2022-01-01 19:04:10 +01:00
Michael Tuexen	1adb91e521	sctp: retire sctp_mtu_size_reset() Thanks to Timo Voelker for making me aware that sctp_mtu_size_reset() is very similar to sctp_pathmtu_adjustment(). MFC after: 3 days	2021-12-30 15:30:11 +01:00
Michael Tuexen	2de2ae331b	sctp: improve sctp_pathmtu_adjustment() Allow the resending of DATA chunks to be controlled by the caller, which allows retiring sctp_mtu_size_reset() in a separate commit. Also improve the computation of the overhead and use 32-bit integers consistently. Thanks to Timo Voelker for pointing me to the code. MFC after: 3 days	2021-12-30 15:16:05 +01:00
Alexander V. Chernikov	ff3a85d324	[lltable] Add per-family lltable getters. Introduce a new function, lltable_get(), to retrieve lltable pointer for the specified interface and family. Use it to avoid all-iftable list traversal when adding or deleting ARP/ND records. Differential Revision: https://reviews.freebsd.org/D33660 MFC after: 2 weeks	2021-12-29 20:57:15 +00:00
Gleb Smirnoff	4287aa5619	tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags While here, move one more erroneous condition out of the epoch and the common return. The only functional change is that if we send control on a shut down socket we would get EINVAL instead of ECONNRESET. Reviewed by: tuexen Reported by: syzbot+8388cf7f401a7b6bece6@syzkaller.appspotmail.com Fixes: f64dc2ab5be38e5366271ef85ea90d8cb1c7841a	2021-12-28 08:50:02 -08:00
Michael Tuexen	a7ba00a438	sctp: minor improvements in sctp_get_frag_point MFC after: 3 days	2021-12-28 10:23:31 +01:00
Michael Tuexen	ca0dd19f09	sctp: check that the computed frag point is a multiple of 4 Reported by: syzbot+5da189fc1fe80b31f5bd@syzkaller.appspotmail.com MFC after: 3 days	2021-12-28 09:40:52 +01:00
Gleb Smirnoff	0af4ce4547	tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags Fixes: f64dc2ab5be38e5366271ef85ea90d8cb1c7841a	2021-12-27 16:58:09 -08:00
Michael Tuexen	989453da05	sctp: cleanup the SCTP_MAXSEG socket option. This patch makes the handling of the SCTP_MAXSEG socket option compliant with RFC 6458 (SCTP socket API) and fixes an issue found by syzkaller. Reported by: syzbot+a2791b89ab99121e3333@syzkaller.appspotmail.com MFC after: 3 days	2021-12-27 23:40:31 +01:00
Gleb Smirnoff	37a7f55737	tcp_usr_rcvd: don't cast inp_ppcb to tcpcb before checking inp_flags Fixes: f64dc2ab5be38e5366271ef85ea90d8cb1c7841a	2021-12-27 10:41:51 -08:00
Michael Tuexen	34ae6a1a44	sctp: cleanup, no functional change intended. MFC after: 3 days	2021-12-27 18:28:44 +01:00
Michael Tuexen	a859e9f9aa	sctp: apply limit for socket buffers as indicated in comment MFC after: 3 days	2021-12-27 18:15:29 +01:00
Gleb Smirnoff	a057769205	in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address The intent is to provide more entropy than can be provided by just the 32 bits of the IPv6 address that overlap with 6to4 tunnels. This is needed to mitigate potential algorithmic complexity attacks from attackers who can control large numbers of IPv6 addresses. Together with: gallatin Reviewed by: dwmalone, rscheff Differential revision: https://reviews.freebsd.org/D33254	2021-12-26 10:47:28 -08:00
Gleb Smirnoff	eb8dcdeac2	jail: network epoch protection for IP address lists Now struct prison has two pointers (IPv4 and IPv6) of struct prison_ip type. Each points to a structure holding an epoch context, an address count and a variable-size array of addresses. These structures are freed with network epoch deferred free and are not edited in place; instead a new structure is allocated and set. While here, the change also generalizes a lot (but not enough) of IPv4 and IPv6 processing. E.g. address family agnostic helpers for kern_jail_set() are provided, that reduce v4-v6 copy-paste. The fast-path prison_check_ip[46]_locked() is also generalized into prison_ip_check() that can be executed with network epoch protection only. Reviewed by: jamie Differential revision: https://reviews.freebsd.org/D33339	2021-12-26 10:45:50 -08:00
Gleb Smirnoff	a370832bec	tcp: remove delayed drop KPI No longer needed after tcp_output() can ask caller to drop. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33371	2021-12-26 08:48:24 -08:00
Gleb Smirnoff	f64dc2ab5b	tcp: TCP output method can request tcp_drop The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection when they do output on it. The default stack never does this, thus the existing framework expects tcp_output() always to return a locked and valid tcpcb. Provide a KPI extension to satisfy the demands of advanced stacks. If the output method returns a negative error code, it means that the caller must call tcp_drop(). In tcp_var.h provide three inline methods to call tcp_output(): - tcp_output() is a drop-in replacement for the default stack, so that the default stack can continue using it internally without modifications. For advanced stacks it would perform tcp_drop() and unlock, and report that with a negative error code. - tcp_output_unlock() handles the negative code and always converts it to positive and always unlocks. - tcp_output_nodrop() just calls the method and leaves the responsibility to drop on the caller. Sweep over the advanced stacks and use the new KPI instead of using the HPTS delayed drop queue for that. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33370	2021-12-26 08:48:19 -08:00
Gleb Smirnoff	dbbcc777de	rack: rack_do_compressed_ack_processing() can call tcp_drop() Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33369	2021-12-26 08:48:15 -08:00
Gleb Smirnoff	66aeb0b53b	rack: drop connection synchronously, when we can For all functions that are leaves of tcp_input() call ctf_do_dropwithreset_conn() instead of ctf_do_dropwithreset(), because we always have tp and we want it to be dropped. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33368	2021-12-26 08:48:10 -08:00
Gleb Smirnoff	17ac6b1c14	bbr: drop packet synchronously in ctf_do_dropwithreset_conn() This function is always called from the tcp_do_segment() method, which can drop the tcpcb and return unlocked. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33367	2021-12-26 08:48:06 -08:00
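
Commit 4531b3450b moves the spurious-RTO check onto the TSTMP_xx macros and ignores a zero tsecr. The sketch below shows why a wrap-aware comparison matters. TCP timestamps are 32-bit counters compared modulo 2^32, so a value that has wrapped past zero must still compare as later. The macro bodies are written in the style of FreeBSD's tcp_seq.h for illustration and are not copied from it. rto_was_spurious() and its parameters are hypothetical: it is a simplified Eifel-style test, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-aware comparison of 32-bit TCP timestamps: the sign of the
 * difference, taken modulo 2^32, says which value is later. Written in
 * the style of the TSTMP_* macros in FreeBSD's tcp_seq.h. */
#define TSTMP_LT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define TSTMP_GT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/* Hypothetical helper: does this ACK show that the last RTO was spurious?
 * Trust the echoed timestamp only if the option was present and nonzero,
 * and call the RTO spurious if the echo predates the retransmission. */
static bool rto_was_spurious(bool ts_present, uint32_t tsecr,
                             uint32_t t_rxt_sent)
{
    if (!ts_present || tsecr == 0)
        return false;            /* no usable echo: do not unwind */
    return TSTMP_LT(tsecr, t_rxt_sent);
}
```

With this comparison, TSTMP_GT(0x00000010, 0xFFFFFFF0) is true. The difference is 0x20 modulo 2^32, so the first value was taken 32 ticks after the second even though it is numerically smaller.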

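Commits 3284f4925f and 24fe6643da change two LRO rules. Segments whose TH_ACK bits differ are never merged, and header regeneration is tracked explicitly rather than inferred from the number of merged payloads. The sketch below shows both rules. struct lro_entry_sketch and its functions are hypothetical and much simpler than the real tcp_lro code. For instance, ACK numbers are compared for equality here rather than with sequence-space arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TH_ACK 0x10  /* ACK bit in the TCP header flags */

/* Hypothetical, heavily simplified LRO aggregation entry. */
struct lro_entry_sketch {
    uint8_t  th_flags;     /* flags of the first segment in the entry */
    uint32_t th_ack;       /* ACK number the delivered header must carry */
    uint32_t payloads;     /* number of merged segments that carried data */
    bool     needs_regen;  /* must the header be rebuilt before delivery? */
};

/* Rule from 3284f4925f: never mix ACK and non-ACK segments. */
static bool lro_can_merge(const struct lro_entry_sketch *le, uint8_t th_flags)
{
    return ((le->th_flags ^ th_flags) & TH_ACK) == 0;
}

/* Rule from 24fe6643da: record explicitly that the header changed.
 * A pure ACK that advances th_ack changes the header even when only one
 * payload has been merged, which a "payloads > 1" heuristic misses. */
static bool lro_merge(struct lro_entry_sketch *le, uint8_t th_flags,
                      uint32_t th_ack, uint32_t payload_len)
{
    if (!lro_can_merge(le, th_flags))
        return false;            /* caller flushes le and starts a new entry */
    if (payload_len > 0 || th_ack != le->th_ack)
        le->needs_regen = true;  /* header no longer matches the first segment */
    le->th_ack = th_ack;
    if (payload_len > 0)
        le->payloads++;
    return true;
}
```

Keeping an explicit needs_regen flag ties the decision to what actually changed in the header. That is why a single payload followed by pure ACKs can no longer slip through with a stale th_ack.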

7236 Commits
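
Commit f64dc2ab5b describes a convention for output methods. A stack's output method returns a negated errno to ask the caller to drop the connection, and three wrappers handle that code in different ways. The sketch below models the three wrappers in user space. struct tcpcb_sketch, its boolean lock flag and the sample output methods are hypothetical stand-ins for struct tcpcb, the inpcb lock and real stack methods.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tcpcb: a lock flag, a dropped flag,
 * and the stack's output method. */
struct tcpcb_sketch {
    bool locked;
    bool dropped;
    int (*output)(struct tcpcb_sketch *tp);
};

/* Sample output methods (hypothetical): one succeeds, one asks for a drop. */
static int output_ok(struct tcpcb_sketch *tp) { (void)tp; return 0; }
static int output_wants_drop(struct tcpcb_sketch *tp) { (void)tp; return -ECONNRESET; }

/* Stand-in for tcp_drop() followed by unlocking the connection. */
static void drop_and_unlock(struct tcpcb_sketch *tp, int error)
{
    (void)error;            /* the real tcp_drop() records this errno */
    tp->dropped = true;
    tp->locked = false;
}

/* Like tcp_output(): on success the tcpcb stays locked; a negative code
 * means the connection was dropped and unlocked, and is passed up as-is. */
static int tcp_output_sketch(struct tcpcb_sketch *tp)
{
    int rv = tp->output(tp);
    if (rv < 0)
        drop_and_unlock(tp, -rv);
    return rv;
}

/* Like tcp_output_unlock(): always unlocks and never returns < 0. */
static int tcp_output_unlock_sketch(struct tcpcb_sketch *tp)
{
    int rv = tp->output(tp);
    if (rv < 0) {
        rv = -rv;
        drop_and_unlock(tp, rv);
    } else {
        tp->locked = false;
    }
    return rv;
}

/* Like tcp_output_nodrop(): the caller must handle rv < 0 itself. */
static int tcp_output_nodrop_sketch(struct tcpcb_sketch *tp)
{
    return tp->output(tp);
}
```

The split lets the default stack keep calling tcp_output() unchanged. Callers that cannot tolerate a dropped tcpcb use tcp_output_nodrop() and decide for themselves when to drop.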