freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	c67eb393fa	tcp_hpts: plug a compiler warn Sponsored by: Rubicon Communications, LLC ("Netgate")	2023-04-05 14:32:13 +00:00
Gleb Smirnoff	84b42df834	rack: fix build on powerpc	2023-04-04 16:35:36 -07:00
Randall Stewart	030434acaf	Update rack to the latest code used at NF. There have been many changes to rack over the last couple of years, including: a) Ability when switching stacks to have one stack query another. b) Internal use of micro-second timers instead of ticks. c) Many changes to pacing in forms of 1) Improvements to Dynamic Goodput Pacing (DGP) 2) Improvements to fixed rate pacing 3) A new feature called hybrid pacing where the requestor can get a combination of DGP and fixed rate pacing with deadlines for delivery that can dynamically speed things up. d) All kinds of bugs found during extensive testing and use of the rack stack for streaming video and in fact all data transferred by NF Reviewed by: glebius, gallatin, tuexen Sponsored By: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D39402	2023-04-04 16:05:46 -04:00
Gleb Smirnoff	2ff8187efd	tcp_hpts: remove dead code tcp_drop_in_pkts() Should have gone in `f971e79139`.	2023-04-04 12:55:27 -07:00
Randall Stewart	73ee5756de	Fixes in the tcp infrastructure with respect to stack changes as well as other infrastructure updates for incoming rack features. Stack switching has always been a bit of an issue. We currently use a break-before-make setup, which means that if something goes wrong you have to try to get back to a stack. This patch, among a lot of other things, changes that so that it is make-before-break. We also expand some of the function blocks in prep for new features in rack that will allow more controlled pacing. We also add other abilities such as the pathway for a stack to query a previous stack to acquire from it critical state information so things in flight don't get dropped or mis-handled when switching stacks. We also add the concept of a timer granularity. This allows an alternate stack to change from the old ticks granularity to microseconds and of course this even gives us a pathway to go to nanosecond timekeeping if we need to (something for the data center to consider for sure). Once all this lands I will then update rack to begin using all these new features. Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D39210	2023-04-01 01:46:38 -04:00
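The switching order described in this commit message can be illustrated with a small sketch. It is not FreeBSD code: `Stack`, `switch_make_before_break`, and the `conn` dictionary are hypothetical names chosen for illustration. The point is only the ordering: the new stack is initialized first, and the old one is released only after that succeeds.

```python
class Stack:
    """Hypothetical stand-in for a TCP stack; not a FreeBSD structure."""

    def __init__(self, name, init_ok=True):
        self.name = name
        self.init_ok = init_ok   # lets the example simulate an init failure
        self.active = False

    def init(self):
        if not self.init_ok:
            raise RuntimeError(f"{self.name}: init failed")
        self.active = True

    def fini(self):
        self.active = False


def switch_make_before_break(conn, new_stack):
    """Bring up new_stack first; release the old stack only on success."""
    old_stack = conn["stack"]
    try:
        new_stack.init()         # "make"
    except RuntimeError:
        return False             # old stack untouched, connection still usable
    conn["stack"] = new_stack
    old_stack.fini()             # "break"
    return True


conn = {"stack": Stack("default")}
conn["stack"].init()
```

With break-before-make the old stack would already be torn down when the failure is detected, which is the recovery problem the message describes.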
Kristof Provost	28921c4f7d	carp: allow commands to use interface name rather than index Get/set commands can now choose to provide the interface name rather than the interface index. This allows userspace to avoid a call to if_nametoindex(). Suggested by: melifaro Reviewed by: melifaro Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D39359	2023-03-31 11:29:58 +02:00
Richard Scheffenegger	f858eb916f	tcp: send SACK rescue retransmission also mid-stream Previously, SACK rescue retransmissions would only happen on a loss recovery at the tail end of the send buffer. This extends the mechanism such that partial ACKs without SACK mid-stream also trigger a rescue retransmission to try to avoid an otherwise unavoidable retransmission timeout. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D39274	2023-03-28 04:47:01 +02:00
Gleb Smirnoff	78e6c3aacc	tcp: update error counter when dropping a packet due to bad source Use the same counter that ip_input()/ip6_input() use for bad destination address. For IPv6 this is already heavily abused ip6s_badscope, which needs to be split into several separate error counters. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D39234	2023-03-27 18:37:15 -07:00
Kristof Provost	ccff2078af	carp: fix source MAC When we're not in unicast mode we need to change the source MAC address. The check for this was wrong, because IN_MULTICAST() assumes host endianness and the address in sc_carpaddr is in network endianness. Sponsored by: Rubicon Communications, LLC ("Netgate")	2023-03-28 01:18:18 +02:00
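The byte-order pitfall behind this fix can be reproduced outside the kernel. The sketch below is an illustration only: `in_multicast` mimics a host-order class D test, and the helper names are hypothetical. The address 224.0.0.18 is used purely as an example of a multicast address.

```python
def in_multicast(addr_host_order: int) -> bool:
    """Class D test on a 32-bit address in host byte order (top nibble 0xE)."""
    return (addr_host_order & 0xF0000000) == 0xE0000000


def raw_load_little_endian(wire: bytes) -> int:
    """What a little-endian host sees when it loads wire bytes without ntohl()."""
    return int.from_bytes(wire, "little")


def to_host_order(wire: bytes) -> int:
    """Equivalent of applying ntohl() to an address stored in network order."""
    return int.from_bytes(wire, "big")


MCAST_ADDR = bytes([224, 0, 0, 18])    # example multicast address, network order
UNICAST_ADDR = bytes([10, 0, 0, 224])  # example unicast address, network order

# Without the conversion the test gives the wrong answer in both directions.
wrong = (in_multicast(raw_load_little_endian(MCAST_ADDR)),
         in_multicast(raw_load_little_endian(UNICAST_ADDR)))
right = (in_multicast(to_host_order(MCAST_ADDR)),
         in_multicast(to_host_order(UNICAST_ADDR)))
```

On a big-endian host the raw load and the converted load coincide, so a bug of this kind would only show up on little-endian machines.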
Alexander V. Chernikov	19e43c163c	netlink: add netlink KPI to the kernel by default This change does the following: Base Netlink KPIs (ability to register the family, parse and/or write a Netlink message) are always present in the kernel. Specifically, * Implementation of genetlink family/group registration/removal, some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in unconditionally. * Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are compiled in unconditionally. * Glue functions (netlink<>rtsock), malloc/core sysctl definitions (netlink_glue.c, 259 LoC) are compiled in unconditionally. * The rest of the KPI _functions_ are defined in the netlink_glue.c, but their implementation calls a pointer to either the stub function or the actual function, depending on whether the module is loaded or not. This approach keeps only 1k LoC out of ~3.7k LoC (current sys/netlink implementation) in the kernel, which will not grow further. It also allows the generic netlink kernel customers to load successfully without requiring the Netlink module and to operate correctly once the Netlink module is loaded. Reviewed by: imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D39269	2023-03-27 13:55:44 +00:00
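The stub-or-real dispatch described in the last bullet can be sketched as follows. This is a generic illustration of the pattern, not the contents of netlink_glue.c; every name in it (`kpi_send`, `module_load`, `module_unload`) is hypothetical.

```python
def _send_stub(msg):
    """Fallback used while the (hypothetical) module is not loaded."""
    return False


# The always-present entry point forwards through this pointer.
_send_impl = _send_stub


def kpi_send(msg):
    """Entry point that callers can always link against."""
    return _send_impl(msg)


def module_load(real_send):
    """Called when the module loads: point the entry at the real function."""
    global _send_impl
    _send_impl = real_send


def module_unload():
    """Called when the module unloads: fall back to the stub."""
    global _send_impl
    _send_impl = _send_stub
```

Callers never need to know whether the module is present: before loading they get the stub's result, and afterwards the same call reaches the real implementation.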
Andrew Gallatin	abba58766f	LRO: Add missing checks for invalid IP addresses LRO bypasses normal ip_input()/tcp_input() and lacks several checks that are present in the normal path. Without these checks, it is possible to trigger assertions added in `b0ccf53f24` Reviewed by: glebius, rrs Sponsored by: Netflix	2023-03-25 11:56:02 -04:00
Kristof Provost	511a6d5ed3	carp: use if_name() Reported by: melifaro Sponsored by: Rubicon Communications, LLC ("Netgate")	2023-03-20 14:37:10 +01:00
Kristof Provost	137818006d	carp: support unicast Allow users to configure the address to send carp messages to. This allows carp to be used in unicast mode, which is useful in certain virtual configurations (e.g. AWS, VMWare ESXi, ...) Reviewed by: melifaro Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D38940	2023-03-20 14:37:09 +01:00
Kristof Provost	40e0435964	carp: add netlink interface Allow carp configuration information to be supplied and retrieved via netlink. Reviewed by: melifaro Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D39048	2023-03-20 10:52:27 +01:00
Michael Tuexen	48345048cd	sctp: fix typo in assignment	2023-03-18 23:58:50 +01:00
Michael Tuexen	8ed1e2c880	sctp: enforce Karn's rule during the handshake Don't take RTT measurements on packets containing INIT or COOKIE-ECHO chunks, when they were retransmitted. MFC after: 1 week
	2023-03-16 17:40:40 +01:00
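The rule applied here (commonly known as Karn's algorithm) says that a retransmitted packet must not produce an RTT sample, because the sender cannot tell which transmission the reply answers. A minimal sketch, using a hypothetical `HandshakeRtt` class rather than anything from the SCTP sources:

```python
class HandshakeRtt:
    """Hypothetical RTT sampler that skips retransmitted chunks."""

    def __init__(self):
        self.pending = {}   # chunk id -> (first send time, retransmitted?)
        self.samples = []

    def on_send(self, chunk_id, now):
        if chunk_id in self.pending:
            first_sent, _ = self.pending[chunk_id]
            self.pending[chunk_id] = (first_sent, True)   # mark as retransmitted
        else:
            self.pending[chunk_id] = (now, False)

    def on_reply(self, chunk_id, now):
        first_sent, retransmitted = self.pending.pop(chunk_id)
        if retransmitted:
            return None     # ambiguous: which transmission was answered?
        sample = now - first_sent
        self.samples.append(sample)
        return sample
```

A reply to a chunk that was sent once yields a sample; a reply to a chunk that was sent twice yields none.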
Randall Stewart	69c7c81190	Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities. The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints we need to move to using the new inline functions. This adds them and moves rack to now use the tcp_tracepoints. Reviewed by: tuexen, gallatin Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D38831	2023-03-16 11:43:16 -04:00
Zhenlei Huang	49cad3daf2	carp: carp_master_down_locked() requires net epoch Reviewed by: kp Fixes: `1d126e9b94` carp: Widen epoch coverage MFC after: 1 day Differential Revision: https://reviews.freebsd.org/D39113	2023-03-16 18:07:03 +08:00
Michael Tuexen	c91ae48a25	sctp: don't do RTT measurements with cookies When receiving a cookie, the receiver does not know whether the peer retransmitted the COOKIE-ECHO chunk or not. Therefore, don't do an RTT measurement. It might be much too long. To overcome this limitation, one could do at least two things: 1. Bundle the INIT-ACK chunk with a HEARTBEAT chunk for doing the RTT measurement. But this is not allowed. 2. Add a flag to the COOKIE-ECHO chunk, which indicates that it is the initial transmission, and not a retransmission. But this requires an RFC. MFC after: 1 week	2023-03-16 10:45:13 +01:00
Michael Tuexen	cee09bda03	sctp: allow disabling of SCTP_ACCEPT_ZERO_CHECKSUM socket option	2023-03-15 22:55:23 +01:00
Michael Tuexen	6026b45aab	sctp: improve negotiation of zero checksum feature Enforce consistency between announcing 0-cksum support and actually using it in the association. The value from the inp when the INIT ACK is sent must be used, not the one from the inp when the cookie is received.	2023-03-15 22:29:52 +01:00
Mina Galić	0b0ae2e4cd	jail: convert several functions from int to bool these functions exclusively return (0) and (1), so convert them to bool We also convert some networking related jail functions from int to bool some of which were returning an error that was never used. Differential Revision: https://reviews.freebsd.org/D29659 Reviewed by: imp, jamie (earlier version) Pull Request: https://github.com/freebsd/freebsd-src/pull/663	2023-03-14 21:05:33 -06:00
Mark Johnston	aa71d6b4a2	netinet: Disallow unspecified addresses in ICMP-embedded packets Reported by: glebius Reported by: syzbot+981c528ccb5c5534dffc@syzkaller.appspotmail.com Reviewed by: tuexen, glebius MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D38936	2023-03-13 10:45:56 -04:00
Michael Tuexen	4a2b92d99f	sctp: initial implementation of draft-tuexen-tsvwg-sctp-zero-checksum	2023-03-10 01:45:46 +01:00
Mark Johnston	713264f6b8	netinet: Tighten checks for unspecified source addresses The assertions added in commit `b0ccf53f24` ("inpcb: Assert against wildcard addrs in in_pcblookup_hash_locked()") revealed that protocol layers may pass the unspecified address to in_pcblookup(). Add some checks to filter out such packets before we attempt an inpcb lookup: - Disallow the use of an unspecified source address in in_pcbladdr() and in6_pcbladdr(). - Disallow IP packets with an unspecified destination address. - Disallow TCP packets with an unspecified source address, and add an assertion to verify the comment claiming that the case of an unspecified destination address is handled by the IP layer. Reported by: syzbot+9ca890fb84e984e82df2@syzkaller.appspotmail.com Reported by: syzbot+ae873c71d3c71d5f41cb@syzkaller.appspotmail.com Reported by: syzbot+e3e689aba1d442905067@syzkaller.appspotmail.com Reviewed by: glebius, melifaro MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38570	2023-03-06 15:06:00 -05:00
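The filtering order described above can be sketched with Python's standard `ipaddress` module. The function names are hypothetical and the sketch covers only the address checks, not the inpcb lookup itself.

```python
import ipaddress


def ip_input_ok(dst: str) -> bool:
    """IP layer: reject packets whose destination is the unspecified address."""
    return not ipaddress.ip_address(dst).is_unspecified


def tcp_input_ok(src: str, dst: str) -> bool:
    """TCP layer: reject an unspecified source; the destination was already
    filtered by the IP layer, so here it is only asserted."""
    assert not ipaddress.ip_address(dst).is_unspecified
    return not ipaddress.ip_address(src).is_unspecified
```

Both `0.0.0.0` and `::` count as unspecified, so the same check serves IPv4 and IPv6.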
Fidaullah Noonari	290f7f4a09	in_mcast.c: change multicast not member condition If there is no source filter entry => block if that's SSM ("exclude" mode per RFC 3678 clause 3). If there is an entry => check its action & block if the action is "exclude". It would be nice if the test case in this PR were converted into an ATF test case, but not blocking on that. Reviewed by: imp, melifaro Pull Request: https://github.com/freebsd/freebsd-src/pull/601	2023-03-03 22:25:17 -07:00
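The two-branch condition in this message can be written down directly. The sketch is a literal transcription of the message, not of the kernel function; `should_block`, the `is_ssm` flag, and the string-valued actions are assumptions made for illustration.

```python
def should_block(entry_action, is_ssm):
    """Decide whether a packet from a given source is blocked.

    entry_action: None when no source filter entry exists, otherwise
                  "include" or "exclude" (hypothetical encoding).
    is_ssm:       how the message's "that's SSM" case is modelled here
                  (an assumption, see the lead-in).
    """
    if entry_action is None:
        return is_ssm                   # no entry: block only in the SSM case
    return entry_action == "exclude"    # entry present: its action decides
```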
Gleb Smirnoff	7fc82fd1f8	ipfw: garbage collect ip_fw_chk_ptr It is a relict left from the old times when ipfw(4) was hooked into IP stack directly, without pfil(9).	2023-03-03 10:30:15 -08:00
Mark Johnston	317fa5169d	netinet: Remove the IP(V6)_RSS_LISTEN_BUCKET socket option It has no effect, and an exp-run revealed that it is not in use. PR: 261398 (exp-run) Reviewed by: mjg, glebius Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38822	2023-02-28 15:57:21 -05:00
Richard Scheffenegger	399a5655e6	tcp: Make TCP PCAP buffer properly configurable. Reviewed By: tuexen, cc, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D38824	2023-02-28 20:12:11 +01:00
Mark Johnston	3aff4ccdd7	netinet: Remove IP(V6)_BINDMULTI This option was added in commit `0a100a6f1e` but was never completed. In particular, there is no logic to map flowids to different listening sockets, so it accomplishes basically the same thing as SO_REUSEPORT. Meanwhile, we've since added SO_REUSEPORT_LB, which at least tries to balance among listening sockets using a hash of the 4-tuple and some optional NUMA policy. The option was never documented or completed, and an exp-run revealed nothing using it in the ports tree. Moreover, it complicates the already very complicated in_pcbbind_setup(), and the checking in in_pcbbind_check_bindmulti() is insufficient. So, let's remove it. PR: 261398 (exp-run) Reviewed by: glebius Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38574	2023-02-27 10:03:11 -05:00
Alfonso	2f201df1f8	Change hw_tls to a bool Reviewed by: imp Pull Request: https://github.com/freebsd/freebsd-src/pull/512	2023-02-25 09:59:11 -07:00
Mateusz Guzik	3a01a97d23	mroute: partially sanitize the file There is rampant inconsistent formatting all around, make it mostly style(9)-conformant. While here: - drop malloc casts - rename a rw lock from mroute_mtx to mroute_lock - replace NOTREACHED comment with __assert_unreachable Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D38652	2023-02-23 13:35:44 +00:00
Michael Tuexen	453aa7fac9	tcp: ensure the tcpcb is not NULL when logging an event When calling tcp_bblog_pru() on some error paths, tp is NULL, therefore handle it. Sponsored by: Netflix, Inc.	2023-02-23 02:04:17 +01:00
Michael Tuexen	624de4eca5	tcp: remove unused function prototype tcp_trace was implemented in tcp_debug.c, which was removed recently. Reviewed by: rscheff@, zlei@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D38712	2023-02-22 13:28:17 +01:00
Michael Tuexen	76578d601e	bblog: improve timeout event handling Extend the BBLog RTO event to deal with all timers of the base stack. Also provide information about starting, stopping, and running off. The expiration of the retransmission timer is reported as it was done before. Reviewed by: rscheff@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D38710	2023-02-21 22:46:15 +01:00
Michael Tuexen	6b802933f1	tcp: rearrange enum and remove unused variable Rearrange the enum tt_which such that TT_REXMIT is 0. This allows an extension of the BBLog event RTO in a backwards compatible way. Remove tcptimers, which was only used in trpt, a utility removed from the source tree recently. Reviewed by: glebius@, guest-ccui@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D38547	2023-02-21 18:26:49 +01:00
Michael Tuexen	4065becf3f	bblog: unbreak build Ensure that tp is always declared and set. Reported by: Michael Butler Sponsored by: Netflix, Inc.	2023-02-21 18:16:59 +01:00
Michael Tuexen	00812bbda2	bblog: add logging of protocol user requests This information was available in trpt and is useful. So provide a way to get this information via TCP BBLog. Reviewed by: rscheff@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D38701	2023-02-21 12:07:35 +01:00
Michael Tuexen	b16a37eda8	bblog: sync tcp_log_events with Netflix tree This allows the addition of entries to tcp_log_events without causing conflicts in the Netflix tree. rrs@ will upstream the related functional changes eventually. Reviewed by: guest-ccui@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D38646	2023-02-20 21:42:57 +01:00
John Baldwin	cda6bdbaa1	tcp: Don't try to disconnect a socket multiple times. When the checks for INP_TIMEWAIT were removed, tcp_usr_close() and tcp_usr_disconnect() were no longer prevented from calling tcp_disconnect() on a socket that was already disconnected. This triggered a panic in cxgbe(4) for TOE where the tcp_disconnect() on an already-disconnected socket invoked tcp_output() on a socket that was already in time-wait. Reviewed by: rrs, np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37112	2023-02-17 09:13:53 -08:00
Gleb Smirnoff	96871af013	inpcb: use family specific sockaddr argument for bind functions Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the protocol's pr_bind method and from there on go down the call stack with family specific argument. Reviewed by: zlei, melifaro, markj Differential Revision: https://reviews.freebsd.org/D38601	2023-02-15 10:30:16 -08:00
Gleb Smirnoff	caf32b260a	pfil: add pfil_mem_{in,out}() and retire pfil_run_hooks() The `0b70e3e78b` changed the original design of a single entry point into pfil(9) chains providing separate functions for the filtering points that always provide mbufs and know the direction of a flow. The motivation was to reduce branching. The logical continuation would be to do the same for the filtering points that always provide a memory pointer and retire the single entry point. o Hooks now provide two functions: one for mbufs and optional for memory pointers. o pfil_hook_args() has a new member and pfil_add_hook() has a requirement to zero out uninitialized data. Bump PFIL_VERSION. o As it was before, a hook function for a memory pointer may realloc into an mbuf. Such mbuf would be returned via a pointer that must be provided in argument. o The only hook that supports memory pointers is ipfw:default-link. It is rewritten to provide two functions. o All remaining uses of pfil_run_hooks() are converted to pfil_mem_in(). o Transparent union of pfil_packet_t and tricks to fix pointer alignment are retired. Internal pfil_realloc() reduces down to m_devget() and thus is retired, too. Reviewed by: mjg, ocochard Differential revision: https://reviews.freebsd.org/D37977	2023-02-14 10:02:49 -08:00
Gleb Smirnoff	a22561501f	net: use pfil_mbuf_{in,out} where we always have an mbuf This finalizes what has been started in `0b70e3e78b`. Reviewed by: kp, mjg Differential revision: https://reviews.freebsd.org/D37976	2023-02-14 10:02:49 -08:00
Mark Johnston	636b19ead4	tcp: Disallow re-connection of a connected socket soconnectat() tries to ensure that one cannot connect a connected socket. However, the check is racy and does not really prevent two threads from attempting to connect the same TCP socket. Modify tcp_connect() and tcp6_connect() to perform the check again, this time synchronized by the inpcb lock, under which we call soisconnecting(). Reported by: syzkaller Reviewed by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38507	2023-02-14 10:07:19 -05:00
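The fix pattern here is to repeat the state check under the same lock that guards the state change. A minimal sketch with a hypothetical `Sock` class, where a `threading.Lock` stands in for the inpcb lock and the `EISCONN` return value is chosen for illustration:

```python
import errno
import threading


class Sock:
    """Hypothetical socket; the lock stands in for the inpcb lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state = "unconnected"


def connect(sock, peer):
    """Check and state change happen atomically under the lock."""
    with sock.lock:
        if sock.state != "unconnected":
            return errno.EISCONN      # someone else already connected
        sock.state = "connecting"     # analogous to calling soisconnecting()
        sock.peer = peer
    return 0


def race(sock, nthreads=8):
    """Let several threads try to connect the same socket at once."""
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(connect(sock, "peer")))
        for _ in range(nthreads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

However the threads interleave, exactly one of them sees the socket unconnected; an unlocked check before the call cannot give that guarantee.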
Mark Johnston	c7ea65ec69	inpcb: refcount_release() returns a bool No functional change intended. MFC after: 1 week Sponsored by: Klara, Inc.	2023-02-13 16:35:47 -05:00
Mark Johnston	775da7f8a9	tcp: Remove a redundant net_epoch entry in tcp6_connect() tcp6_connect() is always called in a net_epoch read section. Fixes: `3d76be28ec` ("netinet6: require network epoch for in6_pcbconnect()") Reviewed by: tuexen, glebius Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38506	2023-02-13 16:35:47 -05:00
Mateusz Guzik	937b00ac0d	tcp: add missing void keyword to tcp_stats_init Sponsored by: Rubicon Communications, LLC ("Netgate")	2023-02-13 18:38:04 +00:00
Mateusz Guzik	e4542107d8	sctp: ansify Sponsored by: Rubicon Communications, LLC ("Netgate")	2023-02-13 18:17:10 +00:00
Mark Johnston	4130ea611f	inpcb: Split in_pcblookup_hash_locked() and clean up a bit Split the in_pcblookup_hash_locked() function into several independent subroutine calls, each of which does some kind of hash table lookup. This refactoring makes it easier to introduce variants of the lookup algorithm that behave differently depending on whether they are synchronized by SMR or the PCB database hash lock. While here, do some related cleanup: - Remove an unused ifnet parameter from internal functions. Keep it in external functions so that it can be used in the future to derive a v6 scopeid. - Reorder the parameters to in_pcblookup_lbgroup() to be consistent with the other lookup functions. - Remove an always-true check from in_pcblookup_lbgroup(): we can assume that we're performing a wildcard match. No functional change intended. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D38364	2023-02-09 16:15:03 -05:00
Andrew Gallatin	c0e4090e3d	ktls: Accurately track if ifnet ktls is enabled This allows us to avoid spurious calls to ktls_disable_ifnet(). When we implemented ifnet kTLS, we set a flag in the tx socket buffer (SB_TLS_IFNET) to indicate ifnet kTLS. This flag meant that now, or in the past, ifnet ktls was active on a socket. Later, I added code to switch ifnet ktls sessions to software in the case of lossy TCP connections that have a high retransmit rate. Because TCP was using SB_TLS_IFNET to know if it needed to do math to calculate the retransmit ratio and potentially call into ktls_disable_ifnet(), it was doing unneeded work long after a session was moved to software. This patch carefully tracks whether or not ifnet ktls is still enabled on a TCP connection. Because the inp is now embedded in the tcpcb, and because TCP is the most frequent accessor of this state, it made sense to move this from the socket buffer flags to the tcpcb. Because we now need reliable access to the tcpcb, we take a ref on the inp when creating a tx ktls session. While here, I noticed that rack/bbr were incorrectly implementing tfb_hwtls_change(), and applying the change to all pending sends, when it should apply only to future sends. This change reduces spurious calls to ktls_disable_ifnet() by 95% or so in a Netflix CDN environment. Reviewed by: markj, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D38380	2023-02-09 12:44:44 -05:00

7655 Commits