Commit Graph

7640 Commits

Author SHA1 Message Date
Michael Tuexen
8ed1e2c880 sctp: enforce Kahn's rule during the handshake
Don't take RTT measurements on packets containing INIT or COOKIE-ECHO
chunks, when they were retransmitted.

MFC after:	1 week
2023-03-16 17:40:40 +01:00
Randall Stewart
69c7c81190 Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints
we need to move to using the new inline functions. This adds them and moves rack to now use
the tcp_tracepoints.

Reviewed by: tuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D38831
2023-03-16 11:43:16 -04:00
Zhenlei Huang
49cad3daf2 carp: carp_master_down_locked() requires net epoch
Reviewed by:	kp
Fixes:		1d126e9b94 carp: Widen epoch coverage
MFC after:	1 day
Differential Revision:	https://reviews.freebsd.org/D39113
2023-03-16 18:07:03 +08:00
Michael Tuexen
c91ae48a25 sctp: don't do RTT measurements with cookies
When receiving a cookie, the receiver does not know whether the
peer retransmitted the COOKIE-ECHO chunk or not. Therefore, don't
do an RTT measurement. It might be much too long.
To overcome this limitation, one could do at least two things:
1. Bundle the INIT-ACK chunk with a HEARTBEAT chunk for doing the
   RTT measurement. But this is not allowed.
2. Add a flag to the COOKIE-ECHO chunk, which indicates that it
   is the initial transmission, and not a retransmission. But
   this requires an RFC.

MFC after:	1 week
2023-03-16 10:45:13 +01:00
Michael Tuexen
cee09bda03 sctp: allow disabling of SCTP_ACCEPT_ZERO_CHECKSUM socket option 2023-03-15 22:55:23 +01:00
Michael Tuexen
6026b45aab sctp: improve negotiation of zero checksum feature
Enforce consistency between announcing 0-cksum support and actually
using it in the association. The value from the inp when the
INIT ACK is sent must be used, not the one from the inp when the
cookie is received.
2023-03-15 22:29:52 +01:00
Mina Galić
0b0ae2e4cd jail: convert several functions from int to bool
these functions exclusively return (0) and (1), so convert them to bool

We also convert some networking related jail functions from int to bool
some of which were returning an error that was never used.

Differential Revision: https://reviews.freebsd.org/D29659
Reviewed by: imp, jamie (earlier version)
Pull Request: https://github.com/freebsd/freebsd-src/pull/663
2023-03-14 21:05:33 -06:00
Mark Johnston
aa71d6b4a2 netinet: Disallow unspecified addresses in ICMP-embedded packets
Reported by:	glebius
Reported by:	syzbot+981c528ccb5c5534dffc@syzkaller.appspotmail.com
Reviewed by:	tuexen, glebius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D38936
2023-03-13 10:45:56 -04:00
Michael Tuexen
4a2b92d99f sctp: initial implementation of draft-tuexen-tsvwg-sctp-zero-checksum 2023-03-10 01:45:46 +01:00
Mark Johnston
713264f6b8 netinet: Tighten checks for unspecified source addresses
The assertions added in commit b0ccf53f24 ("inpcb: Assert against
wildcard addrs in in_pcblookup_hash_locked()") revealed that protocol
layers may pass the unspecified address to in_pcblookup().

Add some checks to filter out such packets before we attempt an inpcb
lookup:
- Disallow the use of an unspecified source address in in_pcbladdr() and
  in6_pcbladdr().
- Disallow IP packets with an unspecified destination address.
- Disallow TCP packets with an unspecified source address, and add an
  assertion to verify the comment claiming that the case of an
  unspecified destination address is handled by the IP layer.

Reported by:	syzbot+9ca890fb84e984e82df2@syzkaller.appspotmail.com
Reported by:	syzbot+ae873c71d3c71d5f41cb@syzkaller.appspotmail.com
Reported by:	syzbot+e3e689aba1d442905067@syzkaller.appspotmail.com
Reviewed by:	glebius, melifaro
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38570
2023-03-06 15:06:00 -05:00
Fidaullah Noonari
290f7f4a09 in_mcat.c: change multicast not member condition
If there is no source filter entry => block if that's SSM ("exclude"
mode per RFC 3678 clause 3).  If there is an entry => check its action &
block if the action is "exclude".

It would be nice if the test case in this PR were converted into an ATF
test case, but not blocking on that.

Reviewed by: imp, melifaro
Pull Request: https://github.com/freebsd/freebsd-src/pull/601
2023-03-03 22:25:17 -07:00
Gleb Smirnoff
7fc82fd1f8 ipfw: garbage collect ip_fw_chk_ptr
It is a relict left from the old times when ipfw(4) was hooked
into IP stack directly, without pfil(9).
2023-03-03 10:30:15 -08:00
Mark Johnston
317fa5169d netinet: Remove the IP(V6)_RSS_LISTEN_BUCKET socket option
It has no effect, and an exp-run revealed that it is not in use.

PR:		261398 (exp-run)
Reviewed by:	mjg, glebius
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38822
2023-02-28 15:57:21 -05:00
Richard Scheffenegger
399a5655e6 tcp: Make TCP PCAP buffer properly configurable.
Reviewed By:		tuexen, cc, #transport
MFC after:		3 days
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D38824
2023-02-28 20:12:11 +01:00
Mark Johnston
3aff4ccdd7 netinet: Remove IP(V6)_BINDMULTI
This option was added in commit 0a100a6f1e but was never completed.
In particular, there is no logic to map flowids to different listening
sockets, so it accomplishes basically the same thing as SO_REUSEPORT.
Meanwhile, we've since added SO_REUSEPORT_LB, which at least tries to
balance among listening sockets using a hash of the 4-tuple and some
optional NUMA policy.

The option was never documented or completed, and an exp-run revealed
nothing using it in the ports tree.  Moreover, it complicates the
already very complicated in_pcbbind_setup(), and the checking in
in_pcbbind_check_bindmulti() is insufficient.  So, let's remove it.

PR:		261398 (exp-run)
Reviewed by:	glebius
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38574
2023-02-27 10:03:11 -05:00
Alfonso
2f201df1f8 Change hw_tls to a bool
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/512
2023-02-25 09:59:11 -07:00
Mateusz Guzik
3a01a97d23 mroute: partially sanitize the file
There is rampant inconsistent formatting all around, make it mostly
style(9)-conformant.

While here:
- drop malloc casts
- rename a rw lock from mroute_mtx to mroute_lock
- replace NOTREACHED comment with __assert_unreachable

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D38652
2023-02-23 13:35:44 +00:00
Michael Tuexen
453aa7fac9 tcp: ensure the tcpcb is not NULL when logging an event
When calling tcp_bblog_pru() on some error paths, tp is NULL,
therefore handle it.

Sponsored by:	Netflix, Inc.
2023-02-23 02:04:17 +01:00
Michael Tuexen
624de4eca5 tcp: remove unused function prototype
tcp_trace was implemented in tcp_debug.c, which was removed recently.

Reviewed by:		rscheff@, zlei@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D38712
2023-02-22 13:28:17 +01:00
Michael Tuexen
76578d601e bblog: improve timeout event handling
Extend the BBLog RTO event to deal with all timers of the base
stack. Also provide information about starting, stopping, and
running off. The expiration of the retransmission timer is
reported as it was done before.

Reviewed by:		rscheff@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D38710
2023-02-21 22:46:15 +01:00
Michael Tuexen
6b802933f1 tcp: rearrange enum and remove unused variable
Rearrange the enum tt_which such that TT_REXMIT is 0. This allows
an extension of the BBLog event RTO in a backwards compatible way.
Remove tcptimers, which was only used in trpt, a utility removed
from the source tree recently.

Reviewed by:		glebius@, guest-ccui@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D38547
2023-02-21 18:26:49 +01:00
Michael Tuexen
4065becf3f bblog: unbreak build
Ensure that tp is always declared and set.

Reported by:	Michael Butler
Sponsored by:	Netflix, Inc.
2023-02-21 18:16:59 +01:00
Michael Tuexen
00812bbda2 bblog: add logging of protocol user requests
This information was available in trpt and is useful. So provide
a way to get this information via TCP BBLog.

Reviewed by:		rscheff@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D38701
2023-02-21 12:07:35 +01:00
Michael Tuexen
b16a37eda8 bblog: sync tcp_log_events with Netflix tree
This allows the addition of entries to tcp_log_events without
causing conflicts in the Netflix tree.
rrs@ will upstream the related functional changes eventually.

Reviewed by:		guest-ccui@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D38646
2023-02-20 21:42:57 +01:00
John Baldwin
cda6bdbaa1 tcp: Don't try to disconnect a socket multiple times.
When the checks for INP_TIMEWAIT were removed, tcp_usr_close() and
tcp_usr_disconnect() were no longer prevented from calling
tcp_disconnect() on a socket that was already disconnected.  This
triggered a panic in cxgbe(4) for TOE where the tcp_disconnect() on an
already-disconnected socket invoked tcp_output() on a socket that was
already in time-wait.

Reviewed by:	rrs, np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37112
2023-02-17 09:13:53 -08:00
Gleb Smirnoff
96871af013 inpcb: use family specific sockaddr argument for bind functions
Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_bind method and from there on go down the call
stack with family specific argument.

Reviewed by:		zlei, melifaro, markj
Differential Revision:	https://reviews.freebsd.org/D38601
2023-02-15 10:30:16 -08:00
Gleb Smirnoff
caf32b260a pfil: add pfil_mem_{in,out}() and retire pfil_run_hooks()
The 0b70e3e78b changed the original design of a single entry point
into pfil(9) chains providing separate functions for the filtering
points that always provide mbufs and know the direction of a flow.
The motivation was to reduce branching.  The logical continuation
would be to do the same for the filtering points that always provide
a memory pointer and retire the single entry point.

o Hooks now provide two functions: one for mbufs and optional for
  memory pointers.
o pfil_hook_args() has a new member and pfil_add_hook() has a
  requirement to zero out uninitialized data. Bump PFIL_VERSION.
o As it was before, a hook function for a memory pointer may realloc
  into an mbuf.  Such mbuf would be returned via a pointer that must
  be provided in argument.
o The only hook that supports memory pointers is ipfw:default-link.
  It is rewritten to provide two functions.
o All remaining uses of pfil_run_hooks() are converted to
  pfil_mem_in().
o Transparent union of pfil_packet_t and tricks to fix pointer
  alignment are retired. Internal pfil_realloc() reduces down to
  m_devget() and thus is retired, too.

Reviewed by:		mjg, ocochard
Differential revision:	https://reviews.freebsd.org/D37977
2023-02-14 10:02:49 -08:00
Gleb Smirnoff
a22561501f net: use pfil_mbuf_{in,out} where we always have an mbuf
This finalizes what has been started in 0b70e3e78b.

Reviewed by:		kp, mjg
Differential revision:	https://reviews.freebsd.org/D37976
2023-02-14 10:02:49 -08:00
Mark Johnston
636b19ead4 tcp: Disallow re-connection of a connected socket
soconnectat() tries to ensure that one cannot connect a connected
socket.  However, the check is racy and does not really prevent two
threads from attempting to connect the same TCP socket.

Modify tcp_connect() and tcp6_connect() to perform the check again, this
time synchronized by the inpcb lock, under which we call
soisconnecting().

Reported by:	syzkaller
Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38507
2023-02-14 10:07:19 -05:00
Mark Johnston
c7ea65ec69 inpcb: refcount_release() returns a bool
No functional change intended.

MFC after:	1 week
Sponsored by:	Klara, Inc.
2023-02-13 16:35:47 -05:00
Mark Johnston
775da7f8a9 tcp: Remove a redundant net_epoch entry in tcp6_connect()
tcp6_connect() is always called in a net_epoch read section.

Fixes:		3d76be28ec ("netinet6: require network epoch for in6_pcbconnect()")
Reviewed by:	tuexen, glebius
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38506
2023-02-13 16:35:47 -05:00
Mateusz Guzik
937b00ac0d tcp: add missing void keyword to tcp_stats_init
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-02-13 18:38:04 +00:00
Mateusz Guzik
e4542107d8 sctp: ansify
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-02-13 18:17:10 +00:00
Mark Johnston
4130ea611f inpcb: Split in_pcblookup_hash_locked() and clean up a bit
Split the in_pcblookup_hash_locked() function into several independent
subroutine calls, each of which does some kind of hash table lookup.
This refactoring makes it easier to introduce variants of the lookup
algorithm that behave differently depending on whether they are
synchronized by SMR or the PCB database hash lock.

While here, do some related cleanup:
- Remove an unused ifnet parameter from internal functions.  Keep it in
  external functions so that it can be used in the future to derive a v6
  scopeid.
- Reorder the parameters to in_pcblookup_lbgroup() to be consistent with
  the other lookup functions.
- Remove an always-true check from in_pcblookup_lbgroup(): we can assume
  that we're performing a wildcard match.

No functional change intended.

Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D38364
2023-02-09 16:15:03 -05:00
Andrew Gallatin
c0e4090e3d ktls: Accurately track if ifnet ktls is enabled
This allows us to avoid spurious calls to ktls_disable_ifnet()

When we implemented ifnet kTLSe, we set a flag in the tx socket
buffer (SB_TLS_IFNET) to indicate ifnet kTLS.  This flag meant that
now, or in the past, ifnet ktls was active on a socket.  Later,
I added code to switch ifnet ktls sessions to software in the case
of lossy TCP connections that have a high retransmit rate.
Because TCP was using SB_TLS_IFNET to know if it needed to do math
to calculate the retransmit ratio and potentially call into
ktls_disable_ifnet(), it was doing unneeded work long after
a session was moved to software.

This patch carefully tracks whether or not ifnet ktls is still enabled
on a TCP connection.  Because the inp is now embedded in the tcpcb, and
because TCP is the most frequent accessor of this state, it made sense to
move this from the socket buffer flags to the tcpcb. Because we now need
reliable access to the tcbcb, we take a ref on the inp when creating a tx
ktls session.

While here, I noticed that rack/bbr were incorrectly implementing
tfb_hwtls_change(), and applying the change to all pending sends,
when it should apply only to future sends.

This change reduces spurious calls to  ktls_disable_ifnet() by 95% or so
in a Netflix CDN environment.

Reviewed by: markj, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D38380
2023-02-09 12:44:44 -05:00
Mark Johnston
e6aba98fdd tcp: Remove a couple of always-false checks from syncache_socket()
syncache_socket() does some unnecessary work: before connecting the PCB,
it saves the local address on the stack and restores it before freeing
the PCB in case of an error.  However:
- There's no need to restore the old address in the error case.
- The PCB's local address will always be equal to that of the syncache
  entry anyway.

So just remove this unnecessary code, which appears to date from the
introduction of the syncache 20+ years ago.

No functional change intended.

Reviewed by:	tuexen, glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38391
2023-02-07 15:22:54 -05:00
John Baldwin
9aff05bb1c tcp_var.h: Fix spelling of independent in comment 2023-02-07 12:01:29 -08:00
Gleb Smirnoff
220d892129 inpcb: immediately return matching pcb on lookup
This saves a lot of CPU cycles if you got large connection table.

The code removed originates from 413628a7e3, a very large changeset.
Discussed that with Bjoern, Jamie we can't recover why would we ever
have identical 4-tuples in the hash, even in the presence of jails.
Bjoern did a test that confirms that it is impossible to allocate an
identical connection from a jail to a host. Code review also confirms
that system shouldn't allow for such connections to exist.

With a lack of proper test suite we decided to take a risk and go
forward with removing that code.

Reviewed by:		gallatin, bz, markj
Differential Revision:	https://reviews.freebsd.org/D38015
2023-02-07 09:21:52 -08:00
Gleb Smirnoff
dfc4d218ce tcp: use straight in_pcbconnect() in tcp_connect()
This brings tcp_connect() par with tcp6_connect().  The code removed
now is a remnant of "truncating old TIME-WAIT" removed back in 2004
in c94c54e4df.

Reviewed by:		markj, tuexen
Differential Revision:	https://reviews.freebsd.org/D38405
2023-02-07 09:21:52 -08:00
Maxim Konovalov
fb8f221aeb db_printf: fix a typo
PR:	269377
2023-02-06 20:41:05 +00:00
Gleb Smirnoff
09d3671b0e inpcb: better document INP_ANONPORT flag
The name is pretty self explaining, but it is unclear why we need this
flag, as kernel only sets it and never reads.
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
9e46ff4d4c netinet: don't return conflicting inpcb in in_pcbconnect_setup()
Last time this inpcb was actually used was in tcp_connect()
before c94c54e4df.
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
a9afe0864f tcp: bring comment for tcp_connect() up to date
We no longer use in_pcbbind() since 2510235150.  The comment about
truncating old TIME-WAIT describes a code that had been removed back
in 2004 in c94c54e4df.
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
a9d22cce10 inpcb: use family specific sockaddr argument for connect functions
Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_connect method and from there on go down the call
stack with family specific argument.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38356
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
3d76be28ec netinet6: require network epoch for in6_pcbconnect()
This removes recursive epoch entry in the syncache case.  Fixes
unprotected access to V_in6_ifaddrhead in in6_pcbladdr(), as
well as access to prison IP address lists. It also matches what
IPv4 in_pcbconnect() does.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38355
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
221b9e3d06 inpcb: merge two versions of in6_pcbconnect() into one
No functional change.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38354
2023-02-03 11:33:35 -08:00
Gleb Smirnoff
76f1499ff5 tcp: retire net.inet.tcp.tcp_require_unique_port
It was a safe belt just in case if the new port allocation
behaviour introduced in 2510235150 would cause a problem.

Reviewed by:		markj, rscheff, tuexen
Differential revision:	https://reviews.freebsd.org/D38353
2023-02-03 11:33:35 -08:00
Mark Johnston
2589ec0f36 pcb: Move an assignment into in_pcbdisconnect()
All callers of in_pcbdisconnect() clear the local address, so let's just
do that in the function itself.

Note that the inp's local address is not a parameter to the inp hash
functions.  No functional change intended.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38362
2023-02-03 11:48:25 -05:00
Mark Johnston
b0ccf53f24 inpcb: Assert against wildcard addrs in in_pcblookup_hash_locked()
No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38361
2023-02-03 11:48:25 -05:00
Mark Johnston
675e2618ae inpcb: Deduplicate some assertions
It makes more sense to check lookupflags in the function which actually
uses SMR.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38359
2023-02-03 11:48:25 -05:00