Commit Graph

7767 Commits

Author SHA1 Message Date
Warner Losh
78d146160d sys: Remove $FreeBSD$: one-line bare tag
Remove /^\s*\$FreeBSD\$$\n/
2023-08-16 11:55:17 -06:00
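
The removal pattern quoted in the commit above can be tried out on a string. This is a minimal sketch, assuming a Python translation of the Perl-style regex; the helper name `strip_bare_tag` and the sample text are made up for illustration and are not part of the commit.

```python
import re

# Python spelling of the pattern from the commit message. re.MULTILINE
# makes ^ and $ anchor at line boundaries inside a whole-file string.
BARE_TAG = re.compile(r"^\s*\$FreeBSD\$$\n", re.MULTILINE)


def strip_bare_tag(text: str) -> str:
    """Remove lines that consist only of the $FreeBSD$ tag (hypothetical helper)."""
    return BARE_TAG.sub("", text)


sample = "first line\n$FreeBSD$\nlast line\n"
print(strip_bare_tag(sample))
```

A line that carries the tag inside a comment, such as `# $FreeBSD$`, is left alone by this pattern; the other commits in the series target those forms with their own expressions.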
Warner Losh
9e78921256 sys: Remove $FreeBSD$: two-line nroff pattern
Remove /^\.\\"\n\.\\"\s*\$FreeBSD\$$\n/
2023-08-16 11:55:06 -06:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
dfc016587a sys: Remove $FreeBSD$: two-line .c pattern
Remove /^#include\s+<sys/cdefs.h>.*$\n\s+__FBSDID\("\$FreeBSD\$"\);\n/
2023-08-16 11:54:30 -06:00
Warner Losh
71625ec9ad sys: Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:24 -06:00
Warner Losh
2ff63af9b8 sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:18 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Michael Tuexen
749a7fb588 sctp: cleanup
Do not put a variable in the stcb for passing it to a function.
Just use a parameter of the function. No functional change intended.

MFC after:	1 week
2023-08-14 12:27:39 +02:00
Michael Tuexen
e8eb0b7134 sctp: add an assert
This enforces a condition mentioned in a comment.

MFC after:	1 week
2023-08-13 22:47:43 +02:00
Michael Tuexen
6cb8b3b5cd sctp: use consistent names for locking macros
While there, also add a macro for an assert. Will be used shortly.
No functional change intended.

MFC after:	1 week
2023-08-13 22:35:53 +02:00
Michael Tuexen
85e5480df9 sctp: another cleanup
No functional change intended.

MFC after:	1 week
2023-08-09 04:17:52 +02:00
Michael Tuexen
9ade2745db sctp: remove duplicate code
No functional change intended.

MFC after:	1 week
2023-08-08 13:05:39 +02:00
Andrey V. Elsukov
600bf006d3 carp: delete interface routes on link loss.
Obtained from:	Yandex LLC
MFC after:	10 days
Sponsored by:	Yandex LLC
Differential Revision: https://reviews.freebsd.org/D41290
2023-08-08 13:22:10 +03:00
Michael Tuexen
10b2b30670 sctp: improve consistency
MFC after:	1 week
2023-08-05 11:29:23 +02:00
Michael Tuexen
e3771cc034 sctp: remove redundant check
This is already checked by the caller.

MFC after:	1 week
2023-08-05 11:26:45 +02:00
Michael Tuexen
efb04fb404 sctp: improve consistency of acc and ccc handling in snd buffer
Don't clear the counters for the socket snd buffer when
shutdown(..., SHUT_WR) or shutdown(..., SHUT_RDWR) is called.
This was causing the system to panic() when SCTP pf tests were
running.

Reported by:	dchagin, kp
MFC after:	1 week
2023-08-04 08:32:25 +02:00
Michael Tuexen
c620788150 sctp: keep sb_acc and sb_ccc in sync
PR:		260116
MFC after:	1 week
2023-07-28 15:16:23 +02:00
Michael Tuexen
b279e84a47 sctp: improve consistency
This simplifies a patch to address PR 260116.

PR:		260116
MFC after:	1 week
2023-07-28 14:36:11 +02:00
Kristof Provost
680ad06f90 mroute: avoid calling if_allmulti with the lock held
Avoid locking issues when if_allmulti() calls the driver's if_ioctl,
because that may acquire sleepable locks (while we hold a non-sleepable
rwlock).

Fortunately there's no pressing need to hold the mroute lock while we
do this, so we can postpone the call slightly, until after we've
released the lock.

This avoids the following WITNESS warning (with iflib drivers):

	lock order reversal: (sleepable after non-sleepable)
	 1st 0xffffffff82f64960 IPv4 multicast forwarding (IPv4 multicast forwarding, rw) @ /usr/src/sys/netinet/ip_mroute.c:1050
	 2nd 0xfffff8000480f180 iflib ctx lock (iflib ctx lock, sx) @ /usr/src/sys/net/iflib.c:4525
	lock order IPv4 multicast forwarding -> iflib ctx lock attempted at:
	#0 0xffffffff80bbd6ce at witness_checkorder+0xbbe
	#1 0xffffffff80b56d10 at _sx_xlock+0x60
	#2 0xffffffff80c9ce5c at iflib_if_ioctl+0x2dc
	#3 0xffffffff80c7c395 at if_setflag+0xe5
	#4 0xffffffff82f60a0e at del_vif_locked+0x9e
	#5 0xffffffff82f5f0d5 at X_ip_mrouter_set+0x265
	#6 0xffffffff80bfd402 at sosetopt+0xc2
	#7 0xffffffff80c02105 at kern_setsockopt+0xa5
	#8 0xffffffff80c02054 at sys_setsockopt+0x24
	#9 0xffffffff81046be8 at amd64_syscall+0x138
	#10 0xffffffff8101930b at fast_syscall_common+0xf8

See also:	https://redmine.pfsense.org/issues/12079
Reviewed by:	mjg
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D41209
2023-07-28 11:32:39 +02:00
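
The idea described in the mroute commit above, updating the table under the lock and postponing the driver call until the lock is released, can be sketched in user space. This is only an illustration under stated assumptions: `Interface`, `VifTable`, and `set_allmulti` are hypothetical names, and a `threading.Lock` stands in for the kernel rwlock.

```python
import threading


class Interface:
    """Hypothetical stand-in for a network interface (not a FreeBSD type)."""

    def __init__(self, name):
        self.name = name
        self.allmulti_refs = 1

    def set_allmulti(self, enable, table_lock):
        # The driver may sleep here, so the caller must not hold the table
        # lock. locked() reports any holder, which is enough for this
        # single-threaded sketch.
        assert not table_lock.locked(), "table lock held across driver call"
        self.allmulti_refs += 1 if enable else -1


class VifTable:
    def __init__(self):
        self.lock = threading.Lock()
        self.vifs = {}

    def add(self, index, ifp):
        with self.lock:
            self.vifs[index] = ifp

    def delete(self, index):
        # Step 1: change the table while holding the lock.
        with self.lock:
            ifp = self.vifs.pop(index)
        # Step 2: the lock is released; now the possibly sleeping call is safe.
        ifp.set_allmulti(False, self.lock)
        return ifp


table = VifTable()
em0 = Interface("em0")
table.add(0, em0)
table.delete(0)
print(em0.allmulti_refs)
```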
Michael Tuexen
cf32543fa4 tcp: document that conditional fields in tcpcb should be at the end
Reviewed by: 	rscheff, Peter Lei
Sponsored by:	Netflix, Inc.
2023-07-27 09:02:19 +02:00
Gleb Smirnoff
e3ba0d6add inpcb: do not copy so_options into inp_flags2
Since f71cb9f748 the socket stays connected to the inpcb throughout the
latter's lifetime, so there is no reason to complicate things by copying
these flags.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D41198
2023-07-26 20:35:42 -07:00
Gleb Smirnoff
a43e7a96b6 inpcb: use internal flag to mark pcbs that are inserted into lbgroup
Using INP_REUSEPORT_LB is unsafe, as it is basically a copy of socket's
SO_REUSEPORT_LB flag, which can be cleared by userland after bind().

Reviewed by:		markj
Reported by:		syzbot+e7d2e451f89fb444319b@syzkaller.appspotmail.com
Differential Revision:	https://reviews.freebsd.org/D41197
2023-07-26 20:35:30 -07:00
Michael Tuexen
ab65c64bc4 tcp: fix handling of <RST,ACK> segments in SYN-RCVD for RACK and BBR
This deals with TCP endpoints in the SYN-RCVD state coming from the
SYN-SENT state.

Reviewed by:		rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D41203
2023-07-26 16:22:13 +02:00
Richard Scheffenegger
b352ef58c2 tcp: Handle <RST,ACK> in SYN-RCVD
Patch base stack to correctly handle the RST bit independently
of other header flags per TCP RFC.

MFC after: 1 week
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D40982
2023-07-27 00:42:26 +02:00
Marius Strobl
e82d7b2952 gif(4): Revert in{,6}_gif_output() misalignment handling
The code added in c89c8a1029 in order
to compensate for possible misalignment caused by prepending the IP4/6
header with an EtherIP one got broken at some point by a rewrite of
gif(4). For better or worse, 8018ac153f
relaxed the alignment of struct ip from 32 bit to 16 bit, though. As
a result, a 16 bit offset of the IPv4 header induced by the addition
of the 16 bit EtherIP one no longer is a problem in the first place.
The alignment of struct ip6_hdr currently is even only 8 bit, making
it even less problematic with regards to possible misalignment.
Thus, remove the code for handling misalignment in in{,6}_gif_output()
altogether again.
While at it, replace the 3 bcopy(9) calls in gif(4) with memcpy(9) as
there's no need to handle overlap here.
2023-07-26 13:14:22 +02:00
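
The alignment argument in the gif(4) commit above comes down to simple arithmetic. The sketch below assumes the buffer itself starts on a 32-bit boundary and uses the sizes given in the message (a 16-bit EtherIP header, alignment requirements of 32, 16, and 8 bits); the function name `is_aligned` is hypothetical.

```python
ETHERIP_HDR_LEN = 2  # bytes; the 16-bit EtherIP header from the commit message


def is_aligned(offset_bytes: int, alignment_bytes: int) -> bool:
    """True if an object placed at offset_bytes meets the given alignment."""
    return offset_bytes % alignment_bytes == 0


# Offset of the inner IP header once the EtherIP header has been prepended,
# assuming the buffer starts on a 4-byte boundary.
ip_offset = ETHERIP_HDR_LEN

print(is_aligned(ip_offset, 4))  # old 32-bit requirement for struct ip
print(is_aligned(ip_offset, 2))  # relaxed 16-bit requirement for struct ip
print(is_aligned(ip_offset, 1))  # 8-bit requirement for struct ip6_hdr
```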
Shivank Garg
215bab7924 mac_ipacl: new MAC policy module to limit jail/vnet IP configuration
The mac_ipacl policy module enables fine-grained control over IP address
configuration within VNET jails from the base system.
It allows the root user to define rules governing IP addresses for
jails and their interfaces using the sysctl interface.

Requested by:	multiple
Sponsored by:	Google, Inc. (GSoC 2019)
MFC after:	2 months
Reviewed by:	bz, dch (both earlier versions)
Differential Revision: https://reviews.freebsd.org/D20967
2023-07-26 00:07:57 +00:00
Michael Tuexen
52640d6174 sctp: update zero checksum support
Implement support for the error detection method identifier.
MFC after:	2 weeks
2023-07-23 06:41:32 +02:00
Konstantin Belousov
bc310a95c5 ip output: ensure that mbufs are mapped if ipsec is enabled
Ipsec needs access to packet headers to determine if a policy is
applicable. It seems that typically IP headers are mapped, but the code
arguably needs to check this before blindly accessing them. Then,
operations like m_unshare() and m_makespace() are not yet ready for
unmapped mbufs.

Ensure that the packet is mapped before calling into IPSEC_OUTPUT().

PR:	272616
Reviewed by:	jhb, markj
Sponsored by:	NVidia networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41112
2023-07-21 21:51:13 +03:00
Michael Tuexen
e4a873bf10 tcp: improve layout of struct tcpcb
Put optional fields at the end to minimize run time problems in
case CC modules are built from within their directory.

Reviewed by:		cc, gallatin, glebius, imp
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D41059
2023-07-19 14:47:36 +02:00
Warner Losh
d3152ab23e tcbpcb: Always define t_osd
Always define t_osd. congestion control modules access it
unconditionally. This fixes the build.

However, this is, at best, a temporary band-aid until the
larger issues are sorted.

Sponsored by:		Netflix
2023-07-17 11:22:45 -06:00
Doug Moore
8579bf27d7 inline_fls - HAVE_INLINE_FLSLL is always true
flsll is inlined, or replaced by a smart binary search implementation,
on all architectures, and HAVE_INLINE_FLSLL is #defined always. So
remove the code for the #undefined case.

Reviewed by:	mhorne, tuexen
Differential Revision:	https://reviews.freebsd.org/D40704
2023-07-06 15:27:31 -05:00
Michael Tuexen
2176c9ab71 dtrace: improve siftr probe
Improve consistency of the field names with tcpsinfo_t:
* Use mss instead of max_seg_size.
* Use lport and rport instead of tcp_localport and tcp_foreignport.

Use t_flags instead of flags to improve consistency with t_flags2.

Add laddr and raddr, since the addresses were missing when compared
to the output of siftr.

Reviewed by:		cc
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D40834
2023-07-02 03:08:51 +02:00
Michael Tuexen
cd9da8d072 siftr: unbreak dtrace support
This patch adds back some fields needed by the siftr probe, which were
removed in
https://cgit.freebsd.org/src/commit/?id=aa61cff4249c92689d7a1f15db49d65d082184cb

With this fix, you can run dtrace scripts again when the siftr
module is loaded. And the siftr probe works again.

Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D40826
2023-07-01 09:50:54 +02:00
Alexander V. Chernikov
bb06a80cf6 netinet[6]: make in[6]_control use ucred instead of td.
Reviewed by:	markj, zlei
Differential Revision: https://reviews.freebsd.org/D40793
MFC after:	2 weeks
2023-07-01 06:52:24 +00:00
Michael Tuexen
dc2d26df43 siftr: provide dtrace with the correct pointer to data
This fixes a bug which was introduced in the commit
https://svnweb.freebsd.org/changeset/base/282276

Reviewed by:		cc, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D40806
2023-06-30 22:03:04 +02:00
Mark Johnston
de0a2eb2ef tcp: Disallow connecting a disconnected socket
Currently nothing prevents tcp_usr_connect() from attempting to connect
when the socket has been disconnected.  At the moment, doing so triggers
an assertion in in_pcbconnect() because inp_faddr is not unspecified.  I
believe this may have been caught in the past by TIMEWAIT checks, but
those are now removed.

Check for additional socket states in tcp_connect().

Reported by:	syzbot+f0f7871ec5397602b446@syzkaller.appspotmail.com
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D40579
2023-06-23 10:00:52 -04:00
Michael Tuexen
02b885b09d tcp: fix TCP MD5 computation for the BBR and RACK stack
PR:			253096
Reviewed by:		cc, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D40597
2023-06-21 22:54:33 +02:00
Richard Scheffenegger
04682968c3 tcp: expose AccECN mode and TCP FastOpen (TFO) in TCPI
Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D40621
2023-06-20 23:48:56 +02:00
Richard Scheffenegger
7ea8d02798 Update various sys/netinet source files to conform with the style(9)
guide on how to label FALLTHROUGH in switch statements.

No functional change.

Reviewed By:		tuexen, cc, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D40622
2023-06-20 23:23:19 +02:00
Gleb Smirnoff
6eb2dbfa63 tcp: add missing static keywords
Without them, compilation with -O0 would produce kernel modules
that depend on a symbol that doesn't exist.
2023-06-14 14:21:28 -07:00
Randall Stewart
e022f2b013 tcp: Rack fixes and misc updates
Over the past few weeks we have found several bugs and updated hybrid pacing to have
more data in the low-level logging. We have also moved more of the BBlogs to "verbose" mode
so that we don't generate a lot of the debug data unless verbose/debug is turned on.
There were a couple of notable bugs, one being the incorrect passing of the percentage
for reduction to timely and the other the incorrect use of a 20% timely Beta instead of
80%. This also expands a simple idea, to be able to pace a cwnd (fillcw), as an alternate
pacing mechanism, combining that with timely reduction/increase.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D40391
2023-06-09 10:27:08 -04:00
Richard Scheffenegger
eb5bfdd065 tcp: Add and update cubic module variable names
Prepare the cubic congestion control module to better align with
the specifications in RFC8312bis.

Rename a few cubic state variables to the variable names found in
the RFC8312bis specification. This makes the code more understandable
for someone reading the RFC and the code. It also makes the variable
naming convention more uniform. Add some variables needed subsequently.

No functional change.

Submitted By:		Bhaskar Pardeshi, VMware Inc.
Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D40436
2023-06-06 23:09:28 +02:00
Richard Scheffenegger
43b117f88f tcp: make the maximum number of retransmissions tunable per VNET
Both Windows (TcpMaxDataRetransmissions) and Linux (tcp_retries2)
allow restricting the maximum number of consecutive timer-based
retransmissions. Add that same capability on a per-VNET basis to
FreeBSD.

Reviewed By:		cc, tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D40424
2023-06-06 22:58:54 +02:00
Kristof Provost
185c1cddd7 netinet: re-read IP length after PFIL hook
The pfil hook may modify the packet, so before we check its length (to
decide if it needs to be fragmented or not) we should re-read that
length.

This is most likely to happen when pf is reassembling packets. In that
scenario we'd receive the last fragment, which is likely to be a short
packet, pf would reassemble it (likely exceeding the interface MTU) and
then we'd transmit it without fragmenting, because we're comparing the
MTU to the length of the last fragment, not the fully reassembled
packet.

See also:	https://redmine.pfsense.org/issues/14396
Reviewed by:	cy
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D40395
2023-06-06 10:01:03 +02:00
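
The failure mode described in the commit above can be reproduced with a toy model. This is a sketch under assumptions: the dictionary-based packets, the `reassembling_hook` function, and the MTU value are invented for illustration, and header overhead is ignored.

```python
MTU = 1500  # assumed interface MTU in bytes


def reassembling_hook(pkt, held_fragments):
    """Hypothetical filter hook: returns the reassembled datagram."""
    total = sum(p["len"] for p in held_fragments) + pkt["len"]
    return {"len": total}


def needs_fragmentation(pkt, held_fragments, reread=True):
    ip_len = pkt["len"]                  # length read before the hook runs
    pkt = reassembling_hook(pkt, held_fragments)
    if reread:
        ip_len = pkt["len"]              # the hook may have replaced the packet
    return ip_len > MTU


held = [{"len": 1480}]                   # earlier fragment held by the filter
last = {"len": 100}                      # short final fragment
print(needs_fragmentation(last, held, reread=False))
print(needs_fragmentation(last, held, reread=True))
```

Without the re-read, the decision is based on the 100-byte final fragment; with it, the decision uses the 1580-byte reassembled length.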
Michael Tuexen
d66540e829 tcp: improve sending of TTL/hoplimit and DSCP
Ensure that a user specified value of TTL/hoplimit and DSCP is
used when sending packets.

Reviewed by:		cc, rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D40423
2023-06-05 18:43:06 +02:00
Cheng Cui
a3aa6f6529 cc_cubic: Use units of microseconds (usecs) instead of ticks in rtt.
This improves TCP friendly cwnd in cases of low latency high drop rate
networks. Tests show +42% and +37% better performance in 1Gbps and 10Gbps
cases.

Reported by: Bhaskar Pardeshi from VMware.
Reviewed By: rscheff, tuexen
Approved by: rscheff (mentor), tuexen (mentor)
2023-06-01 07:55:01 -04:00
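
Why the unit matters on low-latency paths can be seen from the rounding alone. The sketch below assumes a tick rate of 1000 per second, so one tick is 1000 microseconds; the constant `HZ` and the function name are illustrative and not taken from the commit.

```python
HZ = 1000                          # assumed tick rate (ticks per second)
USECS_PER_TICK = 1_000_000 // HZ   # 1000 microseconds per tick


def rtt_in_ticks(rtt_usecs: int) -> int:
    """Truncate a microsecond RTT to whole ticks."""
    return rtt_usecs // USECS_PER_TICK


for rtt_usecs in (250, 1500, 40_000):
    ticks = rtt_in_ticks(rtt_usecs)
    print(rtt_usecs, "usecs ->", ticks, "ticks =", ticks * USECS_PER_TICK, "usecs")
```

Under this assumption a 250 microsecond RTT truncates to zero ticks and a 1500 microsecond RTT loses a third of its value, while a 40 ms RTT is represented exactly.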
Alexander V. Chernikov
e32221a15f netinet6: make IPv6 fragment TTL per-VNET configurable.
Having it configurable adds more flexibility, especially
for systems with a low amount of memory.
Additionally, it allows speeding up execution of the frag6/ tests.

Reviewed by:	kp, markj, bz
Differential Revision:	https://reviews.freebsd.org/D35755
MFC after:	2 weeks
2023-06-01 12:04:49 +00:00
Jonathan T. Looney
4f2cc73f34 tcp: Refactor tcp_get_srtt()
Refactor tcp_get_srtt() into its two component operations: unit
conversion and shifting. No functional change is intended.

Reviewed by:	cc, tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D40304
2023-05-31 19:16:20 +00:00
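
The split into "unit conversion" and "shifting" described in the commit above might look roughly like the following. Everything here is an assumption made for illustration: the function names, the shift width of 5, the tick rate, and the order in which the two steps are applied are not taken from the commit.

```python
RTT_SHIFT = 5   # assumed fixed-point shift of the stored smoothed RTT
HZ = 1000       # assumed tick rate (ticks per second)


def unshift_srtt(srtt_fixed: int) -> int:
    """Shifting step: drop the fixed-point fraction bits."""
    return srtt_fixed >> RTT_SHIFT


def ticks_to_usecs(ticks: int) -> int:
    """Unit conversion step: ticks to microseconds."""
    return ticks * 1_000_000 // HZ


def get_srtt_usecs(srtt_fixed: int) -> int:
    """Composition of the two steps (hypothetical wrapper)."""
    return ticks_to_usecs(unshift_srtt(srtt_fixed))


print(get_srtt_usecs(3 << RTT_SHIFT))
```

Keeping the two steps separate lets a caller that already has an unshifted value, or one that wants a different unit, reuse just the part it needs.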
Doug Rabson
5ab151574c netinet*: Fix redirects for connections from localhost
Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter
rules to change the destination address and port for a connection.
Typically, the rule triggers on an input event when a packet is received
by a router and the destination address and/or port is changed to
implement the redirect. When a reply packet on this connection is output
to the network, the rule triggers again, reversing the modification.

When the connection is initiated on the same host as the packet filter,
it is initially output via lo0 which queues it for input processing.
This causes an input event on the lo0 interface, allowing redirect
processing to rewrite the destination and create state for the
connection. However, when the reply is received, no corresponding output
event is generated; instead, the packet is delivered to the higher level
protocol (e.g. tcp or udp) without reversing the redirect, the reply is
not matched to the connection and the packet is dropped (for tcp, a
connection reset is also sent).

This commit fixes the problem by adding a second packet filter call in
the input path. The second call happens right before the handoff to
higher level processing and provides the missing output event to allow
the redirect's reply processing to perform its rewrite. This extra
processing is disabled by default and can be enabled using pfilctl:

	pfilctl link -o pf:default-out inet-local
	pfilctl link -o pf:default-out6 inet6-local

PR:		268717
Reviewed-by:	kp, melifaro
MFC-after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D40256
2023-05-31 11:11:05 +01:00
Mark Johnston
a306ed50ec inpcb: Restore missing validation of local addresses for jailed sockets
When looking up a listening socket, the SMR-protected lookup routine may
return a jailed socket with no local address.  This happens when using
classic jails with more than one IP address; in a single-IP classic
jail, a bound socket's local address is always rewritten to be that of
the jail.

After commit 7b92493ab1, the lookup path failed to check whether the
jail corresponding to a matched wildcard socket actually owns the
address, and would return the match regardless.  Restore the omitted
checks.

Fixes:		7b92493ab1 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
Reported by:	peter
Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D40268
2023-05-30 15:15:48 -04:00
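
The restored check from the commit above, that a matched wildcard socket in a jail is only returned if the jail actually owns the destination address, can be modelled in a few lines. This is a sketch with hypothetical types (`Jail`, `Sock`) and a hypothetical `lookup_listener` function; it does not mirror the kernel's data structures or lookup order.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Jail:
    addrs: FrozenSet[str]            # addresses the jail owns


@dataclass
class Sock:
    port: int
    laddr: Optional[str]             # None means bound to the wildcard address
    jail: Optional[Jail] = None      # None means not jailed


def lookup_listener(socks, dst_addr, dst_port):
    wildcard = None
    for s in socks:
        if s.port != dst_port:
            continue
        if s.laddr == dst_addr:
            return s                 # exact local address match wins
        if s.laddr is None and wildcard is None:
            # The check under discussion: skip a jailed wildcard socket
            # whose jail does not own the destination address.
            if s.jail is not None and dst_addr not in s.jail.addrs:
                continue
            wildcard = s
    return wildcard


jail = Jail(frozenset({"10.0.0.1", "10.0.0.2"}))
jailed = Sock(80, None, jail)
host = Sock(80, None)
print(lookup_listener([jailed, host], "10.0.0.2", 80) is jailed)
print(lookup_listener([jailed, host], "192.0.2.1", 80) is host)
```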