7600 Commits

Author SHA1 Message Date
Gleb Smirnoff
09d3671b0e inpcb: better document INP_ANONPORT flag
The name is pretty self-explanatory, but it is unclear why we need this
flag, as the kernel only sets it and never reads it.
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
9e46ff4d4c netinet: don't return conflicting inpcb in in_pcbconnect_setup()
The last time this inpcb was actually used was in tcp_connect()
before c94c54e4df.
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
a9afe0864f tcp: bring comment for tcp_connect() up to date
We have not used in_pcbbind() since 2510235150.  The comment about
truncating old TIME-WAIT describes code that was removed back
in 2004 in c94c54e4df.
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
a9d22cce10 inpcb: use family specific sockaddr argument for connect functions
Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_connect method and from there on go down the call
stack with family specific argument.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38356
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
3d76be28ec netinet6: require network epoch for in6_pcbconnect()
This removes recursive epoch entry in the syncache case.  Fixes
unprotected access to V_in6_ifaddrhead in in6_pcbladdr(), as
well as access to prison IP address lists. It also matches what
IPv4 in_pcbconnect() does.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38355
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
221b9e3d06 inpcb: merge two versions of in6_pcbconnect() into one
No functional change.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38354
2023-02-03 11:33:35 -08:00
Gleb Smirnoff
76f1499ff5 tcp: retire net.inet.tcp.tcp_require_unique_port
It was a safety belt in case the new port allocation
behaviour introduced in 2510235150 caused a problem.

Reviewed by:		markj, rscheff, tuexen
Differential revision:	https://reviews.freebsd.org/D38353
2023-02-03 11:33:35 -08:00
Mark Johnston
2589ec0f36 pcb: Move an assignment into in_pcbdisconnect()
All callers of in_pcbdisconnect() clear the local address, so let's just
do that in the function itself.

Note that the inp's local address is not a parameter to the inp hash
functions.  No functional change intended.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38362
2023-02-03 11:48:25 -05:00
Mark Johnston
b0ccf53f24 inpcb: Assert against wildcard addrs in in_pcblookup_hash_locked()
No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38361
2023-02-03 11:48:25 -05:00
Mark Johnston
675e2618ae inpcb: Deduplicate some assertions
It makes more sense to check lookupflags in the function which actually
uses SMR.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38359
2023-02-03 11:48:25 -05:00
Michael Tuexen
7b2f1a7fe9 sctp: improve delivery of stream reset notifications
Two functions are not called via sctp_ulp_notify() and therefore
need additional checks when being called.

Reported by:	syzbot+eb888d3a5a6c54413de5@syzkaller.appspotmail.com
MFC after:	3 days
2023-02-02 14:46:10 +01:00
Gleb Smirnoff
5ebea466dc inpcb: add myself to the copyright notice
for the SMR synchronization in late 2021 and following cleanups
2023-02-01 09:39:25 -08:00
Justin Hibbits
3d0d5b21c9 IfAPI: Explicitly include <net/if_private.h> in netstack
Summary:
In preparation for making if_t completely opaque outside of the netstack,
explicitly include the header.  <net/if_var.h> will stop including the
header in the future.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
2023-01-31 15:02:16 -05:00
Boris Lytochkin
ee49c5d33d carp: turn net.inet.carp.allow into a RW tunable
Currently CARP starts announcing its state when initialised, regardless
of the state of the other services provided by the server.
As a result, the device can become master while still loading the
firewall ruleset or initialising a long-starting service.

This change adds a way to request a delayed CARP start by setting
  net.inet.carp.allow=0 in loader.conf.

Differential Revision: https://reviews.freebsd.org/D38167
MFC after:	2 weeks
2023-01-30 11:23:53 +00:00
Michael Tuexen
e2d14a04c5 tcp: improve error handling of net.inet.tcp.udp_tunneling_port
In case the new port can't be set, set the port to 0.

MFC after:	3 days
Sponsored by:	Netflix, Inc.
2023-01-26 22:55:22 +01:00
Gleb Smirnoff
d3acb974b4 tcp: protect TCP over UDP configuration with a lock
The sysctl modifies global sockets without any locks.  The removed
comment suggests that previously it relied on a lock that doesn't
exist today.
2023-01-26 10:16:32 -08:00
Richard Scheffenegger
18b83b626a tcp: reduce the size of t_rttupdated in tcpcb
During tcp session start, various mechanisms need to
track a few initial RTTs before becoming active.
Prevent overflows of the corresponding tracking counter
and reduce the size of tcpcb simultaneously.

Reviewed By:		#transport, tuexen, guest-ccui
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D21117
2023-01-26 18:08:00 +01:00
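
A minimal sketch of the idea behind 18b83b626a, assuming the narrowed counter saturates rather than wraps. The struct and function names below are hypothetical stand-ins, not the real tcpcb fields or kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the narrowed counter; not the real tcpcb. */
struct rtt_tracker {
	uint8_t rtt_updated;	/* RTT samples seen, saturates at UINT8_MAX */
};

/* Count one more RTT sample without ever wrapping back to zero. */
static inline void
rtt_tracker_bump(struct rtt_tracker *rt)
{
	if (rt->rtt_updated < UINT8_MAX)
		rt->rtt_updated++;
}

/* A mechanism becomes active once enough initial RTTs were observed. */
static inline int
rtt_tracker_ready(const struct rtt_tracker *rt, uint8_t min_samples)
{
	return (rt->rtt_updated >= min_samples);
}
```

Because the counter only gates "have we seen a few initial RTTs yet", a one-byte saturating value carries the same information as a wider one, and a long-lived session cannot wrap it back below the threshold.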
Gordon Bergling
fa7de6dcb9 ip_gre: Fix a common typo in source code comments
- s/addres/address/

MFC after:	3 days
2023-01-19 14:13:02 +01:00
Gordon Bergling
73e994a998 extra_tcp_stacks: Fix a common typo in source code comments
- s/orginal/original/

MFC after:	3 days
2023-01-19 14:11:00 +01:00
Hans Petter Selasky
e0d8add4af tcp_lro: Fix for undefined behaviour.
Make sure the size of the raw[] array in the lro_address union is
correctly set at compile time, so that static code analysis tools
do not report undefined behaviour.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-01-13 11:18:19 +01:00
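
As an illustration of the kind of fix e0d8add4af describes, the sketch below sizes a word array in a union from the size of the struct it overlays, so word-wise access never runs past the array bounds. All names and the field layout are hypothetical; this is not the actual lro_address definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address tuple; the real layout differs. */
struct addr_fields_demo {
	uint8_t  type;
	uint8_t  pad[3];
	uint32_t src[4];
	uint32_t dst[4];
	uint16_t sport;
	uint16_t dport;
	uint32_t vlan;
};

/* Number of unsigned long words needed to cover a type, rounded up. */
#define RAW_WORDS(type) \
	((sizeof(type) + sizeof(unsigned long) - 1) / sizeof(unsigned long))

union addr_demo {
	unsigned long raw[RAW_WORDS(struct addr_fields_demo)];
	struct addr_fields_demo f;
};

/* Checked at compile time: raw[] covers every byte of the fields. */
static_assert(sizeof(((union addr_demo *)0)->raw) >=
    sizeof(struct addr_fields_demo), "raw[] too small");

/* Word-wise comparison; callers must zero the whole union first. */
static int
addr_demo_equal(const union addr_demo *a, const union addr_demo *b)
{
	for (size_t i = 0; i < RAW_WORDS(struct addr_fields_demo); i++)
		if (a->raw[i] != b->raw[i])
			return (0);
	return (1);
}
```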
Gordon Bergling
432a398d86 tcp_rack(4): Fix a typo in a source code comment
- s/postion/position/

MFC after:	3 days
2023-01-11 12:02:25 +01:00
Gordon Bergling
d68f154205 tcp_hpts: Fix a typo in a source code comment
- s/subract/subtract/

MFC after:	3 days
2023-01-11 11:33:29 +01:00
Andrew Gallatin
8ea4182995 tcp: Build RACK and BBR stacks as a part of LINT
When RACK and BBR were added to the kernel, they were put
behind 'WITH_EXTRA_TCP_STACKS=1'.   Unfortunately that was
never added to any NOTES file, so RACK & BBR were not compiled
with the various LINT-NOINET, LINT-NOINET6, and LINT-NOIP kernels.
This led to the stacks sometimes being broken.

This change:

- Fixes RACK so that it compiles with the various LINT-NO* kernels
- Adds WITH_EXTRA_TCP_STACKS=1 to all NOTES kernels so that
   RACK and BBR are compile tested regularly

Sponsored by: Netflix
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D37903
2023-01-10 16:16:43 -05:00
Gleb Smirnoff
aab8c844b9 tcp/ipfw: fix "ipfw fwd localaddr,port"
The ipfw(4) feature of forwarding to a local address without modifying
a packet was broken.  The first lookup always needs to be a non-wildcard
one, because its goal is to find an already existing socket.  Otherwise
a local wildcard listener with the same port number may match, resulting
in the connection being forwarded to the wrong port.

Reported by:	Pavel Polyakov <bsd kobyla.org>
Fixes:		d88eb4654f
2023-01-05 14:34:50 -08:00
Randall Stewart
26bdd35c39 rack and bbr not loading if TCP_RATELIMIT is not configured.
So it turns out that rack and bbr still will not load without TCP_RATELIMIT. This needs
to be fixed, and let's at the same time bring tcp_ratelimit up to date so that
the transports can set a divisor (though still having a default path with the default
divisor of 1000) for setting the burst size.

Reviewed by: tuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D37954
2023-01-05 11:59:52 -05:00
Cheng Cui
57cc27a332 BBLog: improve sysctl variables
Correct the format in sysctl net.inet.tcp.bb.disable_all and
sysctl net.inet.tcp.bb.log_auto_all.
Correct the format and the description in
net.inet.tcp.bb.log_auto_mode.

Reviewed by:		rscheff, tuexen
MFC after:		1 week
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37776
2022-12-24 22:10:31 +01:00
Randall Stewart
2e2a1c3139 Oops, take out a stray left-behind printf that was
for debugging. Sorry.
2022-12-14 16:11:39 -05:00
Randall Stewart
e2e088ae86 Rack cannot be loaded without cc_newreno compiled into the kernel.
Right now rack will fail to load due to its hack in accessing symbol names
in cc_newreno. This was fine when newreno was always compiled into the
kernel but now ... not so much. Instead let's fix up rack to use the socket
option queries to get the information it wants and set the parameters. We
also fix the CC parameters so they are always settable.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D37622
2022-12-14 15:37:48 -05:00
Andrew Gallatin
c4a4b2633d allocate inpcb aligned to cachelines
The inpcb struct is one of the most heavily utilized in the kernel
on a busy network server.  By aligning it to a cacheline
boundary, we can ensure that closely related fields in the inpcb
and tcpcb can be predictably located on the same cacheline.  rrs
has already done a lot of this work to put related fields on the
same line for the tcpcb.

In combination with a forthcoming patch to align the start of the tcpcb,
we see a roughly 3% reduction in CPU use on a busy web server serving
traffic over roughly 50,000 TCP connections.

Reviewed by: glebius, markj, tuexen
Differential Revision: https://reviews.freebsd.org/D37687
Sponsored by: Netflix
2022-12-14 14:19:35 -05:00
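
A userland sketch of the technique named in c4a4b2633d, assuming a 64-byte cache line. The kernel allocates inpcbs from its own zone allocator; here C11 alignas and aligned_alloc() stand in for it, and every name is hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64	/* assumed line size; varies by CPU */

/* Hypothetical control block: hot fields share the first line. */
struct pcb_demo {
	alignas(CACHE_LINE) uint32_t hot_a;	/* read on every packet */
	uint32_t hot_b;				/* read on every packet */
	alignas(CACHE_LINE) uint64_t cold_stats[4]; /* rarely touched */
};

/*
 * Allocate one block on a cache line boundary.  sizeof() is already a
 * multiple of the alignment because the struct itself is aligned.
 */
static struct pcb_demo *
pcb_demo_alloc(void)
{
	return (aligned_alloc(alignof(struct pcb_demo),
	    sizeof(struct pcb_demo)));
}
```

Field placement is only predictable if the start of the object is aligned; otherwise two fields 4 bytes apart may still straddle a line boundary depending on where the allocator happened to place the object.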
Gleb Smirnoff
eaabc93764 tcp: retire TCPDEBUG
This subsystem is superseded by modern debugging facilities,
e.g. DTrace probes and TCP black box logging.

We intentionally leave SO_DEBUG in place, as many utilities may
set it on a socket.  Also the tcp::debug DTrace probes look at
this flag on a socket.

Reviewed by:		gnn, tuexen
Discussed with:		rscheff, rrs, jtl
Differential revision:	https://reviews.freebsd.org/D37694
2022-12-14 09:54:06 -08:00
Mateusz Guzik
e6fc01f6be tcp: whack the stale declaration of rack_timer_stop
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-12-14 08:48:52 +00:00
Gleb Smirnoff
1b5895c624 tcp: remove a 4.4BSD relic
The actual code to modify this counter was disabled in 2c37256e5a
and later removed in d0390e0570.
2022-12-13 20:21:45 -08:00
Gleb Smirnoff
5050df3f4a tcp: fix counter leak for SYN_RCVD state when syncache_socket() fails
The SYN_RCVD state count is tricky here because the default code path and TFO
are so different.  In the default case the count is incremented when a
syncache entry is added to the database in syncache_insert().  Later,
when the connection transitions from a syncache entry to a socket in
syncache_expand(), this counter is inherited by the tcpcb.  If socket or
tcpcb allocation fails in syncache_socket(), syncache_expand()
is responsible for the decrement.  In the TFO case the syncache entry is not
inserted into the database and the count of SYN_RCVD is first incremented in
syncache_tfo_expand() after successful socket allocation.  Thus, inside
syncache_socket() we can't tell whether we need to decrement in case of
a failure or not.  The caller is responsible for this bookkeeping.

Fixes:	07285bb4c2
Differential revision:	https://reviews.freebsd.org/D37610
2022-12-13 19:31:05 -08:00
Gleb Smirnoff
1aed3b3430 udp: add protocol method declarations to udp_var.h
They are shared between UDP over IPv4 and over IPv6.  To prevent all
possible kernel build failures wrap them in #ifdef _SYS_PROTOSW_H_.
Prompted by feedback from jhb@ and jrtc27@ on c93db4abf4.
2022-12-07 11:51:49 -08:00
Gleb Smirnoff
32920f038a udp: inline udp_output() into udp_send()
2022-12-07 11:51:48 -08:00
Gleb Smirnoff
483fe96511 udp: embed inpcb into udpcb
See similar change to TCP e68b379244 for more context.  For UDP the
change is much simpler, though.
2022-12-07 11:51:42 -08:00
Gleb Smirnoff
0c0d8a4f7e udp: rearrange declarations in udp_var.h into user and _KERNEL halves
Bring everything that belongs to _KERNEL into single block.  Move
sub-includes to its beginning.
2022-12-07 09:55:38 -08:00
Gleb Smirnoff
294a609fc0 udp: destroy UDP and UDP-Lite inpcbinfos in single SYSUNINIT
They are created in a single SYSINIT, there is no reason to destroy
them in separate functions.
2022-12-07 09:55:38 -08:00
Gleb Smirnoff
446ccdd08e tcp: use single locked callout per tcpcb for the TCP timers
Use only one callout structure per tcpcb that is responsible for handling
all five TCP timeouts.  Use the locked version of callout, of course. The
callout function tcp_timer_enter() chooses the soonest timer and executes it
with the lock held.  Unless the timer reports that the tcpcb has been freed,
the callout is rescheduled for the next soonest timer, if there is any.

With single callout per tcpcb on connection teardown we should be able
to fully stop the callout and immediately free it, avoiding use of
callout_async_drain().  There is one gotcha here: callout_stop() can
actually touch our memory when a rare race condition happens.  See
comment above tcp_timer_stop().  Synchronous stop of the callout makes
tcp_discardcb() the single entry point for tcpcb destructor, merging the
tcp_freecb() to the end of the function.

While here, also remove lots of lingering checks in the beginning of
TCP timer functions.  With a locked callout they are unnecessary.

While here, clean unused parts of timer KPI for the pluggable TCP stacks.

While here, remove TCPDEBUG from tcp_timer.c, as this allows for more
simplification of TCP timers.  The TCPDEBUG is scheduled for removal.

Move the DTrace probes in timers to the beginning of a function, where
a tcpcb always exists.

Discussed with:		rrs, tuexen, rscheff	(the TCP part of the diff)
Reviewed by:		hselasky, kib, mav	(the callout part)
Differential revision:	https://reviews.freebsd.org/D37321
2022-12-07 09:00:48 -08:00
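
The scheduling rule described in 446ccdd08e (one callout, run the soonest of five timers, then re-arm for the next soonest unless the connection was freed) can be sketched as below. This is a simplified model with hypothetical names; it leaves out locking and the real callout(9) API entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NTIMERS    5		/* five TCP timeouts share one callout */
#define T_DISARMED INT64_MAX	/* sentinel: timer not armed */

/* Hypothetical per-connection timer state. */
struct conn_demo {
	int64_t deadline[NTIMERS];	/* absolute expiry or T_DISARMED */
	int     fired[NTIMERS];		/* fire counts, for the demo only */
};

/* Return the index of the armed timer that expires first, or -1. */
static int
soonest_timer(const struct conn_demo *c)
{
	int best = -1;

	for (int i = 0; i < NTIMERS; i++) {
		if (c->deadline[i] == T_DISARMED)
			continue;
		if (best == -1 || c->deadline[i] < c->deadline[best])
			best = i;
	}
	return (best);
}

/* Hypothetical handler; returns false if it freed the connection. */
static bool
run_timer(struct conn_demo *c, int which)
{
	c->fired[which]++;
	return (true);
}

/*
 * Body of the single callout: run the soonest timer, then return the
 * deadline to re-arm for, or T_DISARMED if nothing is left to schedule.
 */
static int64_t
timer_enter(struct conn_demo *c)
{
	int which = soonest_timer(c);

	if (which == -1)
		return (T_DISARMED);
	c->deadline[which] = T_DISARMED;
	if (!run_timer(c, which))
		return (T_DISARMED);	/* freed: do not touch c again */
	which = soonest_timer(c);
	return (which == -1 ? T_DISARMED : c->deadline[which]);
}
```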
Gleb Smirnoff
918fa4227d tcp: remove tcp_timer_suspend()
It was temporary code added together with RACK to fight against
TCP timer races.
2022-12-07 09:00:48 -08:00
Gleb Smirnoff
e68b379244 tcp: embed inpcb into tcpcb
For the TCP protocol inpcb storage, specify an allocation size that
provides space for most of the data a TCP connection needs, embedding
into struct tcpcb several structures that previously were allocated
separately.

The most important one is the inpcb itself.  With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa.  Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that.  However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort.  The new intotcpcb() macro is ready for such move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temporary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone.  Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

Large part of the change was done with sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

A dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, which added several <netinet/in_pcb.h> includes.

Differential revision:	https://reviews.freebsd.org/D37127
2022-12-07 09:00:48 -08:00
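
The embedding described in e68b379244, together with the protocol-specific size from 0aa120d52f, can be sketched in userland as follows. The types, the allocator, and the macro are hypothetical stand-ins; the macro is written in offsetof form on the assumption that this is what keeps such a conversion valid if the embedded member is later moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generic control block. */
struct inpcb_demo {
	int inp_flags;
};

/* Hypothetical protocol block with the generic one embedded first. */
struct tcpcb_demo {
	struct inpcb_demo t_inpcb;	/* embedded, not a pointer */
	int t_state;
};

/* Convert an embedded inpcb pointer back to its enclosing tcpcb. */
#define INTOTCPCB_DEMO(inp)						\
	((struct tcpcb_demo *)((char *)(inp) -				\
	    offsetof(struct tcpcb_demo, t_inpcb)))

/*
 * Generic allocator: it knows only the size the protocol asked for and
 * returns a pointer to the start, which is why the embedded inpcb has
 * to sit at offset 0 in this sketch.
 */
static struct inpcb_demo *
pcb_alloc_demo(size_t size)
{
	assert(size >= sizeof(struct inpcb_demo));
	return (calloc(1, size));
}
```

One allocation now backs both structures, so a valid inpcb pointer implies a valid tcpcb and the conversion is pointer arithmetic rather than a dereference.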
Gleb Smirnoff
0aa120d52f inpcb: allow to provide protocol specific pcb size
The protocol specific structure shall start with inpcb.

Differential revision:	https://reviews.freebsd.org/D37126
2022-12-02 14:10:55 -08:00
John Baldwin
d00c20882f udp[6]_multi_input: Don't unlock freed inp.
If udp[6]_append() returns non-zero, it is because the inp has gone
away (inpcbrele_rlocked returned 1 after running the tunnel function).

Reviewed by:	ae
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37511
2022-11-30 14:38:51 -08:00
Michael Tuexen
bd4f986644 tcp: remove unused t_rttbest
No functional change intended.

Reviewed by:		rscheff@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D37401
2022-11-16 11:22:13 +01:00
Michael Tuexen
9a71437621 libalias: improve handling of invalid SCTP packets
In case of a partial chunk only pretend the result is OK if
the packet is not the last fragment and there is a valid association.

PR:		267476
MFC after:	3 days
2022-11-15 21:05:02 +01:00
Richard Scheffenegger
1a70101a87 tcp: account sent/received IP ECN markings independently
Have tcpstats (netstat -s) differentiate between received and sent
ECN-marked packets. Also account for IP ECN bits (on TCP packets)
even when the tcp session has not negotiated ECN support.

Event:			IETF 115 Hackathon
Reviewed By:		glebius, tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37314
2022-11-10 11:35:35 +01:00
Richard Scheffenegger
0b00b80149 ipfw: Have NAT steal the TH_RES1 bit, instead of the TH_AE bit
The NAT module use of the tcphdr.th_x2 field now collides with the
use of this TCP header flag as AccECN (AE) bit. Use the topmost
bit instead to allow negotiation of AccECN across a NAT device.

Event:			IETF 115 Hackathon
Reviewed By:		#transport, tuexen
MFC after:		3 days
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37300
2022-11-09 11:19:19 +01:00
Gleb Smirnoff
b40ae8c9fe tcp: fix build without INVARIANTS and VIMAGE
Lines from upcoming changes crept in and broke certain builds.

Fixes:	9eb0e8326d
2022-11-08 12:34:45 -08:00
Gleb Smirnoff
326f455625 tcp: forward declare struct tcpcb in the TCP logging header
This allows including tcp_log_buf.h without including tcp_var.h.
2022-11-08 10:32:29 -08:00
Gleb Smirnoff
73bebcc5bd inpcb: remove TCP includes, all TCP specific code was moved
2022-11-08 10:24:40 -08:00