7557 Commits

Author SHA1 Message Date
Michael Tuexen
bd4f986644 tcp: remove unused t_rttbest
No functional change intended.

Reviewed by:		rscheff@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D37401
2022-11-16 11:22:13 +01:00
Michael Tuexen
9a71437621 libalias: improve handling of invalid SCTP packets
In case of a partial chunk, only pretend the result is OK if
the packet is not the last fragment and there is a valid association.

PR:		267476
MFC after:	3 days
2022-11-15 21:05:02 +01:00
Richard Scheffenegger
1a70101a87 tcp: account sent/received IP ECN markings independently
Have tcpstats (netstat -s) differentiate between received and sent
ECN-marked packets. Also account for IP ECN bits (on TCP packets)
even when the tcp session has not negotiated ECN support.

Event:			IETF 115 Hackathon
Reviewed By:		glebius, tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37314
2022-11-10 11:35:35 +01:00
Richard Scheffenegger
0b00b80149 ipfw: Have NAT steal the TH_RES1 bit, instead of the TH_AE bit
The NAT module's use of the tcphdr.th_x2 field now collides with the
use of this TCP header flag as the AccECN (AE) bit. Use the topmost
bit instead to allow negotiation of AccECN across a NAT device.

Event:			IETF 115 Hackathon
Reviewed By:		#transport, tuexen
MFC after:		3 days
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37300
2022-11-09 11:19:19 +01:00
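
The bit move described in 0b00b80149 can be illustrated with a minimal sketch. It assumes a 12-bit view of the TCP flags in which the low 8 bits are the classic flags and the next 4 bits are the former reserved nibble (th_x2). The `EX_` names, their values, and the helper functions are illustrative assumptions, not the definitions used in the tree.

```c
#include <stdint.h>

/*
 * Assumed layout: bits 0-7 are FIN..CWR, bits 8-11 are the former
 * reserved nibble. The lowest nibble bit carries AccECN (AE); the
 * topmost nibble bit is still unassigned.
 */
#define EX_TH_AE    0x100u  /* assumption: AE is the lowest nibble bit */
#define EX_TH_RES1  0x800u  /* assumption: topmost nibble bit */

/* Bit a NAT could borrow for private bookkeeping without touching AE. */
#define EX_NAT_MARK EX_TH_RES1

/* Set the private NAT marker. */
static inline uint16_t ex_nat_mark(uint16_t flags)
{
	return (uint16_t)(flags | EX_NAT_MARK);
}

/* Clear the private NAT marker before the segment leaves the box. */
static inline uint16_t ex_nat_unmark(uint16_t flags)
{
	return (uint16_t)(flags & ~EX_NAT_MARK);
}

/* Report whether the AE bit survived. */
static inline int ex_has_ae(uint16_t flags)
{
	return (flags & EX_TH_AE) != 0;
}

/* Report whether the NAT marker is set. */
static inline int ex_has_mark(uint16_t flags)
{
	return (flags & EX_NAT_MARK) != 0;
}
```

Because the marker and AE occupy different bits in this sketch, marking and unmarking a segment leaves the AE bit exactly as the endpoints set it.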
Gleb Smirnoff
b40ae8c9fe tcp: fix build without INVARIANTS and VIMAGE
Lines from upcoming changes crept in and broke certain builds.

Fixes:	9eb0e8326d
2022-11-08 12:34:45 -08:00
Gleb Smirnoff
326f455625 tcp: forward declare struct tcpcb in the TCP logging header
This allows including tcp_log_buf.h without including tcp_var.h.
2022-11-08 10:32:29 -08:00
Gleb Smirnoff
73bebcc5bd inpcb: remove TCP includes, all TCP specific code was moved
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
8840ae2288 tcp: don't store VNET in every tcpcb, take it from the inpcbinfo
Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37125
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
ab0ef9455f hpts: move inp initialization from the generic inpcb code to TCP
Differential revision:	https://reviews.freebsd.org/D37124
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
9eb0e8326d tcp: provide macros to access inpcb and socket from a tcpcb
There should be no functional changes with this commit.

Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37123
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
f71cb9f748 tcp: inp_socket is valid through the lifetime of a TCP inpcb
The inp_socket is cleared only in in_pcbdetach(), which for TCP is
always accompanied by in_pcbfree().  An inpcb that went through
in_pcbfree() shall never be returned by any kind of pcb lookup.

Reviewed by:		tuexen
Differential revision:	https://reviews.freebsd.org/D37062
2022-11-08 10:24:39 -08:00
Gleb Smirnoff
ada90cb978 tcp: remove INP_DROPPED check from notify functions
The functions tcp_notify(), tcp_drop_syn_sent() and tcp_mtudisc()
are called from tcp*_ctlinput*() right after a successful
in_pcblookup*().  They shall never get a pcb that is dropped.
2022-11-08 10:24:39 -08:00
Gleb Smirnoff
f567d55f51 inpcb: don't return INP_DROPPED entries from pcb lookups
The in_pcbdrop() KPI, which is used solely by TCP, allows removing a
pcb from the hash list and marking it as dropped.  The comment suggests
that such a pcb won't be returned by lookups.  Indeed, every call to
in_pcblookup*() is accompanied by a check for INP_DROPPED.  Do what the
comment suggests: never return such pcbs and remove unnecessary checks.

Reviewed by:		tuexen
Differential revision:	https://reviews.freebsd.org/D37061
2022-11-08 10:24:39 -08:00
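
The idea behind f567d55f51 can be shown with a toy model: if the lookup itself filters out dropped entries, callers no longer need their own check. This is a simplified sketch over a linked list; the struct, the flag value, and the function name are hypothetical and do not mirror the real in_pcblookup*() code.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_INP_DROPPED 0x1u	/* hypothetical flag value */

/* Hypothetical, heavily simplified protocol control block. */
struct ex_pcb {
	uint16_t port;
	unsigned flags;
	struct ex_pcb *next;
};

/*
 * Return the first live entry bound to the port, or NULL.
 * Dropped entries are skipped here, so callers get either a
 * usable pcb or nothing.
 */
static inline struct ex_pcb *
ex_pcblookup(struct ex_pcb *head, uint16_t port)
{
	for (struct ex_pcb *p = head; p != NULL; p = p->next) {
		if (p->flags & EX_INP_DROPPED)
			continue;
		if (p->port == port)
			return p;
	}
	return NULL;
}
```

With the filter inside the lookup, a caller-side test for the dropped flag becomes dead code and can be removed or turned into an assertion.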
Richard Scheffenegger
dc9daa04fb tcp: allow packets to be marked as ECT1 instead of ECT0
This adds the capability for a modular congestion control
to select which variant of ECN-capable-transport it wants to use
when sending out eligible segments. As an initial CC to utilize
this, DCTCP was selected.

Event:			IETF 115 Hackathon
Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D24869
2022-11-08 18:36:38 +01:00
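
As an illustration of dc9daa04fb, the sketch below picks between the two ECN-capable-transport codepoints when marking a segment. The codepoint values are those of RFC 3168 (the two low-order bits of the IP TOS or traffic class byte). The `ex_cc` struct and its `prefers_ect1` field are hypothetical stand-ins for whatever hook the congestion control module really uses.

```c
#include <stdint.h>

/* ECN codepoints in the two low-order bits of the TOS byte (RFC 3168). */
enum ex_ecn {
	EX_ECN_NOTECT = 0x0,
	EX_ECN_ECT1   = 0x1,
	EX_ECN_ECT0   = 0x2,
	EX_ECN_CE     = 0x3
};
#define EX_ECN_MASK 0x3u

/* Hypothetical per-congestion-control preference. */
struct ex_cc {
	int prefers_ect1;
};

/* Replace the ECN bits of tos with the codepoint the CC module wants. */
static inline uint8_t
ex_mark_ect(uint8_t tos, const struct ex_cc *cc)
{
	uint8_t ect = cc->prefers_ect1 ? EX_ECN_ECT1 : EX_ECN_ECT0;

	return (uint8_t)((tos & ~EX_ECN_MASK) | ect);
}
```

The DSCP bits in the upper part of the byte are left untouched; only the two ECN bits change.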
Gordon Bergling
bcf8fb7f03 tcp_bbr(4): Fix a typo in a source code comment
- s/retranmitted/retransmitted/

MFC after:	3 days
2022-11-08 14:59:56 +01:00
Michael Tuexen
126f8248cc Unbreak builds having SCTP support compiled in
Including sctp_var.h requires INET to be defined if IPv4 support
is needed.
2022-11-07 08:50:51 +01:00
Michael Tuexen
f83db6441a sctp: minor changes due to upstreaming of Gleb's recent changes
2022-11-06 23:06:40 +01:00
Richard Scheffenegger
37bf391d3c tcp: make tcp_packets_this_ack() only visible in kernel scope
2022-11-06 13:51:57 +01:00
Richard Scheffenegger
004bb636ca tcp: Move sysctl OIDs related to ECN to tcp_ecn.c
Keep all ECN related code in (mostly) one place.

No functional change.

Event:			IETF 115 Hackathon
Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37285
2022-11-06 12:38:42 +01:00
Richard Scheffenegger
b1258b7643 tcp: add conservative d.cep accounting algorithm
Accurate ECN asks for a conservative estimate when the
ACE counter may have wrapped because a single ACK covers a large
number of segments. This is described in Annex A.2 of the
accurate-ecn draft.

Event:			IETF 115 Hackathon
Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37281
2022-11-06 12:05:22 +01:00
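
The conservative estimate mentioned in b1258b7643 can be sketched as follows. The ACE field is 3 bits wide, so the observed change is only known modulo 8. This sketch assumes the "largest consistent value" reading: among all increments that match the observed ACE value, take the largest one that does not exceed the number of newly acknowledged packets. Function and parameter names are illustrative, and the real code may apply further conditions.

```c
#include <stdint.h>

#define EX_DIVACE 8u	/* the 3-bit ACE field wraps modulo 8 */

/*
 * ace:              3-bit ACE value carried by the arriving ACK
 * s_cep:            sender's running count of CE-marked packets
 * newly_acked_pkts: number of segments newly covered by this ACK
 *
 * Returns the conservative increment to apply to s_cep.
 */
static inline uint32_t
ex_delta_cep(uint8_t ace, uint32_t s_cep, uint32_t newly_acked_pkts)
{
	/* Smallest increment consistent with the observed ACE value. */
	uint32_t d = ((ace & 7u) + EX_DIVACE - (s_cep % EX_DIVACE)) % EX_DIVACE;

	/* Too few packets were acknowledged for a wrap to hide in them. */
	if (newly_acked_pkts <= d)
		return d;

	/* Largest value <= newly_acked_pkts that equals d modulo 8. */
	return newly_acked_pkts - ((newly_acked_pkts - d) % EX_DIVACE);
}
```

For example, with an observed change of 1 and 10 newly acknowledged packets, the sketch assumes one wrap and returns 9 rather than 1.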
Richard Scheffenegger
22c81cc516 tcp: add AccECN CE packet counters to tcpinfo
Provide diagnostic information about AccECN in
the tcpinfo struct.

Event:			IETF 115 Hackathon
Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37280
2022-11-06 11:56:02 +01:00
Richard Scheffenegger
3708c3d370 tcp: reserve tcp_info counters for AccECN
Marking all new fields unused (__xxx).

No functional change.

Reviewed By:		tuexen, rrs, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37016
2022-11-04 10:18:53 +01:00
Mark Johnston
d93ec8cb13 inpcb: Allow SO_REUSEPORT_LB to be used in jails
Currently SO_REUSEPORT_LB silently does nothing when set by a jailed
process.  It is trivial to support this option in VNET jails, but it's
also useful in traditional jails.

This patch enables LB groups in jails with the following semantics:
- all PCBs in a group must belong to the same jail,
- PCB lookup prefers jailed groups to non-jailed groups

This is a straightforward extension of the semantics used for individual
listening sockets.  One pre-existing quirk of the lbgroup implementation
is that non-jailed lbgroups are searched before jailed listening
sockets; that is preserved with this change.

Discussed with:	glebius
MFC after:	1 month
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37029
2022-11-02 13:46:24 -04:00
Mark Johnston
a152dd8634 inpcb: Remove a PCB from its LB group upon a subsequent error
If a memory allocation failure causes bind to fail, we should take the
inpcb back out of its LB group since it's not prepared to handle
connections.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37027
2022-11-02 13:46:24 -04:00
Mark Johnston
ac1750dd14 inpcb: Remove NULL checks of credential references
Some auditing of the code shows that "cred" is never NULL in these
functions, either because all callers pass a non-NULL reference or
because they unconditionally dereference "cred".  So, let's simplify the
code a bit and remove NULL checks.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37025
2022-11-02 13:46:24 -04:00
Gleb Smirnoff
c348e88053 tcp: make tcp_handle_wakeup() static and robust
It is called only from tcp_input() and always has a valid parameter.

Reviewed by:		rscheff, tuexen
Differential revision:	https://reviews.freebsd.org/D37115
2022-10-31 08:57:15 -07:00
Gleb Smirnoff
19acc50667 inpcb: retire suppression of randomization of ephemeral ports
The suppression was added in 5f311da2cc with no explanation in the
commit message of the exact problem that was fixed. In the BSDCan
2006 talk [1], slides 12 to 14, we can find that it seems that there
was some problem with the TIME_WAIT state not properly being handled
on the remote side (also FreeBSD!), and this switching off the
suppression had hidden the problem.  The rationale of the change was
that other stacks may also be buggy wrt the TIME_WAIT.

I did not find the actual problem in TIME_WAIT that the suppression
has hidden, nor a commit that would fix it.  However, since that
time we started to handle SYNs with RFC5961 instead of RFC793, see
3220a2121c.  We also now have the tcp-testsuite [2], that has full
coverage of all possible scenarios of receiving SYN in TIME_WAIT.

This effectively reverts 5f311da2cc
and 6ee79c59d2.

[1] https://www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf
[2] https://github.com/freebsd-net/tcp-testsuite

Reviewed by:		rscheff
Discussed with:		rscheff, rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D37042
2022-10-31 08:57:11 -07:00
Gleb Smirnoff
65a58d6390 icmp: doesn't need tcp_var.h
2022-10-31 08:44:55 -07:00
Gleb Smirnoff
f504685a7a rack/bbr: put back assertion that connection is not in TIME-WAIT
The assertion was incorrectly removed in 0d7445193a.  The leak of
a TIME-WAIT state into tfb_do_segment_nounlock method was fixed in
31bc602ff8.  The TIME-WAIT connections are processed by the main
tcp_input() always.
2022-10-31 08:30:59 -07:00
Gleb Smirnoff
77fe40cf2f netinet*: add back necessary headers
The LINT successful build was provided by the includes that SCTP
pulled in.

Fixes:	92e190f11f
2022-10-26 08:16:44 -07:00
Gleb Smirnoff
92e190f11f netinet*: remove unneeded headers from files that just declare domains
2022-10-25 11:09:23 -07:00
Gleb Smirnoff
eda633455a tcp: remove a now-useless lock assertion in the middle of a function
It was added back in 7cfc690440, when there was a jump label
above and tcp_input() hadn't been locked all through.
2022-10-25 11:09:22 -07:00
Randall Stewart
31bc602ff8 Rack and BBR broken with the new timewait state purge.
We recently got rid of the explicit INP_TIMEWAIT state; this has caused some
minor breakage to both rack and bbr. Basically the timewait check that was
in tcp_lro.c is now gone. This means that compressed_ack and mbuf_queued
packets will arrive at TCP without going through tcp_input_with_port(). We need
to expand the check that was stripped to look at the tcp_state (t_state) and
not "LRO" packets that are in the TCPS_TIMEWAIT state.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D37080
2022-10-24 15:47:29 -04:00
Richard Scheffenegger
83c1ec92e4 tcp: ECN preparations for ECN++, AccECN (tcp_respond)
tcp_respond is another function to build a tcp control packet
quickly. With ECN++ and AccECN, both the IP ECN header, and
the TCP ECN flags are supposed to reflect the correct state.

Also ensure that on receiving multiple ECN SYN-ACKs, the
responses triggered will reflect the latest state.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D36973
2022-10-20 21:48:27 +02:00
Gleb Smirnoff
24cf7a8d62 inpcb: provide pcbinfo pointer argument to inp_apply_all()
Allows clearing the inpcb layer of TCP knowledge.
2022-10-19 15:15:53 -07:00
Gleb Smirnoff
b6a816f116 inpcb: garbage collect so_sototcpcb()
It had very little use and required inpcb layer to know tcpcb.
2022-10-19 15:15:53 -07:00
Gleb Smirnoff
c37384665f tcp: style the struct tcpcb definition
- Use C99 types uintXX_t instead of u_intXX_t.
- Try to make space/tab usage a little bit more consistent.
- Shorten comments to fit into 80 chars.

Not a functional change, just making future changes easier to read.
2022-10-18 18:00:30 -07:00
Kristof Provost
a974702e27 pf: apply the network stack's ICMP rate limiting to ICMP errors sent by pf
PR:		266477
Event:		Aberdeen Hackathon 2022
Differential Revision:	https://reviews.freebsd.org/D36903
2022-10-14 10:36:16 +02:00
Gleb Smirnoff
3ba34b07a4 inpcb: provide in_pcbremhash() to reduce copy-paste
2022-10-13 09:03:38 -07:00
Michael Tuexen
dd36606b1b sctp: improve sending of ABORT packets in response to INIT-ACKs
Ensure that the initiate tag of the INIT-ACK chunk is used as the
verification tag of the packet containing the ABORT chunk.

Reported by:	Suganya Dharma
MFC after:	1 week
2022-10-13 01:05:44 +02:00
Alexander Motin
1e9482f433 inet: Simplify if_multiaddrs iteration.
Similar to 2cd6ad766e for inet6, drop the use of ifma_restart, which
created more problems than it solved.  It is no longer needed after the
introduction of epoch.

While there, add a NULL check for ifma_ifp in igmp_change_state(); its
absence sometimes caused panics on interface destruction.

MFC after:	2 weeks
2022-10-08 13:10:07 -04:00
Richard Scheffenegger
6bf91573c1 tcp: update repeat <SYN,ACK> with latest IP ECN info
When multiple <SYN> segments are received, update the <SYN,ACK>
sent in response to the latest IP ECN and TCP ECN information.

On retransmitting the <SYN,ACK>, once ECN maxtries are done, not
only disable RFC3168 ECN, but AccECN also.

Reviewed By:    	tuexen, #transport
Sponsored by:   	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D36875
2022-10-07 01:51:19 +02:00
Richard Scheffenegger
265d0f767c tcp: honor rfc1323 sysctl on passive sessions
On passive sessions, honor the local settings disabling or
enabling window scaling and timestamp options.

Reviewed By:    	tuexen, #transport
Sponsored by:   	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D36874
2022-10-07 01:49:10 +02:00
Richard Scheffenegger
9c65583835 siftr: apply filter early on
Quickly check TCP port filter, before investing into
expensive operations.

No functional change.

Obtained from:  	guest-ccui
Reviewed By:    	#transport, tuexen, guest-ccui
Sponsored by:   	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D36842
2022-10-07 01:39:41 +02:00
Gleb Smirnoff
53af690381 tcp: remove INP_TIMEWAIT flag
Mechanically cleanup INP_TIMEWAIT from the kernel sources.  After
0d7445193a, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of these checks and turn them into
assertions.  Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision:	https://reviews.freebsd.org/D36400
2022-10-06 19:24:37 -07:00
Gleb Smirnoff
9c3507f919 tcp: in tcp_usr_detach() remove special handling of compressed time-wait
Differential revision:	https://reviews.freebsd.org/D36399
2022-10-06 19:24:32 -07:00
Gleb Smirnoff
0d7445193a tcp: remove tcptw, the compressed timewait state structure
The memory savings the tcptw brought back in 2003 (see 340c35de6a) no
longer justify the complexity required to maintain it.  For longer
explanation please check out the email [1].

Surprisingly, through almost 20 years the TCP stack functionality of
handling the TIME_WAIT state with a normal tcpcb did not bitrot.  The
existing tcp_input() properly handles a tcpcb in TCPS_TIME_WAIT state,
which is confirmed by the packetdrill tcp-testsuite [2].

This change just removes tcptw and leaves INP_TIMEWAIT.  The flag will
be removed in a separate commit.  This makes it easier to review and
possibly debug the changes.

[1] https://lists.freebsd.org/archives/freebsd-net/2022-January/001206.html
[2] https://github.com/freebsd-net/tcp-testsuite

Differential revision:	https://reviews.freebsd.org/D36398
2022-10-06 19:22:23 -07:00
Konstantin Belousov
2220b66fe0 Add mbuf_tstmp2timeval()
Reviewed by:	hselasky, jkim, rscheff
Sponsored by:	NVIDIA networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36870
2022-10-06 00:38:13 +03:00
Hans Petter Selasky
c2a808b977 Fix kernel build after fcb3f813f3.
By adding missing ifdefs for INET6.

Differential Revision:	https://reviews.freebsd.org/D36731
Sponsored by:	NVIDIA Networking
2022-10-04 15:55:36 +02:00
Randall Stewart
cd84e78f09 tcp idle reduce does not work for a server.
TCP has an idle-reduce feature that allows a connection to reduce its
cwnd after it has been idle more than an RTT. This feature only works
for a sending side connection. It does this by checking the idle time
(t_rcvtime vs ticks) at output to see if it's more than the RTO timeout.

The problem comes if you are a web server. You get a request and
then send out all the data, then go idle. The next time you would
send is in response to a request from the peer asking for more data.
But the thing is you updated t_rcvtime when the request came in, so
you never reduce.

The fix is to do the idle reduce check also on inbound.

Reviewed by: tuexen, rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36721
2022-10-04 07:09:01 -04:00
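
The ordering problem described in cd84e78f09 can be sketched with a toy connection model. The idle check has to run on the inbound path before the receive timestamp is refreshed, otherwise the idle period is no longer visible by the time the server sends. The struct, its field names, and the `restart_wnd` fallback are hypothetical simplifications; the real stack applies additional conditions.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified connection state. */
struct ex_conn {
	uint32_t cwnd;		/* congestion window, bytes */
	uint32_t restart_wnd;	/* assumed window to fall back to after idle */
	uint32_t rto;		/* retransmit timeout, ticks */
	uint32_t last_rcv;	/* tick of last received segment (cf. t_rcvtime) */
};

/* Shrink cwnd if the connection has been idle for at least one RTO. */
static inline void
ex_idle_check(struct ex_conn *c, uint32_t now)
{
	if ((uint32_t)(now - c->last_rcv) >= c->rto &&
	    c->cwnd > c->restart_wnd)
		c->cwnd = c->restart_wnd;
}

/*
 * Inbound path: run the idle check BEFORE refreshing last_rcv.
 * Doing it in the other order hides the idle period.
 */
static inline void
ex_on_input(struct ex_conn *c, uint32_t now)
{
	ex_idle_check(c, now);
	c->last_rcv = now;
}
```

In this model a request arriving after a long pause shrinks cwnd first and only then resets the idle clock, so the response that follows starts from the reduced window.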