Commit Graph

713 Commits

Author SHA1 Message Date
Gleb Smirnoff
35bc0bcc51 tcp: reduce argument list to functions that pass a segment
The socket argument is superfluous, as a tcpcb always has one and
only one socket.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39434
2023-04-07 12:18:06 -07:00
Gleb Smirnoff
78e6c3aacc tcp: update error counter when dropping a packet due to bad source
Use the same counter that ip_input()/ip6_input() use for bad destination
address.  For IPv6 this is already heavily abused ip6s_badscope, which
needs to be split into several separate error counters.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D39234
2023-03-27 18:37:15 -07:00
Randall Stewart
69c7c81190 Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints
we need to move to using the new inline functions. This adds them and moves rack to now use
the tcp_tracepoints.

Reviewed by: tuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D38831
2023-03-16 11:43:16 -04:00
Mark Johnston
713264f6b8 netinet: Tighten checks for unspecified source addresses
The assertions added in commit b0ccf53f24 ("inpcb: Assert against
wildcard addrs in in_pcblookup_hash_locked()") revealed that protocol
layers may pass the unspecified address to in_pcblookup().

Add some checks to filter out such packets before we attempt an inpcb
lookup:
- Disallow the use of an unspecified source address in in_pcbladdr() and
  in6_pcbladdr().
- Disallow IP packets with an unspecified destination address.
- Disallow TCP packets with an unspecified source address, and add an
  assertion to verify the comment claiming that the case of an
  unspecified destination address is handled by the IP layer.

Reported by:	syzbot+9ca890fb84e984e82df2@syzkaller.appspotmail.com
Reported by:	syzbot+ae873c71d3c71d5f41cb@syzkaller.appspotmail.com
Reported by:	syzbot+e3e689aba1d442905067@syzkaller.appspotmail.com
Reviewed by:	glebius, melifaro
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38570
2023-03-06 15:06:00 -05:00
Richard Scheffenegger
18b83b626a tcp: reduce the size of t_rttupdated in tcpcb
During tcp session start, various mechanisms need to
track a few initial RTTs before becoming active.
Prevent overflows of the corresponding tracking counter
and reduce the size of tcpcb simultaneously.

Reviewed By:		#transport, tuexen, guest-ccui
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D21117
2023-01-26 18:08:00 +01:00
Gleb Smirnoff
aab8c844b9 tcp/ipfw: fix "ipfw fwd localaddr,port"
The ipfw(4) feature of forwarding to local address without modifying
a packet was broken.  The first lookup needs always be a non-wildcard
one, cause its goal is to find an already existing socket.  Otherwise
a local wildcard listener with the same port number may match resulting
in the connection being forwared to wrong port.

Reported by:	Pavel Polyakov <bsd kobyla.org>
Fixes:		d88eb4654f
2023-01-05 14:34:50 -08:00
Gleb Smirnoff
eaabc93764 tcp: retire TCPDEBUG
This subsystem is superseded by modern debugging facilities,
e.g. DTrace probes and TCP black box logging.

We intentionally leave SO_DEBUG in place, as many utilities may
set it on a socket.  Also the tcp::debug DTrace probes look at
this flag on a socket.

Reviewed by:		gnn, tuexen
Discussed with:		rscheff, rrs, jtl
Differential revision:	https://reviews.freebsd.org/D37694
2022-12-14 09:54:06 -08:00
Gleb Smirnoff
e68b379244 tcp: embed inpcb into tcpcb
For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.

The most import one is the inpcb itself.  With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa.  Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that.  However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort.  The new intotcpcb() macro is ready for such move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone.  Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

Large part of the change was done with sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.

Differential revision:	https://reviews.freebsd.org/D37127
2022-12-07 09:00:48 -08:00
Michael Tuexen
bd4f986644 tcp: remove unused t_rttbest
No functional change intended.

Reviewed by:		rscheff@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D37401
2022-11-16 11:22:13 +01:00
Gleb Smirnoff
9eb0e8326d tcp: provide macros to access inpcb and socket from a tcpcb
There should be no functional changes with this commit.

Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37123
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
f71cb9f748 tcp: inp_socket is valid through the lifetime of a TCP inpcb
The inp_socket is cleared only in in_pcbdetach(), which for TCP is
always accompanied with inp_pcbfree().  An inpcb that went through
in_pcbfree() shall never be returned by any kind of pcb lookup.

Reviewed by:		tuexen
Differential revision:	https://reviews.freebsd.org/D37062
2022-11-08 10:24:39 -08:00
Gleb Smirnoff
f567d55f51 inpcb: don't return INP_DROPPED entries from pcb lookups
The in_pcbdrop() KPI, which is used solely by TCP, allows to remove a
pcb from hash list and mark it as dropped.  The comment suggests that
such pcb won't be returned by lookups.  Indeed, every call to
in_pcblookup*() is accompanied by a check for INP_DROPPED.  Do what
comment suggests: never return such pcbs and remove unnecessary checks.

Reviewed by:		tuexen
Differential revision:	https://reviews.freebsd.org/D37061
2022-11-08 10:24:39 -08:00
Richard Scheffenegger
004bb636ca tcp: Move sysctl OIDs related to ECN to tcp_ecn.c
Keep all ECN related code in (mostly) one place.

No functional change.

Event:			IETF 115 Hackathon
Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37285
2022-11-06 12:38:42 +01:00
Richard Scheffenegger
b1258b7643 tcp: add conservative d.cep accounting algorithm
Accurate ECN asks to conservatively estimate, when the
ACE counter may have wrapped due to a single ACK covering a larger
number of segments. This is described in Annex A.2 of the
accurate-ecn draft.

Event:			IETF 115 Hackathon
Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D37281
2022-11-06 12:05:22 +01:00
Gleb Smirnoff
c348e88053 tcp: make tcp_handle_wakeup() static and robust
It is called only from tcp_input() and always has valid parameter.

Reviewed by:		rscheff, tuexen
Differential revision:	https://reviews.freebsd.org/D37115
2022-10-31 08:57:15 -07:00
Gleb Smirnoff
eda633455a tcp: remove useless today lock assertion in a middle of function
It was added back in 7cfc690440, when there was a jump label
above and tcp_input() hadn't been locked all through.
2022-10-25 11:09:22 -07:00
Richard Scheffenegger
83c1ec92e4 tcp: ECN preparations for ECN++, AccECN (tcp_respond)
tcp_respond is another function to build a tcp control packet
quickly. With ECN++ and AccECN, both the IP ECN header, and
the TCP ECN flags are supposed to reflect the correct state.

Also ensure that on receiving multiple ECN SYN-ACKs, the
responses triggered will reflect the latest state.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D36973
2022-10-20 21:48:27 +02:00
Gleb Smirnoff
0d7445193a tcp: remove tcptw, the compressed timewait state structure
The memory savings the tcptw brought back in 2003 (see 340c35de6a) no
longer justify the complexity required to maintain it.  For longer
explanation please check out the email [1].

Surpisingly through almost 20 years the TCP stack functionality of
handling the TIME_WAIT state with a normal tcpcb did not bitrot.  The
existing tcp_input() properly handles a tcpcb in TCPS_TIME_WAIT state,
which is confirmed by the packetdrill tcp-testsuite [2].

This change just removes tcptw and leaves INP_TIMEWAIT.  The flag will
be removed in a separate commit.  This makes it easier to review and
possibly debug the changes.

[1] https://lists.freebsd.org/archives/freebsd-net/2022-January/001206.html
[2] https://github.com/freebsd-net/tcp-testsuite

Differential revision:	https://reviews.freebsd.org/D36398
2022-10-06 19:22:23 -07:00
Randall Stewart
cd84e78f09 tcp idle reduce does not work for a server.
TCP has an idle-reduce feature that allows a connection to reduce its
cwnd after it has been idle more than an RTT. This feature only works
for a sending side connection. It does this by at output checking the
idle time (t_rcvtime vs ticks) to see if its more than the RTO timeout.

The problem comes if you are a web server. You get a request and
then send out all the data.. then go idle. The next time you would
send is in response to a request from the peer asking for more data.
But the thing is you updated t_rcvtime when the request came in so
you never reduce.

The fix is to do the idle reduce check also on inbound.

Reviewed by: tuexen, rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36721
2022-10-04 07:09:01 -04:00
Randall Stewart
08af8aac2a Tcp progress timeout
Rack has had the ability to timeout connections that just sit idle automatically. This
feature of course is off by default and requires the user set it on (though the socket option
has been missing in tcp_usrreq.c). Lets get the progress timeout fully supported in
the base stack as well as rack.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36716
2022-09-27 13:38:20 -04:00
Randall Stewart
d1b07f36a2 TCP complete end status work.
The ending of a connection can tell us a lot about what happened i.e. did
it fail to setup, did it timeout, was it a normal close. Often times this is
useful information to help analyze and debug issues. Rack has had
end status for some time but the base stack as not. Lets go a ahead
and add in the missing bits to populate the end status.

Reviewed by: tuexen, rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36712
2022-09-26 15:20:18 -04:00
Randall Stewart
e5049a1733 TCP rack does not work properly with cubic.
Right now if you use rack with cubic (the new default cc) you will have
improper results. This is because rack uses different variables than
the base stack (or bbr) and thus tcp_compute_pipe() always returns
so that cubic will choose a 30% backoff not the 50% backoff it should
when it is newreno compatibility mode. The fix is to allow a stack (rack)
to override its own compute_pipe.

Reviewed by: tuexen, rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36711
2022-09-26 15:12:03 -04:00
Michael Tuexen
5ae83e0d87 tcp: send ACKs when requested
When doing Limited Transmit send an ACK when needed by the protocol
processing (like sending ACKs with a DSACK block).

PR:			264257
PR:			263445
PR:			260393
Reviewed by:		rscheff@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D36631
2022-09-22 12:12:11 +02:00
Gleb Smirnoff
493105c2a8 tcp: fix simultaneous open and refine e80062a2d4
- The soisconnected() call on transition from SYN_RCVD to ESTABLISHED
  is also necessary for a half-synchronized connection.  Fix that
  just setting the flag, when we transfer SYN-SENT -> SYN-RECEIVED.
- Provide a comment that explains at what conditions the call to
  soisconnected() is necessary.
- Hence mechanically rename the TF_INCQUEUE flag to TF_SONOTCONN.
- Extend the change to the BBR and RACK stacks.

Note: the interaction between the accept_filter(9) and the socket layer
is not fully consistent, yet.  For most accept filters this call to
soisconnected() will not move the connection from the incomplete queue
to the complete.  The move would happen only when the filter has received
the desired data, and soisconnected() would be called once again from
sorwakeup().  Ideally, we should mark socket as connected only there,
and leave the soisconnected() from SYN_RCVD->ESTABLISHED only for the
simultaneous open case.  However, this doesn't yet work.

Reviewed by:		rscheff, tuexen, rrs
Differential revision:	https://reviews.freebsd.org/D36641
2022-09-21 14:02:49 -07:00
Gleb Smirnoff
e80062a2d4 tcp: avoid call to soisconnected() on transition to ESTABLISHED
This call existed since pre-FreeBSD times, and it is hard to understand
why it was there in the first place.  After 6f3caa6d81 it definitely
became necessary always and commit message from f1ee30ccd6 confirms that.
Now that 6f3caa6d81 is effectively backed out by 07285bb4c2, the call
appears to be useful only for sockets that landed on the incomplete queue,
e.g. sockets that have accept_filter(9) enabled on them.

Provide a new TCP flag to mark connections that are known to be on the
incomplete queue, and call soisconnected() only for those connections.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D36488
2022-09-08 09:16:04 -07:00
Richard Scheffenegger
c21b7b55be tcp: finish SACK loss recovery on sudden lack of SACK blocks
While a receiver should continue sending SACK blocks for the
duration of a SACK loss recovery, if for some reason the
TCP options no longer contain these SACK blocks, but we
already started maintaining the Scoreboard, keep on handling
incoming ACKs (without SACK) as belonging to the SACK recovery.

Reported by:		thj
Reviewed by:		tuexen, #transport
MFC after:		2 weeks
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D36046
2022-08-31 14:49:47 +02:00
Gleb Smirnoff
c00605751e tcp: remove a dead code leftover from T/TCP,
that doesn't have any value today.
2022-08-29 19:30:12 -07:00
Gleb Smirnoff
d9f6ac882a protosw: retire PRU_ flags and their char names
For many years only TCP debugging used them, but relatively recently
TCP DTrace probes also start to use them.  Move their declarations
into tcp_debug.h, but start including tcp_debug.h unconditionally,
so that compilation with DTrace and without TCPDEBUG is possible.
2022-08-17 11:50:32 -07:00
Gleb Smirnoff
d88eb4654f tcp: address a wire level race with 2 ACKs at the end of TCP handshake
Imagine we are in SYN-RCVD state and two ACKs arrive at the same time,
both valid, e.g. coming from the same host and with valid sequence.

First packet would locate the listening socket in the inpcb database,
write-lock it and start expanding the syncache entry into a socket.
Meanwhile second packet would wait on the write lock of the listening
socket.  First packet will create a new ESTABLISHED socket, free the
syncache entry and unlock the listening socket.  Second packet would
call into syncache_expand(), but this time it will fail as there
is no syncache entry.  Second packet would generate RST, effectively
resetting the remote connection.

It seems to me, that it is impossible to solve this problem with
just rearranging locks, as the race happens at a wire level.

To solve the problem, for an ACK packet arrived on a listening socket,
that failed syncache lookup, perform a second non-wildcard lookup right
away.  That lookup may find the new born socket.  Otherwise, we indeed
send RST.

Tested by:		kp
Reviewed by:		tuexen, rrs
PR:			265154
Differential revision:	https://reviews.freebsd.org/D36066
2022-08-10 07:32:37 -07:00
Gleb Smirnoff
e7231d07a4 tcp_input: update comment to match reality. 2022-08-07 11:18:30 -07:00
Gleb Smirnoff
74703901d8 tcp: use a TCP flag to check if connection has been close(2)d
The flag SS_NOFDREF is a private flag of the socket layer.  It also
is supposed to be read with SOCK_LOCK(), which we don't own here.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D35663
2022-07-04 12:40:51 -07:00
Gleb Smirnoff
ad3ad06477 blackhole(4): fix operator precedence
Fixes:	3ea9a7cf7b
2022-06-27 17:52:19 -07:00
Hans Petter Selasky
f5766992c0 tcp: Correctly compute the TCP goodput in bits per second by using SEQ_SUB().
TCP sequence number differences should be computed using SEQ_SUB().

Differential Revision:	https://reviews.freebsd.org/D35505
Reviewed by:	rscheff@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-06-23 21:10:39 +02:00
Gleb Smirnoff
4328318445 sockets: use socket buffer mutexes in struct socket directly
Since c67f3b8b78 the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it.  In 74a68313b5 macros that access
this mutex directly were added.  Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.

This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.

This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally.  However, it allows operation of a
protocol that doesn't use them.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35152
2022-05-12 13:22:12 -07:00
Richard Scheffenegger
f7220c486c tcp: move ECN handling code to a common file
Reduce the burden to maintain correct and
extensible ECN related code across multiple
stacks and codepaths.

Formally no functional change.

Incidentially this establishes correct
ECN operation in one instance.

Reviewed By: rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34162
2022-02-05 15:04:42 +01:00
Richard Scheffenegger
7994ef3c39 Revert "tcp: move ECN handling code to a common file"
This reverts commit 0c424c90ea.
2022-02-05 01:07:51 +01:00
Richard Scheffenegger
0c424c90ea tcp: move ECN handling code to a common file
Reduce the burden to maintain correct and
extensible ECN related code across multiple
stacks and codepaths.

Formally no functional change.

Incidentially this establishes correct
ECN operation in one instance.

Reviewed By: rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34162
2022-02-04 22:54:41 +01:00
Richard Scheffenegger
1ebf460758 tcp: Access all 12 TCP header flags via inline function
In order to consistently provide access to all
(including reserved) TCP header flag bits,
use an accessor function tcp_get_flags and
tcp_set_flags. Also expand any flag variable from
uint8_t / char to uint16_t.

Reviewed By: hselasky, tuexen, glebius, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34130
2022-02-03 16:21:58 +01:00
Richard Scheffenegger
4531b3450b tcp: Tidying up the conditionals for unwinding a spurious RTO
- Use the semantically correct TSTMP_xx macro when comparing
  timestamps. (No functional change)
- check for bad retransmits only when TSopt is present in ACK
  (don't assume there will be a valid TSopt in the TCP options struct)
- exclude tsecr == 0, since that most likely indicates an
  invalid ts echo return (tsecr) value.

Reviewed By: tuexen, #transport
MFC after:   3 days
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34062
2022-01-27 18:59:55 +01:00
Richard Scheffenegger
68e623c3f0 tcp: Rewind erraneous RTO only while performing RTO retransmissions
Under rare circumstances, a spurious retranmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.

Reviewed By: tuexen, chengc_netapp.com, #transport
MFC after:   3 days
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D33979
2022-01-27 18:49:42 +01:00
Gleb Smirnoff
40fa3e40b5 tcp: mechanically substitute call to tfb_tcp_output to new method.
Made with sed(1) execution:

sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/)

sed:
s/tp->t_fb->tfb_tcp_output\(tp\)/tcp_output(tp)/
s/to tfb_tcp_output\(\)/to tcp_output()/

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D33366
2021-12-26 08:47:59 -08:00
Gleb Smirnoff
75add59a8e tcp: allocate statistics in the main tcp_init()
No reason to have a separate SYSINIT.
2021-12-17 10:50:56 -08:00
Cy Schubert
db0ac6ded6 Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
2021-12-02 14:45:04 -08:00
Cy Schubert
266f97b5e9 wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.

MFC after:      1 month
2021-12-02 13:35:14 -08:00
Gleb Smirnoff
de2d47842e SMR protection for inpcbs
With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc).  However, inpcb
aren't static in nature, they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.

Fairly new feature of uma(9) - Safe Memory Reclamation allows to
safely free memory in page-sized batches, with virtually zero
overhead compared to uma_zfree().  However, unlike epoch(9), it
puts stricter requirement on the access to the protected memory,
needing the critical(9) section to access it.  Details:

- The database is already build on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
  Once the desired inpcb is found we need to transition from SMR
  section to r/w lock on the inpcb itself, with a check that inpcb
  isn't yet freed.  This requires some compexity, since SMR section
  itself is a critical(9) section.  The complexity is hidden from
  KPI users in inp_smr_lock().
- For a inpcb list traversal (a pcblist sysctl, or broadcast
  notification) also a new KPI is provided, that hides internals of
  the database - inp_next(struct inp_iterator *).

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33022
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
3ea9a7cf7b blackhole(4): disable for locally originated TCP/UDP packets
In most cases blackholing for locally originated packets is undesired,
leads to different kind of lags and delays. Provide sysctls to enforce
it, e.g. for debugging purposes.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D32718
2021-11-03 13:02:44 -07:00
Richard Scheffenegger
74d7fc8753 tcp: Add PRR cwnd reduction for non-SACK loss
This completes PRR cwnd reduction in all circumstances
for the base TCP stack (SACK loss recovery, ECN window reduction,
non-SACK loss recovery), preventing the arriving ACKs to
clock out new data at the old, too high rate. This
reduces the chance to induce additional losses while
recovering from loss (during congested network conditions).

For non-SACK loss recovery, each ACK is assumed to have
one MSS delivered. In order to prevent ACK-split attacks,
only one window worth of ACKs is considered to actually
have delivered new data.

MFC after: 6 weeks
Reviewed By: rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29441
2021-06-19 19:25:22 +02:00
Mark Johnston
f4bb1869dd Consistently use the SOLISTENING() macro
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly.  As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING().  No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:27 -04:00
Randall Stewart
4747500dea tcp: A better fix for the previously attempted fix of the ack-war issue with tcp.
So it turns out that my fix before was not correct. It ended with us failing
some of the "improved" SYN tests, since we are not in the correct states.
With more digging I have figured out the root of the problem is that when
we receive a SYN|FIN the reassembly code made it so we create a segq entry
to hold the FIN. In the established state where we were not in order this
would be correct i.e. a 0 len with a FIN would need to be accepted. But
if you are in a front state we need to strip the FIN so we correctly handle
the ACK but ignore the FIN. This gets us into the proper states
and avoids the previous ack war.

I back out some of the previous changes but then add a new change
here in tcp_reass() that fixes the root cause of the issue. We still
leave the rack panic fixes in place however.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30627
2021-06-04 05:26:43 -04:00
Randall Stewart
8c69d988a8 tcp: When we have an out-of-order FIN we do want to strip off the FIN bit.
The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr).
However there was a missing case, i.e. where we get an out-of-order FIN by itself.
In such a case we don't want to leave the FIN bit set, otherwise we will do the
wrong thing and ack the FIN incorrectly. Instead we need to go through the
tcp_reasm() code and that way the FIN will be stripped and all will be well.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30497
2021-05-27 10:50:32 -04:00