In case of a paritial chunk only pretend the result is OK if
the packet is not the last fragment and there is a valid association.
PR: 267476
MFC after: 3 days
Have tcpstats (netstat -s) differentiate between received and sent
ECN-marked packets. Also account for IP ECN bits (on TCP packets)
even when the tcp session has not negotiated ECN support.
Event: IETF 115 Hackathon
Reviewed By: glebius, tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D37314
The NAT module use of the tcphdr.th_x2 field now collides with the
use of this TCP header flag as AccECN (AE) bit. Use the topmost
bit instead to allow negotiation of AccECN across a NAT device.
Event: IETF 115 Hackathon
Reviewed By: #transport, tuexen
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D37300
The inp_socket is cleared only in in_pcbdetach(), which for TCP is
always accompanied with inp_pcbfree(). An inpcb that went through
in_pcbfree() shall never be returned by any kind of pcb lookup.
Reviewed by: tuexen
Differential revision: https://reviews.freebsd.org/D37062
These functions tcp_notify(), tcp_drop_syn_sent() and tcp_mtudisc()
are called from tcp*_ctlinput*() right after successfull
in_pcblookup*(). They shall never get a pcb that is dropped.
The in_pcbdrop() KPI, which is used solely by TCP, allows to remove a
pcb from hash list and mark it as dropped. The comment suggests that
such pcb won't be returned by lookups. Indeed, every call to
in_pcblookup*() is accompanied by a check for INP_DROPPED. Do what
comment suggests: never return such pcbs and remove unnecessary checks.
Reviewed by: tuexen
Differential revision: https://reviews.freebsd.org/D37061
This adds the capability for a modular congestion control
to select which variant of ECN-capable-transport it wants to use
when sending out elegible segments. As an initial CC to utilize
this, DCTCP was selected.
Event: IETF 115 Hackathon
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D24869
Keep all ECN related code in (mostly) one place.
No functional change.
Event: IETF 115 Hackathon
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D37285
Accurate ECN asks to conservatively estimate, when the
ACE counter may have wrapped due to a single ACK covering a larger
number of segments. This is described in Annex A.2 of the
accurate-ecn draft.
Event: IETF 115 Hackathon
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D37281
Provide diagnostics information around AccECN into
the tcpinfo struct.
Event: IETF 115 Hackathon
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D37280
Currently SO_REUSEPORT_LB silently does nothing when set by a jailed
process. It is trivial to support this option in VNET jails, but it's
also useful in traditional jails.
This patch enables LB groups in jails with the following semantics:
- all PCBs in a group must belong to the same jail,
- PCB lookup prefers jailed groups to non-jailed groups
This is a straightforward extension of the semantics used for individual
listening sockets. One pre-existing quirk of the lbgroup implementation
is that non-jailed lbgroups are searched before jailed listening
sockets; that is preserved with this change.
Discussed with: glebius
MFC after: 1 month
Sponsored by: Modirum MDPay
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D37029
If a memory allocation failure causes bind to fail, we should take the
inpcb back out of its LB group since it's not prepared to handle
connections.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Modirum MDPay
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D37027
Some auditing of the code shows that "cred" is never non-NULL in these
functions, either because all callers pass a non-NULL reference or
because they unconditionally dereference "cred". So, let's simplify the
code a bit and remove NULL checks. No functional change intended.
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Modirum MDPay
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D37025
It is called only from tcp_input() and always has valid parameter.
Reviewed by: rscheff, tuexen
Differential revision: https://reviews.freebsd.org/D37115
The suppresion was added in 5f311da2cc with no explanation in the
commit message of the exact problem that was fixed. In the BSDCan
2006 talk [1], slides 12 to 14, we can find that it seems that there
was some problem with the TIME_WAIT state not properly being handled
on the remote side (also FreeBSD!), and this switching off the
suppression had hidden the problem. The rationale of the change was
that other stacks may also be buggy wrt the TIME_WAIT.
I did not find the actual problem in TIME_WAIT that the suppression
has hidden, neither a commit that would fix it. However, since that
time we started to handle SYNs with RFC5961 instead of RFC793, see
3220a2121c. We also now have the tcp-testsuite [2], that has full
coverage of all possible scenarios of receiving SYN in TIME_WAIT.
This effectively reverts 5f311da2cc
and 6ee79c59d2.
[1] https://www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf
[2] https://github.com/freebsd-net/tcp-testsuite
Reviewed by: rscheff
Discussed with: rscheff, rrs, tuexen
Differential revision: https://reviews.freebsd.org/D37042
The assertion was incorrectly removed in 0d7445193a. The leak of
a TIME-WAIT state into tfb_do_segment_nounlock method was fixed in
31bc602ff8. The TIME-WAIT connections are processed by the main
tcp_input() always.
We recently got rid of the explicit INP_TIMEWAIT state, this has caused some
minor breakage to both rack and bbr. Basically the timewait check that was
in tcp_lro.c is now gone. This means that compressed_ack and mbuf_queued
packets will arrive at TCP without going through tcp_input_with_port(). We need
to expand the check that was stripped to look at the tcp_state (t_state) and
not "LRO" packets that are in the TCPS_TIMEWAIT state.
Reviewed by: tuexen, gliebus
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D37080
tcp_respond is another function to build a tcp control packet
quickly. With ECN++ and AccECN, both the IP ECN header, and
the TCP ECN flags are supposed to reflect the correct state.
Also ensure that on receiving multiple ECN SYN-ACKs, the
responses triggered will reflect the latest state.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36973
- Use C99 types uintXX_t instead of u_intXX_t.
- Try to make space/tab usage a little bit more consistent.
- Shorten comments to fit into 80 chars.
Not a functional change, just making future changes easier to read.
Ensure that the initiate tag of the INIT-ACK chunk is used as the
verification tag of the packet containing the ABORT chunk.
Reported by: Suganya Dharma
MFC after: 1 week
Similar to 2cd6ad766e for inet6 drop ifma_restart use, creating more
problems than solving. It is no longer needed after epoch introduction.
While there, add NULL check for ifma_ifp in igmp_change_state(), that
sometimes caused panics on interface destruction.
MFC after: 2 weeks
When multiple <SYN> segments are received, update the <SYN,ACK>
sent in response to the latest IP ECN and TCP ECN information.
On retransmitting the <SYN,ACK>, once ECN maxtries are done, not
only disable RFC3168 ECN, but AccECN also.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36875
On passive sessions, honor the local settings disabling or
enabling window scaling and timestamp options.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36874
Mechanically cleanup INP_TIMEWAIT from the kernel sources. After
0d7445193a, this commit shall not cause any functional changes.
Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of this checks and turn them to
assertions. Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.
Differential revision: https://reviews.freebsd.org/D36400
The memory savings the tcptw brought back in 2003 (see 340c35de6a) no
longer justify the complexity required to maintain it. For longer
explanation please check out the email [1].
Surpisingly through almost 20 years the TCP stack functionality of
handling the TIME_WAIT state with a normal tcpcb did not bitrot. The
existing tcp_input() properly handles a tcpcb in TCPS_TIME_WAIT state,
which is confirmed by the packetdrill tcp-testsuite [2].
This change just removes tcptw and leaves INP_TIMEWAIT. The flag will
be removed in a separate commit. This makes it easier to review and
possibly debug the changes.
[1] https://lists.freebsd.org/archives/freebsd-net/2022-January/001206.html
[2] https://github.com/freebsd-net/tcp-testsuite
Differential revision: https://reviews.freebsd.org/D36398
TCP has an idle-reduce feature that allows a connection to reduce its
cwnd after it has been idle more than an RTT. This feature only works
for a sending side connection. It does this by at output checking the
idle time (t_rcvtime vs ticks) to see if its more than the RTO timeout.
The problem comes if you are a web server. You get a request and
then send out all the data.. then go idle. The next time you would
send is in response to a request from the peer asking for more data.
But the thing is you updated t_rcvtime when the request came in so
you never reduce.
The fix is to do the idle reduce check also on inbound.
Reviewed by: tuexen, rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36721