Commit Graph

346 Commits

Author SHA1 Message Date
Randall Stewart
ea9017fb25 tcp: Congestion control move to using reference counting.
In the transport call on 12/3 Gleb asked to move the CC modules towards
using reference counting to prevent folks from unloading a module in use.
It was also agreed that Michael would do a user space utility like tcp_drop
that could be used to move all connections that are using a specific CC
to some other CC.

This is the half I committed to doing, making it so that we maintain a refcount
on a cc module every time a pcb refers to it and decrementing that every
time a pcb no longer uses a cc module. This also helps us simplify the
whole unloading process by getting rid of tcp_ccunload() which munged
through all the tcb's. Instead we mark a module as being removed and
prevent further references to it. We also make sure that if a module is
marked as being removed it cannot be made as the default and also
the opposite of that, if its a default it fails and does not mark it as being
removed.

Reviewed by: Michael Tuexen, Gleb Smirnoff
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33249
2022-02-21 06:30:17 -05:00
Richard Scheffenegger
0c2832ee4f tcp: Restore 6 tcps padding entries in HEAD
The padding in CURRENT shall not shrink. It is
designed that in CURRENT at always stays
the same, and then when a new stable is branched, it
inherits 6 pointer placeholders that can be used
withing this stable/X lifetime to extend the structure.

Reviewed By: tuexen, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34269
2022-02-15 09:24:07 +01:00
Richard Scheffenegger
3f169c54ab tcp: Add/update AccECN related statistics and numbers
Reserve couters in the tcps struct in preparation
for AccECN, extend the debugging output for TF2
flags, optimize the syncache flags from individual
bits to a codepoint for the specifc ECN handshake.

This is in preparation of AccECN.

No functional chance except for extended debug
output capabilities.

Reviewed By: #transport, rrs
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34161
2022-02-10 00:21:31 +01:00
Michael Tuexen
fd7daa7271 tcp: make tcp_ctloutput_set() non-static
tcp_ctloutput_set() will be used via the sysctl interface in a
upcoming command line tool tcpsso.

Reviewed by:		glebius, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D34164
2022-02-08 18:49:44 +01:00
Richard Scheffenegger
1ebf460758 tcp: Access all 12 TCP header flags via inline function
In order to consistently provide access to all
(including reserved) TCP header flag bits,
use an accessor function tcp_get_flags and
tcp_set_flags. Also expand any flag variable from
uint8_t / char to uint16_t.

Reviewed By: hselasky, tuexen, glebius, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34130
2022-02-03 16:21:58 +01:00
Michael Tuexen
3b3c08c135 tcp: cleanup functions related to socket option handling
Consistently only pass the inp and the sopt around. Don't pass the
so around, since in a upcoming commit tcp_ctloutput_set() will be
called from a context different from setsockopt(). Also expect
the inp to be locked when calling tcp_ctloutput_[gs]et(), this is
also required for the upcoming use by tcpsso, a command line tool
to set socket options.
Reviewed by:		glebius, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D34151
2022-02-02 09:27:59 +01:00
Richard Scheffenegger
68e623c3f0 tcp: Rewind erraneous RTO only while performing RTO retransmissions
Under rare circumstances, a spurious retranmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.

Reviewed By: tuexen, chengc_netapp.com, #transport
MFC after:   3 days
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D33979
2022-01-27 18:49:42 +01:00
Gleb Smirnoff
89128ff3e4 protocols: init with standard SYSINIT(9) or VNET_SYSINIT
The historical BSD network stack loop that rolls over domains and
over protocols has no advantages over more modern SYSINIT(9).
While doing the sweep, split global and per-VNET initializers.

Getting rid of pr_init allows to achieve several things:
o Get rid of ifdef's that protect against double foo_init() when
  both INET and INET6 are compiled in.
o Isolate initializers statically to the module they init.
o Makes code easier to understand and maintain.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D33537
2022-01-03 10:15:21 -08:00
Gleb Smirnoff
f64dc2ab5b tcp: TCP output method can request tcp_drop
The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection
when they do output on it.  The default stack never does this, thus
existing framework expects tcp_output() always to return locked and
valid tcpcb.

Provide KPI extension to satisfy demands of advanced stacks.  If the
output method returns negative error code, it means that caller must
call tcp_drop().

In tcp_var() provide three inline methods to call tcp_output():
- tcp_output() is a drop-in replacement for the default stack, so that
  default stack can continue using it internally without modifications.
  For advanced stacks it would perform tcp_drop() and unlock and report
  that with negative error code.
- tcp_output_unlock() handles the negative code and always converts
  it to positive and always unlocks.
- tcp_output_nodrop() just calls the method and leaves the responsibility
  to drop on the caller.

Sweep over the advanced stacks and use new KPI instead of using HPTS
delayed drop queue for that.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D33370
2021-12-26 08:48:19 -08:00
Gleb Smirnoff
5b08b46a6d tcp: welcome back tcp_output() as the right way to run output on tcpcb.
Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D33365
2021-12-26 08:47:42 -08:00
Robert Wing
2a28b045ca tcp_twrespond: send signed segment when connection is TCP-MD5
When a connection is established to use TCP-MD5, tcp_twrespond() doesn't
respond with a signed segment. This results in the host performing the
active close to remain in a TIME_WAIT state and the other host in the
LAST_ACK state. Fix this by sending a signed segment when the connection
is established to use TCP-MD5.

Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D33490
2021-12-20 11:38:01 -09:00
Gleb Smirnoff
71d2d5adfe tcptw: count how many times a tcptw was actually useful
This will allow a sysadmin to lower net.inet.tcp.msl and
see how long tcptw are actually useful.
2021-12-19 08:22:12 -08:00
Gleb Smirnoff
cb3772639f tcptw: remove unused fields
The structure goes away anyway, but it would be interesting to know
how much memory we used to save with it.  So for the record, structure
size with this revision is 64 bytes.
2021-12-19 08:22:12 -08:00
Cy Schubert
db0ac6ded6 Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
2021-12-02 14:45:04 -08:00
Cy Schubert
266f97b5e9 wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.

MFC after:      1 month
2021-12-02 13:35:14 -08:00
Gleb Smirnoff
de2d47842e SMR protection for inpcbs
With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc).  However, inpcb
aren't static in nature, they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.

Fairly new feature of uma(9) - Safe Memory Reclamation allows to
safely free memory in page-sized batches, with virtually zero
overhead compared to uma_zfree().  However, unlike epoch(9), it
puts stricter requirement on the access to the protected memory,
needing the critical(9) section to access it.  Details:

- The database is already build on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
  Once the desired inpcb is found we need to transition from SMR
  section to r/w lock on the inpcb itself, with a check that inpcb
  isn't yet freed.  This requires some compexity, since SMR section
  itself is a critical(9) section.  The complexity is hidden from
  KPI users in inp_smr_lock().
- For a inpcb list traversal (a pcblist sysctl, or broadcast
  notification) also a new KPI is provided, that hides internals of
  the database - inp_next(struct inp_iterator *).

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33022
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
ff94500855 Add tcp_freecb() - single place to free tcpcb.
Until this change there were two places where we would free tcpcb -
tcp_discardcb() in case if all timers are drained and tcp_timer_discard()
otherwise.  They were pretty much copy-n-paste, except that in the
default case we would run tcp_hc_update().  Merge this into single
function tcp_freecb() and move new short version of tcp_timer_discard()
to tcp_timer.c and make it static.

Reviewed by:		rrs, hselasky
Differential revision:	https://reviews.freebsd.org/D32965
2021-11-18 20:27:45 -08:00
Gleb Smirnoff
f581a26e46 Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP.
Pass control for IP/IP6 level options from generic tcp_ctloutput_set()
down to per-stack ctloutput.

Call tcp6_use_min_mtu() from tcp stack tcp_default_ctloutput().

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:22:00 -07:00
Peter Lei
e28330832b tcp: socket option to get stack alias name
TCP stack sysctl nodes are currently inserted using the stack
name alias. Allow the user to get the current stack's alias to
allow for programatic sysctl access.

Obtained from:	Netflix
2021-10-27 08:21:59 -07:00
Randall Stewart
a36230f75e tcp: Make dsack stats available in netstat and also make sure its aware of TLP's.
DSACK accounting has been for quite some time under a NETFLIX_STATS ifdef. Statistics
on DSACKs however are very useful in figuring out how much bad retransmissions you
are doing. This is further complicated, however, by stacks that do TLP. A TLP
when discovering a lost ack in the reverse path will cause the generation
of a DSACK. For this situation we introduce a new dsack-tlp-bytes as well
as the more traditional dsack-bytes and dsack-packets. These will now
all display in netstat -p tcp -s. This also updates all stacks that
are currently built to keep track of these stats.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32158
2021-10-01 10:36:27 -04:00
Hans Petter Selasky
e3e7d95332 tcp: Avoid division by zero when KERN_TLS is enabled in tcp_account_for_send().
If the "len" variable is non-zero, we can assume that the sum of
"tp->t_snd_rxt_bytes + tp->t_sndbytes" is also non-zero.

It is also assumed that the 64-bit byte counters will never wrap around.

Differential Revision:	https://reviews.freebsd.org/D31959
Reviewed by:	gallatin, rrs and tuexen
Found by:	"I told you so", also called hselasky
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-09-15 18:05:31 +02:00
Andrew Gallatin
739de953ec ktls: Move KERN_TLS ifdef to tcp_var.h
This allows us to remove stubs in ktls.h and allows us
to sort the function prototypes.

Reviewed by: jhb
Sponsored by: Netflix
2021-08-05 19:17:35 -04:00
Randall Stewart
ca1a7e1021 tcp: TCP_LRO getting bad checksums and sending it in to TCP incorrectly.
In reviewing tcp_lro.c we have a possibility that some drives may send a mbuf into
LRO without making sure that the checksum passes. Some drivers actually are
aware of this and do not call lro when the csum failed, others do not do this and
thus could end up sending data up that we think has a checksum passing when
it does not.

This change will fix that situation by properly verifying that the mbuf
has the correct markings (CSUM VALID bits as well as csum in mbuf header
is set to 0xffff).

Reviewed by: tuexen, hselasky, gallatin
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31155
2021-07-13 12:45:15 -04:00
Andrew Gallatin
28d0a740dd ktls: auto-disable ifnet (inline hw) kTLS
Ifnet (inline) hw kTLS NICs typically keep state within
a TLS record, so that when transmitting in-order,
they can continue encryption on each segment sent without
DMA'ing extra state from the host.

This breaks down when transmits are out of order (eg,
TCP retransmits).  In this case, the NIC must re-DMA
the entire TLS record up to and including the segment
being retransmitted.  This means that when re-transmitting
the last 1448 byte segment of a TLS record, the NIC will
have to re-DMA the entire 16KB TLS record. This can lead
to the NIC running out of PCIe bus bandwidth well before
it saturates the network link if a lot of TCP connections have
a high retransmoit rate.

This change introduces a new sysctl (kern.ipc.tls.ifnet_max_rexmit_pct),
where TCP connections with higher retransmit rate will be
switched to SW kTLS so as to conserve PCIe bandwidth.

Reviewed by:	hselasky, markj, rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30908
2021-07-06 10:28:32 -04:00
Randall Stewart
9e4d9e4c4d tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software.
Hardware TLS is now supported in some interface cards and it works well. Except that
when we have connections that retransmit a lot we get into trouble with all the retransmits.
This prep step makes way for change that Drew will be making so that we can "kick out" a
session from hardware TLS.

Reviewed by: mtuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30895
2021-06-25 09:30:54 -04:00
Randall Stewart
67e892819b tcp: Mbuf leak while holding a socket buffer lock.
When running at NF the current Rack and BBR changes with the recent
commits from Richard that cause the socket buffer lock to be held over
the ip_output() call and then finally culminating in a call to tcp_handle_wakeup()
we get a lot of leaked mbufs. I don't think that this leak is actually caused
by holding the lock or what Richard has done, but is exposing some other
bug that has probably been lying dormant for a long time. I will continue to
look (using his changes) at what is going on to try to root cause out the issue.

In the meantime I can't leave the leaks out for everyone else. So this commit
will revert all of Richards changes and move both Rack and BBR back to just
doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.

We may want to look at adding back in Richards changes after I have pinpointed
the root cause of the mbuf leak and fixed it.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30704
2021-06-10 08:33:57 -04:00
Richard Scheffenegger
032bf749fd [tcp] Keep socket buffer locked until upcall
r367492 would unlock the socket buffer before eventually calling the upcall.
This leads to problematic interaction with NFS kernel server/client components
(MP threads) accessing the socket buffer with potentially not correctly updated
state.

Reported by: rmacklem
Reviewed By: tuexen, #transport
Tested by: rmacklem, otis
MFC after: 2 weeks
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29690
2021-05-21 11:07:51 +02:00
Richard Scheffenegger
0471a8c734 tcp: SACK Lost Retransmission Detection (LRD)
Recover from excessive losses without reverting to a
retransmission timeout (RTO). Disabled by default, enable
with sysctl net.inet.tcp.do_lrd=1

Reviewed By: #transport, rrs, tuexen, #manpages
Sponsored by: Netapp, Inc.
Differential Revision: https://reviews.freebsd.org/D28931
2021-05-10 19:06:20 +02:00
Randall Stewart
5d8fd932e4 This brings into sync FreeBSD with the netflix versions of rack and bbr.
This fixes several breakages (panics) since the tcp_lro code was
committed that have been reported. Quite a few new features are
now in rack (prefecting of DGP -- Dynamic Goodput Pacing among the
largest). There is also support for ack-war prevention. Documents
comming soon on rack..

Sponsored by:           Netflix
Reviewed by:		rscheff, mtuexen
Differential Revision:	https://reviews.freebsd.org/D30036
2021-05-06 11:22:26 -04:00
Hans Petter Selasky
9ca874cf74 Add TCP LRO support for VLAN and VxLAN.
This change makes the TCP LRO code more generic and flexible with regards
to supporting multiple different TCP encapsulation protocols and in general
lays the ground for broader TCP LRO support. The main job of the TCP LRO code is
to merge TCP packets for the same flow, to reduce the number of calls to upper
layers. This reduces CPU and increases performance, due to being able to send
larger TSO offloaded data chunks at a time. Basically the TCP LRO makes it
possible to avoid per-packet interaction by the host CPU.

Because the current TCP LRO code was tightly bound and optimized for TCP/IP
over ethernet only, several larger changes were needed. Also a minor bug was
fixed in the flushing mechanism for inactive entries, where the expire time,
"le->mtime" was not always properly set.

To avoid having to re-run time consuming regression tests for every change,
it was chosen to squash the following list of changes into a single commit:
- Refactor parsing of all address information into the "lro_parser" structure.
  This easily allows to reuse parsing code for inner headers.
- Speedup header data comparison. Don't compare field by field, but
  instead use an unsigned long array, where the fields get packed.
- Refactor the IPv4/TCP/UDP checksum computations, so that they may be computed
  recursivly, only applying deltas as the result of updating payload data.
- Make smaller inline functions doing one operation at a time instead of
  big functions having repeated code.
- Refactor the TCP ACK compression code to only execute once
  per TCP LRO flush. This gives a minor performance improvement and
  keeps the code simple.
- Use sbintime() for all time-keeping. This change also fixes flushing
  of inactive entries.
- Try to shrink the size of the LRO entry, because it is frequently zeroed.
- Removed unused TCP LRO macros.
- Cleanup unused TCP LRO statistics counters while at it.
- Try to use __predict_true() and predict_false() to optimise CPU branch
  predictions.

Bump the __FreeBSD_version due to changing the "lro_ctrl" structure.

Tested by:	Netflix
Reviewed by:	rrs (transport)
Differential Revision:	https://reviews.freebsd.org/D29564
MFC after:	2 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-04-20 13:36:22 +02:00
Michael Tuexen
9e644c2300 tcp: add support for TCP over UDP
Adding support for TCP over UDP allows communication with
TCP stacks which can be implemented in userspace without
requiring special priviledges or specific support by the OS.
This is joint work with rrs.

Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D29469
2021-04-18 16:16:42 +02:00
Richard Scheffenegger
d1de2b05a0 tcp: Rename rfc6675_pipe to sack.revised, and enable by default
As full support of RFC6675 is in place, deprecating
net.inet.tcp.rfc6675_pipe and enabling by default
net.inet.tcp.sack.revised.

Reviewed By: #transport, kbowling, rrs
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28702
2021-04-17 14:59:45 +02:00
Richard Scheffenegger
90cca08e91 tcp: Prepare PRR to work with NewReno LossRecovery
Add proper PRR vnet declarations for consistency.
Also add pointer to tcpopt struct to tcp_do_prr_ack, in preparation
for it to deal with non-SACK window reduction (after loss).

No functional change.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29440
2021-04-08 19:16:31 +02:00
Richard Scheffenegger
eb3a59a831 tcp: Refactor PRR code
No functional change intended.

MFC after: 2 weeks
Reviewed By: #transport, rrs
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29411
2021-03-26 00:01:34 +01:00
Richard Scheffenegger
e53138694a tcp: Add prr_out in preparation for PRR/nonSACK and LRD
Reviewed By:           #transport, kbowling
MFC after:             3 days
Sponsored By:          Netapp, Inc.
Differential Revision: https://reviews.freebsd.org/D29058
2021-03-06 00:38:22 +01:00
Randall Stewart
69a34e8d02 Update the LRO processing code so that we can support
a further CPU enhancements for compressed acks. These
are acks that are compressed into an mbuf. The transport
has to be aware of how to process these, and an upcoming
update to rack will do so. You need the rack changes
to actually test and validate these since if the transport
does not support mbuf compression, then the old code paths
stay in place. We do in this commit take out the concept
of logging if you don't have a lock (which was quite
dangerous and was only for some early debugging but has
been left in the code).

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D28374
2021-02-17 10:41:01 -05:00
Michael Tuexen
d2b3ceddcc tcp: add sysctl to tolerate TCP segments missing timestamps
When timestamp support has been negotiated, TCP segements received
without a timestamp should be discarded. However, there are broken
TCP implementations (for example, stacks used by Omniswitch 63xx and
64xx models), which send TCP segments without timestamps although
they negotiated timestamp support.
This patch adds a sysctl variable which tolerates such TCP segments
and allows to interoperate with broken stacks.

Reviewed by:		jtl@, rscheff@
Differential Revision:	https://reviews.freebsd.org/D28142
Sponsored by:		Netflix, Inc.
PR:			252449
MFC after:		1 week
2021-01-14 19:28:25 +01:00
Richard Scheffenegger
0e1d7c25c5 Add TCP feature Proportional Rate Reduction (PRR) - RFC6937
PRR improves loss recovery and avoids RTOs in a wide range
of scenarios (ACK thinning) over regular SACK loss recovery.

PRR is disabled by default, enable by net.inet.tcp.do_prr = 1.
Performance may be impeded by token bucket rate policers at
the bottleneck, where net.inet.tcp.do_prr_conservate = 1
should be enabled in addition.

Submitted by:	Aris Angelogiannopoulos
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D18892
2020-12-04 11:29:27 +00:00
Richard Scheffenegger
4d0770f172 Prevent premature SACK block transmission during loss recovery
Under specific conditions, a window update can be sent with
outdated SACK information. Some clients react to this by
subsequently delaying loss recovery, making TCP perform very
poorly.

Reported by:	chengc_netapp.com
Reviewed by:	rrs, jtl
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D24237
2020-11-08 18:47:05 +00:00
John Baldwin
ce39811544 Save the current TCP pacing rate in t_pacing_rate.
Reviewed by:	gallatin, gnn
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26875
2020-10-29 00:03:19 +00:00
Richard Scheffenegger
5432120028 Extend netstat to display TCP stack and detailed congestion state (2)
Extend netstat to display TCP stack and detailed congestion state

Adding the "-c" option used to show detailed per-connection
congestion control state for TCP sessions.

This is one summary patch, which adds the relevant variables into
xtcpcb. As previous "spare" space is used, these changes are ABI
compatible.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D26518
2020-10-09 10:55:19 +00:00
Michael Tuexen
42d7560796 Export the name of the congestion control. This will be used by sockstat
and netstat.

Reviewed by:		rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D26412
2020-09-13 09:06:50 +00:00
Richard Scheffenegger
f359d6ebbc Improve SACK support code for RFC6675 and PRR
Adding proper accounting of sacked_bytes and (per-ACK)
delivered data to the SACK scoreboard. This will
allow more aspects of RFC6675 to be implemented as well
as Proportional Rate Reduction (RFC6937).

Prior to this change, the pipe calculation controlled with
net.inet.tcp.rfc6675_pipe was also susceptible to incorrect
results when more than 3 (or 4) holes in the sequence space
were present, which can no longer all fit into a single
ACK's SACK option.

Reviewed by:	kbowling, rgrimes (mentor)
Approved by:	rgrimes (mentor, blanket)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D18624
2020-08-13 16:30:09 +00:00
Richard Scheffenegger
9dc7d8a246 TCP: make after-idle work for transactional sessions.
The use of t_rcvtime as proxy for the last transmission
fails for transactional IO, where the client requests
data before the server can respond with a bulk transfer.

Set aside a dedicated variable to actually track the last
locally sent segment going forward.

Reported by:	rrs
Reviewed by:	rrs, tuexen (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25016
2020-06-24 13:42:42 +00:00
Randall Stewart
e854dd38ac An important statistic in determining if a server process (or client) is being delayed
is to know the time to first byte in and time to first byte out. Currently we
have no way to know these all we have is t_starttime. That (t_starttime) tells us
what time the 3 way handshake completed. We don't know when the first
request came in or how quickly we responded. Nor from a client perspective
do we know how long from when we sent out the first byte before the
server responded.

This small change adds the ability to track the TTFB's. This will show up in
BB logging which then can be pulled for later analysis. Note that currently
the tracking is via the ticks variable of all three variables. This provides
a very rough estimate (hz=1000 its 1ms). A follow-on set of work will be
to change all three of these values into something with a much finer resolution
(either microseconds or nanoseconds), though we may want to make the resolution
configurable so that on lower powered machines we could still use the much
cheaper ticks variable.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D24902
2020-06-08 11:48:07 +00:00
Randall Stewart
d3b6c96b7d Adjust the fb to have a way to ask the underlying stack
if it can support the PRUS option (OOB). And then have
the new function call that to validate and give the
correct error response if needed to the user (rack
and bbr do not support obsoleted OOB data).

Sponsoered by: Netflix Inc.
Differential Revision:	 https://reviews.freebsd.org/D24574
2020-05-04 20:19:57 +00:00
Randall Stewart
e570d231f4 This change does a small prepratory step in getting the
latest rack and bbr in from the NF repo. When those come
in the OOB data handling will be fixed where Skyzaller crashes.

Differential Revision:	https://reviews.freebsd.org/D24575
2020-04-27 16:30:29 +00:00
Michael Tuexen
b89af8e16d Improve the TCP blackhole detection. The principle is to reduce the
MSS in two steps and try each candidate two times. However, if two
candidates are the same (which is the case in TCP/IPv6), this candidate
was tested four times. This patch ensures that each candidate actually
reduced the MSS and is only tested 2 times. This reduces the time window
of missclassifying a temporary outage as an MTU issue.

Reviewed by:		jtl
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24308
2020-04-14 16:35:05 +00:00
Michael Tuexen
7ca6e2963f Use KMOD_TCPSTAT_INC instead of TCPSTAT_INC for RACK and BBR, since
these are kernel modules. Also add a KMOD_TCPSTAT_ADD and use that
instead of TCPSTAT_ADD.

Reviewed by:		jtl@, rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D23904
2020-03-12 15:37:41 +00:00
Michael Tuexen
a357466592 sack_newdata and snd_recover hold the same value. Therefore, use only
a single instance: use snd_recover also where sack_newdata was used.

Submitted by:		Richard Scheffenegger
Differential Revision:	https://reviews.freebsd.org/D18811
2020-02-13 15:14:46 +00:00