Commit Graph

7703 Commits

Author SHA1 Message Date
Michael Tuexen
e044a0bce4 bblog: inherit TCP_LOG option from listener
When the TCP_LOG option is used to enable logging on a listening
socket, inherit this if the listener is not auto selected and does
not have a log id set.

Reviewed by:		cc
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D38436
2023-05-06 11:21:16 +02:00
Michael Tuexen
c2399dd2e2 tcp: improve BBLoging for PRUs
Log all errors for PRUs, except when INP_DROPPED is set. In that case,
don't log it.

Reviewed by:		glebius, rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39591
2023-05-06 11:12:06 +02:00
Michael Tuexen
113f56ba49 tcp_rack: allow the module to be loaded without TCP_BLACKBOX
PR:			271091
Reviewed by:		cc
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39860
2023-05-06 11:02:50 +02:00
Michael Tuexen
04ede3675e sctp: only start shutdown guard timer when sending SHUTDOWN chunk
The intention is to protect a malicious peer not following the
shutdown procedures.

MFC after:	1 week
2023-05-03 20:28:46 +02:00
Michael Tuexen
d9ae4adff2 sctp: improve shutdown(..., SHUT_WR) handling
When shutdown(..., SHUT_WR) is called in the front states, send a
SHUTDOWN chunk when a COOKIE ACK chunk is received and there is
no outstanding data.

MFC after:	1 week
2023-05-03 17:33:49 +02:00
Michael Tuexen
1f0e13449b sctp: improve handling of stale cookie error causes
* If a measure of staleness of 0 is reported, use the RTT instead.
* Ensure that we always send a cookie preservative parameter by
  rounding up during the calculation.
* If allowed, perform a round trip time measurement.
* Clear the overall error counter, since the error cause also
  acts like an ACK.

MFC after:	1 week
2023-04-30 11:39:32 +02:00
Cheng Cui
60167184ab
siftr: remove barely used hash generation per record
Reviewers: rscheff, tuexen
Approved by: rscheff, tuexen
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39835
2023-04-26 15:57:42 -04:00
Gleb Smirnoff
f2e75b967a tcp_hpts: add missing "inline"
Fixes:	c2a69e846f
2023-04-25 15:18:26 -07:00
Cheng Cui
d090464ecd
Change the unit of srtt and rto to usec, inspired by these in struct "tcp_info". Therefore, no need hz and tcp_rtt_scale in the headline of the log. Update the man page as well.
Summary: Simplify srtt and rto values in siftr log.

Test Plan:
Tested in Emulab testbed:
cc@s1:~ % sudo sysctl net.inet.siftr
net.inet.siftr.port_filter: 0
net.inet.siftr.genhashes: 0
net.inet.siftr.ppl: 1
net.inet.siftr.logfile: /var/log/siftr.log
net.inet.siftr.enabled: 0
cc@s1:~ % sudo sysctl net.inet.siftr.port_filter=5001
net.inet.siftr.port_filter: 0 -> 5001
cc@s1:~ % sudo sysctl net.inet.siftr.enabled=1
net.inet.siftr.enabled: 0 -> 1
cc@s1:~ %
cc@s1:~ % iperf -c r1 -n 1M
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[  1] local 10.1.1.2 port 33817 connected with 10.1.1.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-0.91 sec  1.00 MBytes  9.22 Mbits/sec
cc@s1:~ % sudo sysctl net.inet.siftr.enabled=0
net.inet.siftr.enabled: 1 -> 0

cc@s1:~ % ll /var/log/siftr.log
-rw-r--r--  1 root  wheel    91K Apr 25 09:38 /var/log/siftr.log
cc@s1:~ % cat /var/log/siftr.log
enable_time_secs=1682437111	enable_time_usecs=121115	siftrver=1.3.0	sysname=FreeBSD	sysver=1400088	ipmode=4
o,0x00000000,1682437125.907343,10.1.1.2,33817,10.1.1.3,5001,1073725440,1073725440,2,0,0,0,0,2,536,0,1,672,1000000,32768,0,65536,0,0,0,0,0
i,0x00000000,1682437126.106759,10.1.1.2,33817,10.1.1.3,5001,1073725440,1073725440,2,0,0,0,0,2,536,0,1,672,1000000,32768,0,65536,0,1,0,0,0
o,0x00000000,1682437126.106767,10.1.1.2,33817,10.1.1.3,5001,1073725440,14480,2,65535,65700,9,9,4,1460,201000,1,16778209,803000,33580,0,65700,0,0,0,0,0
o,0x00000000,1682437126.107141,10.1.1.2,33817,10.1.1.3,5001,1073725440,14480,2,65535,65700,9,9,4,1460,201000,1,16778208,803000,33580,60,65700,0,0,0,0,0
...
i,0x00000000,1682437127.016754,10.1.1.2,33817,10.1.1.3,5001,1073725440,606109,1030,748544,66048,9,9,9,1460,100812,1,1008,303000,475948,0,65700,0,0,0,0,0
o,0x00000000,1682437127.016759,10.1.1.2,33817,10.1.1.3,5001,1073725440,606109,1030,748544,66048,9,9,10,1460,100812,1,1011,303000,475948,0,65700,0,0,0,0,0
disable_time_secs=1682437131	disable_time_usecs=767582	num_inbound_tcp_pkts=371	num_outbound_tcp_pkts=186	total_tcp_pkts=557	num_inbound_skipped_pkts_malloc=0	num_outbound_skipped_pkts_malloc=0	num_inbound_skipped_pkts_tcpcb=0	num_outbound_skipped_pkts_tcpcb=0	num_inbound_skipped_pkts_inpcb=0	num_outbound_skipped_pkts_inpcb=0	total_skipped_tcp_pkts=0	flow_list=10.1.1.2;33817-10.1.1.3;5001,

Reviewers: rscheff, tuexen
Approved by: rscheff, tuexen
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39803
2023-04-25 13:26:39 -04:00
Gleb Smirnoff
c3c20de3b2 tcp: move HPTS/LRO flags out of inpcb to tcpcb
These flags are TCP specific.  While here, make also several LRO
internal functions to pass tcpcb pointer instead of inpcb one.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39698
2023-04-25 12:19:48 -07:00
Gleb Smirnoff
c2a69e846f tcp_hpts: move HPTS related fields from inpcb to tcpcb
This makes inpcb lighter and allows future cache line optimizations
of tcpcb.  The reason why HPTS originally used inpcb is the compressed
TIME-WAIT state (see 0d7445193a), that used to free a tcpcb, while the
associated connection is still on the HPTS ring.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39697
2023-04-25 12:18:33 -07:00
Gleb Smirnoff
144259f673 tcp: purge the input queue from tcp_discardcb()
The purge was intentionally removed in a540cdca31.  My assumption
was that the stacks that use the input queue always call the
tcp_handle_orphaned_packets() in their tfb_tcp_fb_fini method.
However, rack will skip doing that if t_fb_ptr is NULL and there are
scenarios when it is NULL, e.g. close(2) on a socket (but some
special close(2)).  Instead of working out all possible scenarios
let's put this safebelt back.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39696
2023-04-25 12:18:19 -07:00
Boris Lytochkin
fc727ad63d ipfw: add [fw]mark implementation for ipfw
Packet Mark is an analogue to ipfw tags with O(1) lookup from mbuf while
regular tags require a single-linked list traversal.
Mark is a 32-bit number that can be looked up in a table
[with 'number' table-type], matched or compared with a number with optional
mask applied before comparison.
Having generic nature, Mark can be used in a variety of needs.
For example, it could be used as a security group: mark will hold a security
group id and represent a group of packet flows that shares same access
control policy.

Reviewed By: pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D39555
MFC after:	1 month
2023-04-25 12:40:23 +00:00
Alexander V. Chernikov
ca1850478f lltable: properly set expire time to 0 for static IPv4 entries.
MFC after:	2 weeks
2023-04-25 10:59:50 +00:00
Cheng Cui
1f782fcc0c
Remove unused fields in siftr_stats. Thus, update the man page as well.
Summary: Remove unused fields in siftr_stats. Thus, update the man page as well.

Test Plan: Tested in Emulab testbed.

Reviewers: rscheff, tuexen
Approved by: rscheff, tuexen
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39776
2023-04-24 15:31:15 -04:00
Cheng Cui
8aa2be695e
Correct the value of macro TF2_TCP_ACCOUNTING.
Summary: Make sure the values are in order.

Reviewers: rscheff, tuexen, #transport!
Approved by: rscheff, tuexen, glebius
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39716
2023-04-24 15:28:41 -04:00
Mark Johnston
93998d166f inpcb: Fix some bugs in _in_pcbinshash_wild()
- In _in_pcbinshash_wild(), we should avoid returning v6 sockets unless
  no other matches are available.  This preserves pre-existing
  semantics.
- Fix an inverted test: when inserting a non-jailed PCB, we want to
  search for the first non-jailed PCB in the hash chain.
- Test the right PCB when searching for a non-jailed PCB.

While here, add a required locking assertion.

Fixes:	7b92493ab1 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
2023-04-23 13:55:57 -04:00
Michael Tuexen
66d6fd5322 sctp: use constants from RFC 8260 to improve compliance
Keep the old constants for backwards compatibility.

MFC after:	1 week
2023-04-23 17:48:05 +02:00
Zhenlei Huang
b658c0fce1 ip_mroute: Delete unreachable code
As the flag M_WAITOK is passed to ip_encap_attach(), then the function
will never return NULL, and the following code within NULL check branch
will be unreachable.

No functional change intended.

Reviewed by:	kp
Fixes:		6d8fdfa9d5 Rework IP encapsulation handling code
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39746
2023-04-23 12:47:57 +08:00
Randall Stewart
01216268f8 tcp: hpts needs to still call output even after input.
The other stacks it turns out actually expect the output to be called and can become stuck if it is
not. This is because they run there timer code from there and the input routine does not always
assure a timer is running. The real longterm fix here might be to go into the other stacks (rack and bbr)
and make sure that a timer is running after input if you don't do output.. as well as call the timer functions.
This would cut down on calls from hpts. But I think its too dramatic of a change for the immediate time.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39738
2023-04-21 07:12:25 -04:00
Gleb Smirnoff
8e813d07c6 netstat: fix printing of TCP pcbs with -A
This change touches both kernel and netstat(1), but either of the changes
will fix printing pcb addresses with -A.

The thing is that historically netstat(1) treated TCP differently, and
printed tcpcb address instead of inpcb address.  This is not documented
anywhere!  With e68b379244 these two addresses became the same.  It is
highly likely they will be the same for a long time, but it might be they
will start to differ again in a far future.  My proposal is to stop
treating TCP differently with netstat(1) and right now is a good opportunity
to do that, since there will be no behavior change at all.  The kernel
change to tcp_inptoxtp() will go into stable/14 to make it compatible with
netstat(1) binary from stable/13.  We can drop it later, probably together
with in_ppcb pointer from inpcb.  The in_ppcb in xinpcb will stay for size
compatibility.

Reviewed by:		tuexen, rrs
Differential Revision:	https://reviews.freebsd.org/D39736
2023-04-20 12:42:42 -07:00
Mark Johnston
5fd1a67e88 inpcb: Release the inpcb cred reference before freeing the structure
Now that the inp_cred pointer is accessed only while the inpcb lock is
held, we can avoid deferring a crfree() call when freeing an inpcb.

This fixes a problem introduced when inpcb hash tables started being
synchronized with SMR: the credential reference previously could not be
released until all lockless readers have drained, and there is no
mechanism to explicitly purge cached, freed UMA items.  Thus, ucred
references could linger indefinitely, and since ucreds hold a jail
reference, the jail would linger indefinitely as well.  This manifests
as jails getting stuck in the DYING state.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38573
2023-04-20 12:13:06 -04:00
Mark Johnston
7b92493ab1 inpcb: Avoid inp_cred dereferences in SMR-protected lookup
The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
bound sockets.  To do this it has to maintain a ucred reference, and for
this to be safe, the reference can't be released until the UMA
destructor is called, and this will not happen within any bounded time
period.

Changing SMR to periodically recycle garbage is not trivial.  Instead,
let's implement SMR-synchronized lookup without needing to dereference
inp_cred.  This will allow the inpcb code to free the inp_cred reference
immediately when a PCB is freed, ensuring that ucred (and thus jail)
references are released promptly.

Commit 220d892129 ("inpcb: immediately return matching pcb on lookup")
gets us part of the way there.  This patch goes further to handle
lookups of unconnected sockets.  Here, the strategy is to maintain a
well-defined order of items within a hash chain so that a wild lookup
can simply return the first match and preserve existing semantics.  This
makes insertion of listening sockets more complicated in order to make
lookup simpler, which seems like the right tradeoff anyway given that
bind() is already a fairly expensive operation and lookups are more
common.

In particular, when inserting an unconnected socket, in_pcbinhash() now
keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.

Most of the change adds a separate SMR-based lookup path for inpcb hash
lookups.  When a match is found, we try to lock the inpcb and
re-validate its connection info.  In the common case, this works well
and we can simply return the inpcb.  If this fails, typically because
something is concurrently modifying the inpcb, we go to the slow path,
which performs a serialized lookup.

Note, I did not touch lbgroup lookup, since there the credential
reference is formally synchronized by net_epoch, not SMR.  In
particular, lbgroups are rarely allocated or freed.

I think it is possible to simplify in_pcblookup_hash_wild_locked() now,
but I didn't do it in this patch.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38572
2023-04-20 12:13:06 -04:00
Mark Johnston
3e98dcb3d5 inpcb: Move inpcb matching logic into separate functions
These functions will get some additional callers in future revisions.

No functional change intended.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38571
2023-04-20 12:13:06 -04:00
Mark Johnston
fdb987bebd inpcb: Split PCB hash tables
Currently we use a single hash table per PCB database for connected and
bound PCBs.  Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worse case.

Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb.  Now the database has one table each
for connected and unconnected sockets.

When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use.  This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away.  There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.

I also made the "rehash" parameter of in(6)_pcbconnect() unused.  This
parameter seems confusing and it is simpler to let the inpcb code figure
out what to do using the existing INP_INHASHLIST flag.

UDP sockets pose a special problem since they can be connected and
disconnected multiple times during their lifecycle.  To handle this, the
patch plugs a hole in the inpcb structure and uses it to store an SMR
sequence number.  When an inpcb is disconnected - an operation which
requires the global PCB database hash lock - the write sequence number
is advanced, and in order to reconnect, the connecting thread must wait
for readers to drain before reusing the inpcb's hash chain linkage
pointers.

raw_ip (ab)uses the hash table without using the corresponding
accessors.  Since there are now two hash tables, it arbitrarily uses the
"connected" table for all of its PCBs.  This will be addressed in some
way in the future.

inp interators which specify a hash bucket will only visit connected
PCBs.  This is not really correct, but nothing in the tree uses that
functionality except raw_ip, which as mentioned above places all of its
PCBs in the "connected" table and so is unaffected.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38569
2023-04-20 12:13:06 -04:00
Randall Stewart
4e8a20a764 tcp: rack the request level logging is a bit too noisy when doing point logging.
When doing request level BB logging the hybrid_bw_log() does not have proper screening to minimize logging
when point level logging is in use. Lets fix it properly so you have to have the proper knobs set to get the
more noisy logging.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39699
2023-04-19 14:02:12 -04:00
Randall Stewart
7a842346c3 tcp: Rack can crash with the new non-TSO fix..
Turns out the location of the check to see if we can do output is in the wrong place. We need
to jump off to the compressed acks before handling that case since th is NULL in the
compressed ack case which is handled differently anyway.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39690
2023-04-19 13:17:04 -04:00
Randall Stewart
303246dcdf We have a TCP_LOG_CONNEND log that should come out at the very last log of every connection. This
holds some nice stats about why/how the connection ended. Though with the current code it does not
come out without accounting due to the placement of the ifdefs. Also we need to make sure the stacks
fini has ran before calling in from tcp_subr so we get all logs the stack may make at its ending.

Reviewed by: rscheff
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39693
2023-04-19 12:54:25 -04:00
Randall Stewart
960985a209 tcp: bbr.c is non-capable of doing ECN and sets an INP flag to fend off ECN however our syncache is not aware of that flag.
We need to make the syncache aware of the flag and not do ECN if its set. Note that this
is not 100% full proof but the best we can do (i.e. its still possible that you can get in a
situation where the peer try's to do ecn).

Reviewed by: tuexen, glebius, rscheff
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39672
2023-04-18 12:21:56 -04:00
Randall Stewart
2ad584c555 tcp: Inconsistent use of hpts_calling flag
Gleb has noticed there were some inconsistency's in the way the inp_hpts_calls flag was being used. One
such inconsistency results in a bug when we can't allocate enough sendmap entries to entertain a call to
rack_output().. basically a timer won't get started like it should. Also in cleaning this up I find that the
"no_output" side of input needs to be adjusted to make sure we don't try to re-pace too quickly outside
the hpts assurance of 250useconds.

Another thing here is we end up with duplicate calls to tcp_output() which we should not. If packets go
from hpts for processing the input side of tcp will call the output side of tcp on the last packet if it is needed.
This means that when that occurs a second call to tcp_output would be made that is not needed and if pacing
is going on may be harmful.

Lets fix all this and explicitly state the contract that hpts is making with transports that care about the
flag.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39653
2023-04-17 17:10:26 -04:00
Randall Stewart
37229fed38 tcp: Blackbox logging and tcp accounting together can cause a crash.
If you currently turn BB logging on and in combination have TCP Accounting on we can get a
crash where we have no NULL check and we run out of memory. Also lets make sure we
don't do a divide by 0 in calculating any BB ratios.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39622
2023-04-17 13:52:00 -04:00
Gleb Smirnoff
3232b1f4a9 tcp: fix build
The recent 25685b7537 came in conflict with a540cdca31.  Remove the
code that cleans up the old style input queue.  Note that two lines
below we assert that the new style input queue is empty.  The TCP
stacks that use the queue are supposed to flush it in their
tfb_tcp_fb_fini method.
2023-04-17 10:24:20 -07:00
Gleb Smirnoff
a540cdca31 tcp_hpts: use queue(9) STAILQ for the input queue
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39574
2023-04-17 09:07:23 -07:00
Randall Stewart
3cc7b66732 tcp: stack unloading crash in rack and bbr
Its possible to induce a crash in either rack or bbr. This would be done
if the rack stack were say the default and bbr was being used by a connection.
If the bbr stack is then unloaded and it was active, we will trigger a MPASS assert
in tcp_hpts since the new stack (default rack) would start a timer, and the old stack
(bbr) would have the inp already in hpts.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39576
2023-04-14 15:42:23 -04:00
Randall Stewart
9903bf34f0 tcp: rack pacing has some caveats that need to be obeyed when LRO is missing
n further non-LRO testing I found a case where rack is supposed to be waking up but
it is not now. In this special case it sets the flag rc_ack_can_sendout_data. When that is
set we should not prohibit output.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39565
2023-04-14 09:33:36 -04:00
Randall Stewart
25685b7537 TCP: Misc cleanups of tcp_subr.c
In going through all the TCP stacks I have found we have a few little bugs and niggles that need to
be cleaned up in tcp_subr.c including the following:

a) Set tcp_restoral_thresh to 450 (45%) not 550. This is a better proven value in my testing.
b) Lets track when we try to do pacing but fail via a counter for connections that do pace.
c) If a switch away from the default stack occurs and it fails we need to make sure the time
   scale is in the right mode (just in case the other stack changed it but then failed).
d) Use the TP_RXTCUR() macro when starting the TT_REXMT timer.
e) When we end a default flow lets log that in BBlogs as well as cleanup any t_acktime (disable).
f) When we respond with a RST lets make sure to update the log_end_status properly.
g) When starting a new pcb lets assure that all LRO features are off.
h) When discarding a connection lets make sure that any t_in_pkt's that might be there are freed properly.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39501
2023-04-13 09:29:05 -04:00
Michael Tuexen
2ba2849c82 tcp: fix typo in comment
Reported by:	cc
MFC after:	1 week
Sponsored by:	Netflix, Inc.
2023-04-12 18:08:21 +02:00
Michael Tuexen
c687f21add tcp: make net.inet.tcp.functions_default vnet specific
Reviewed by:		cc, rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39516
2023-04-12 18:04:27 +02:00
Randall Stewart
1073f41657 tcp_lro: When processing compressed acks lets support the new early wake feature for rack.
During compressed ack and mbuf queuing we determine if we need to wake up. A
new function was added that is optional to the tfb so that the stack itself can also
be asked if a wakeup should happen. This helps compensate for late hpts calls.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39502
2023-04-12 11:35:14 -04:00
Michael Tuexen
73c48d9d8f tcp: fix deregistering stacks when vnets are used
This fixes a bug where stacks could not be deregistered when
end points in the non-default vnet are using it.

Reviewed by:		glebius, zlei
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39514
2023-04-12 10:52:53 +02:00
Richard Scheffenegger
2169f71277 tcp: use IPV6_FLOWLABEL_LEN
Avoid magic numbers when handling the IPv6 flow ID for
DSCP and ECN fields and use the named variable instead.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D39503
2023-04-11 18:53:51 +02:00
Randall Stewart
a2b33c9a7a tcp: Rack - in the absence of LRO fixed rate pacing (loopback or interfaces with no LRO) does not work correctly.
Rack is capable of fixed rate or dynamic rate pacing. Both of these can get mixed up when
LRO is not available. This is because LRO will hold off waking up the tcp connection to
processing the inbound packets until the pacing timer is up. Without LRO the pacing only
sort-of works. Sometimes we pace correctly, other times not so much.

This set of changes will make it so pacing works properly in the absence of LRO.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39494
2023-04-10 16:33:56 -04:00
John Baldwin
e222461790 rack: mask and tclass are only used for INET6.
This fixes the LINT-NOINET6 build.
2023-04-10 12:21:03 -07:00
John Baldwin
4b6228906f libalias: Mark set but unused variables as unused.
This function is clearly a stub, but it seems better to leave the stub
bits in place than to remove the function entirely.

Differential Revision:	https://reviews.freebsd.org/D39355
2023-04-10 10:35:29 -07:00
Gleb Smirnoff
66fbc19fbd tcp: pass tcpcb in the tfb_tcp_ctloutput() method instead of inpcb
Just matches rest of the KPI.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39435
2023-04-07 12:18:10 -07:00
Gleb Smirnoff
35bc0bcc51 tcp: reduce argument list to functions that pass a segment
The socket argument is superfluous, as a tcpcb always has one and
only one socket.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39434
2023-04-07 12:18:06 -07:00
Gleb Smirnoff
de4368dd84 tcp: retire tfb_tcp_hpts_do_segment()
Isn't in use anymore.  Correct comments that mention it.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39433
2023-04-07 12:18:02 -07:00
Randall Stewart
945f9a7cc9 tcp: misc cleanup of options for rack as well as socket option logging.
Both BBR and Rack have the ability to log socket options, which is currently disabled. Rack
has an experimental SaD (Sack Attack Detection) algorithm that should be made available. Also
there is a t_maxpeak_rate that needs to be removed (its un-used).

Reviewed by: tuexen, cc
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39427
2023-04-07 10:15:29 -04:00
Mateusz Guzik
c67eb393fa tcp_hpts: plug a compiler warn
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-05 14:32:13 +00:00
Gleb Smirnoff
84b42df834 rack: fix build on powerpc 2023-04-04 16:35:36 -07:00