Commit Graph

332 Commits

Author SHA1 Message Date
Gleb Smirnoff
4287aa5619 tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags
While here move out one more erroneous condition out of the epoch and
common return.  The only functional change is that if we send control
on a shut down socket we would get EINVAL instead of ECONNRESET.

Reviewed by:	tuexen
Reported by:	syzbot+8388cf7f401a7b6bece6@syzkaller.appspotmail.com
Fixes:		f64dc2ab5b
2021-12-28 08:50:02 -08:00
Gleb Smirnoff
0af4ce4547 tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags
Fixes:	f64dc2ab5b
2021-12-27 16:58:09 -08:00
Gleb Smirnoff
37a7f55737 tcp_usr_rcvd: don't cast inp_ppcb to tcpcb before checking inp_flags
Fixes:	f64dc2ab5b
2021-12-27 10:41:51 -08:00
Gleb Smirnoff
a370832bec tcp: remove delayed drop KPI
No longer needed after tcp_output() can ask caller to drop.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D33371
2021-12-26 08:48:24 -08:00
Gleb Smirnoff
f64dc2ab5b tcp: TCP output method can request tcp_drop
The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection
when they do output on it.  The default stack never does this, thus
existing framework expects tcp_output() always to return locked and
valid tcpcb.

Provide KPI extension to satisfy demands of advanced stacks.  If the
output method returns negative error code, it means that caller must
call tcp_drop().

In tcp_var() provide three inline methods to call tcp_output():
- tcp_output() is a drop-in replacement for the default stack, so that
  default stack can continue using it internally without modifications.
  For advanced stacks it would perform tcp_drop() and unlock and report
  that with negative error code.
- tcp_output_unlock() handles the negative code and always converts
  it to positive and always unlocks.
- tcp_output_nodrop() just calls the method and leaves the responsibility
  to drop on the caller.

Sweep over the advanced stacks and use new KPI instead of using HPTS
delayed drop queue for that.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D33370
2021-12-26 08:48:19 -08:00
Gleb Smirnoff
40fa3e40b5 tcp: mechanically substitute call to tfb_tcp_output to new method.
Made with sed(1) execution:

sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/)

sed:
s/tp->t_fb->tfb_tcp_output\(tp\)/tcp_output(tp)/
s/to tfb_tcp_output\(\)/to tcp_output()/

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D33366
2021-12-26 08:47:59 -08:00
Gleb Smirnoff
ef396441ce tcp_usr_detach: revert debugging piece from f5cf1e5f5a.
The code was probably useful during the problem being chased down,
but for brevity makes sense just to return to the original KASSERT.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D32968
2021-11-13 08:33:32 -08:00
Michael Tuexen
df07bfda67 tcp: Fix a locking issue
INP_WLOCK_RECHECK_CLEANUP() and INP_WLOCK_RECHECK() might return
from the function, so any locks held must be released.

Reported by:		syzbot+b1a888df08efaa7b4bf1@syzkaller.appspotmail.com
Reviewed by:		markj
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D32975
2021-11-12 22:13:50 +01:00
Randall Stewart
b8d60729de tcp: Congestion control cleanup.
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!

This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.

This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.

Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
2021-11-11 06:28:18 -05:00
Gleb Smirnoff
f581a26e46 Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP.
Pass control for IP/IP6 level options from generic tcp_ctloutput_set()
down to per-stack ctloutput.

Call tcp6_use_min_mtu() from tcp stack tcp_default_ctloutput().

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:22:00 -07:00
Gleb Smirnoff
de156263a5 Several IP level socket options may affect TCP.
After handling them in IP level ctloutput, pass them down to TCP
ctloutput.

We already have a hack to handle IPV6_USE_MIN_MTU. Leave it in place
for now, but comment out how it should be handled.

For IPv4 we are interested in IP_TOS and IP_TTL.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:21:59 -07:00
Gleb Smirnoff
fc4d53cc2e Split tcp_ctloutput() into set/get parts.
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:21:59 -07:00
Peter Lei
e28330832b tcp: socket option to get stack alias name
TCP stack sysctl nodes are currently inserted using the stack
name alias. Allow the user to get the current stack's alias to
allow for programatic sysctl access.

Obtained from:	Netflix
2021-10-27 08:21:59 -07:00
Mark Johnston
bf25678226 ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers
ktls_get_(rx|tx)_mode() can return an errno value or a TLS mode, so
errors are effectively hidden.  Fix this by using a separate output
parameter.  Convert to the new socket buffer locking macros while here.

Note that the socket buffer lock is not needed to synchronize the
SOLISTENING check here, we can rely on the PCB lock.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31977
2021-09-17 14:19:05 -04:00
Mark Johnston
bd4a39cc93 socket: Properly interlock when transitioning to a listening socket
Currently, most protocols implement pru_listen with something like the
following:

	SOCK_LOCK(so);
	error = solisten_proto_check(so);
	if (error) {
		SOCK_UNLOCK(so);
		return (error);
	}
	solisten_proto(so);
	SOCK_UNLOCK(so);

solisten_proto_check() fails if the socket is connected or connecting.
However, the socket lock is not used during I/O, so this pattern is
racy.

The change modifies solisten_proto_check() to additionally acquire
socket buffer locks, and the calling thread holds them until
solisten_proto() or solisten_proto_abort() is called.  Now that the
socket buffer locks are preserved across a listen(2), this change allows
socket I/O paths to properly interlock with listen(2).

This fixes a large number of syzbot reports, only one is listed below
and the rest will be dup'ed to it.

Reported by:	syzbot+9fece8a63c0e27273821@syzkaller.appspotmail.com
Reviewed by:	tuexen, gallatin
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31659
2021-09-07 17:11:43 -04:00
Michael Tuexen
3f1f6b6ef7 tcp, udp: improve input validation in handling bind()
Reported by:		syzbot+24fcfd8057e9bc339295@syzkaller.appspotmail.com
Reported by:		syzbot+6e90ceb5c89285b2655b@syzkaller.appspotmail.com
Reviewed by:		markj, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D31422
2021-08-05 13:48:44 +02:00
Randall Stewart
4747500dea tcp: A better fix for the previously attempted fix of the ack-war issue with tcp.
So it turns out that my fix before was not correct. It ended with us failing
some of the "improved" SYN tests, since we are not in the correct states.
With more digging I have figured out the root of the problem is that when
we receive a SYN|FIN the reassembly code made it so we create a segq entry
to hold the FIN. In the established state where we were not in order this
would be correct i.e. a 0 len with a FIN would need to be accepted. But
if you are in a front state we need to strip the FIN so we correctly handle
the ACK but ignore the FIN. This gets us into the proper states
and avoids the previous ack war.

I back out some of the previous changes but then add a new change
here in tcp_reass() that fixes the root cause of the issue. We still
leave the rack panic fixes in place however.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30627
2021-06-04 05:26:43 -04:00
Mark Johnston
f96603b56f tcp, udp: Permit binding with AF_UNSPEC if the address is INADDR_ANY
Prior to commit f161d294b we only checked the sockaddr length, but now
we verify the address family as well.  This breaks at least ttcp.  Relax
the check to avoid breaking compatibility too much: permit AF_UNSPEC if
the address is INADDR_ANY.

Fixes:		f161d294b
Reported by:	Bakul Shah <bakul@iitbombay.org>
Reviewed by:	tuexen
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30539
2021-05-31 18:53:34 -04:00
Andrew Gallatin
086a35562f tcp: enter network epoch when calling tfb_tcp_fb_fini
We need to enter the network epoch when calling into
tfb_tcp_fb_fini.  I noticed this when I hit an assert
running the latest rack

Differential Revision: https://reviews.freebsd.org/D30407
Reviewed by: rrs, tuexen
Sponsored by: Netflix
2021-05-25 13:45:37 -04:00
Randall Stewart
13c0e198ca tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michaels testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is the left edge gets transmitted before the adjustments are done
to the send_map, this means that right edge bits must be considered to be added only if
the entire RSM is being retransmitted.

Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns
out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself.
After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically
what happens is we go into the reassembly code and lose the FIN bit. The trick here is we
should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything.
That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called and the FIN's and such have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
timer can speed this up in testing.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30451
2021-05-25 13:23:31 -04:00
Mark Johnston
7d2608a5d2 tcp: Make error handling in tcp_usr_send() more consistent
- Free the input mbuf in a single place instead of in every error path.
- Handle PRUS_NOTREADY consistently.
- Flush the socket's send buffer if an implicit connect fails.  At that
  point the mbuf has already been enqueued but we don't want to keep it
  in the send buffer.

Reviewed by:	gallatin, tuexen
Discussed with:	jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30349
2021-05-21 17:45:18 -04:00
Mark Johnston
d8acd2681b Fix mbuf leaks in various pru_send implementations
The various protocol implementations are not very consistent about
freeing mbufs in error paths.  In general, all protocols must free both
"m" and "control" upon an error, except if PRUS_NOTREADY is specified
(this is only implemented by TCP and unix(4) and requires further work
not handled in this diff), in which case "control" still must be freed.

This diff plugs various leaks in the pru_send implementations.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30151
2021-05-12 13:00:09 -04:00
Richard Scheffenegger
0471a8c734 tcp: SACK Lost Retransmission Detection (LRD)
Recover from excessive losses without reverting to a
retransmission timeout (RTO). Disabled by default, enable
with sysctl net.inet.tcp.do_lrd=1

Reviewed By: #transport, rrs, tuexen, #manpages
Sponsored by: Netapp, Inc.
Differential Revision: https://reviews.freebsd.org/D28931
2021-05-10 19:06:20 +02:00
Mark Johnston
f161d294b9 Add missing sockaddr length and family validation to various protocols
Several protocol methods take a sockaddr as input.  In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur.  Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.

Reported by:	syzkaller+KASAN
Reviewed by:	tuexen, melifaro
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29519
2021-05-03 13:35:19 -04:00
Michael Tuexen
9e644c2300 tcp: add support for TCP over UDP
Adding support for TCP over UDP allows communication with
TCP stacks which can be implemented in userspace without
requiring special priviledges or specific support by the OS.
This is joint work with rrs.

Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D29469
2021-04-18 16:16:42 +02:00
Kyle Evans
4c0bef07be kern: net: remove TCP_LINGERTIME
TCP_LINGERTIME can be traced back to BSD 4.4 Lite and perhaps beyond, in
exactly the same form that it appears here modulo slightly different
context.  It used to be the case that there was a single pr_usrreq
method with requests dispatched to it; these exact two lines appeared in
tcp_usrreq's PRU_ATTACH handling.

The only purpose of this that I can find is to cause surprising behavior
on accepted connections. Newly-created sockets will never hit these
paths as one cannot set SO_LINGER prior to socket(2). If SO_LINGER is
set on a listening socket and inherited, one would expect the timeout to
be inherited rather than changed arbitrarily like this -- noting that
SO_LINGER is nonsense on a listening socket beyond inheritance, since
they cannot be 'connected' by definition.

Neither Illumos nor Linux reset the timer like this based on testing and
inspection of Illumos, and testing of Linux.

Reviewed by:	rscheff, tuexen
Differential Revision:	https://reviews.freebsd.org/D28265
2021-02-18 22:36:01 -06:00
Andrew Gallatin
a034518ac8 Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain
In order to efficiently serve web traffic on a NUMA
machine, one must avoid as many NUMA domain crossings as
possible. With SO_REUSEPORT_LB, a number of workers can share a
listen socket. However, even if a worker sets affinity to a core
or set of cores on a NUMA domain, it will receive connections
associated with all NUMA domains in the system. This will lead to
cross-domain traffic when the server writes to the socket or
calls sendfile(), and memory is allocated on the server's local
NUMA node, but transmitted on the NUMA node associated with the
TCP connection. Similarly, when the server reads from the socket,
he will likely be reading memory allocated on the NUMA domain
associated with the TCP connection.

This change provides a new socket ioctl, TCP_REUSPORT_LB_NUMA. A
server can now tell the kernel to filter traffic so that only
incoming connections associated with the desired NUMA domain are
given to the server. (Of course, in the case where there are no
servers sharing the listen socket on some domain, then as a
fallback, traffic will be hashed as normal to all servers sharing
the listen socket regardless of domain). This allows a server to
deal only with traffic that is local to its NUMA domain, and
avoids cross-domain traffic in most cases.

This patch, and a corresponding small patch to nginx to use
TCP_REUSPORT_LB_NUMA allows us to serve 190Gb/s of kTLS encrypted
https media content from dual-socket Xeons with only 13% (as
measured by pcm.x) cross domain traffic on the memory controller.

Reviewed by:	jhb, bz (earlier version), bcr (man page)
Tested by: gonzo
Sponsored by:	Netfix
Differential Revision:	https://reviews.freebsd.org/D21636
2020-12-19 22:04:46 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
Michael Tuexen
f903a308a1 (Re)-allow 0.0.0.0 to be used as an address in connect() for TCP
In r361752 an error handling was introduced for using 0.0.0.0 or
255.255.255.255 as the address in connect() for TCP, since both
addresses can't be used. However, the stack maps 0.0.0.0 implicitly
to a local address and at least two regressions were reported.
Therefore, re-allow the usage of 0.0.0.0.
While there, change the error indicated when using 255.255.255.255
from EAFNOSUPPORT to EACCES as mentioned in the man-page of connect().

Reviewed by:		rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D25401
2020-07-16 16:46:24 +00:00
Randall Stewart
e854dd38ac An important statistic in determining if a server process (or client) is being delayed
is to know the time to first byte in and time to first byte out. Currently we
have no way to know these all we have is t_starttime. That (t_starttime) tells us
what time the 3 way handshake completed. We don't know when the first
request came in or how quickly we responded. Nor from a client perspective
do we know how long from when we sent out the first byte before the
server responded.

This small change adds the ability to track the TTFB's. This will show up in
BB logging which then can be pulled for later analysis. Note that currently
the tracking is via the ticks variable of all three variables. This provides
a very rough estimate (hz=1000 its 1ms). A follow-on set of work will be
to change all three of these values into something with a much finer resolution
(either microseconds or nanoseconds), though we may want to make the resolution
configurable so that on lower powered machines we could still use the much
cheaper ticks variable.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D24902
2020-06-08 11:48:07 +00:00
Randall Stewart
2cf21ae559 We should never allow either the broadcast or IN_ADDR_ANY to be
connected to or sent to. This was fond when working with Michael
Tuexen and Skyzaller. Skyzaller seems to want to use either of
these two addresses to connect to at times. And it really is
an error to do so, so lets not allow that behavior.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D24852
2020-06-03 14:16:40 +00:00
Michael Tuexen
d442a65733 Restrict enabling TCP-FASTOPEN to end-points in CLOSED or LISTEN state
Enabling TCP-FASTOPEN on an end-point which is in a state other than
CLOSED or LISTEN, is a bug in the application. So it should not work.
Also the TCP code does not (and needs not to) handle this.
While there, also simplify the setting of the TF_FASTOPEN flag.

This issue was found by running syzkaller.

Reviewed by:		rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D25115
2020-06-03 13:51:53 +00:00
Mike Karels
2510235150 Allow TCP to reuse local port with different destinations
Previously, tcp_connect() would bind a local port before connecting,
forcing the local port to be unique across all outgoing TCP connections
for the address family. Instead, choose a local port after selecting
the destination and the local address, requiring only that the tuple
is unique and does not match a wildcard binding.

Reviewed by:	tuexen (rscheff, rrs previous version)
MFC after:	1 month
Sponsored by:	Forcepoint LLC
Differential Revision:	https://reviews.freebsd.org/D24781
2020-05-18 22:53:12 +00:00
Michael Tuexen
e240ce42bf Allow only IPv4 addresses in sendto() for TCP on AF_INET sockets.
This problem was found by looking at syzkaller reproducers for some other
problems.

Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24831
2020-05-15 14:06:37 +00:00
Michael Tuexen
9d176904ae Remove trailing whitespace. 2020-05-10 17:43:42 +00:00
Randall Stewart
d3b6c96b7d Adjust the fb to have a way to ask the underlying stack
if it can support the PRUS option (OOB). And then have
the new function call that to validate and give the
correct error response if needed to the user (rack
and bbr do not support obsoleted OOB data).

Sponsoered by: Netflix Inc.
Differential Revision:	 https://reviews.freebsd.org/D24574
2020-05-04 20:19:57 +00:00
John Baldwin
f1f9347546 Initial support for kernel offload of TLS receive.
- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and
  authentication algorithms and keys as well as the initial sequence
  number.

- When reading from a socket using KTLS receive, applications must use
  recvmsg().  Each successful call to recvmsg() will return a single
  TLS record.  A new TCP control message, TLS_GET_RECORD, will contain
  the TLS record header of the decrypted record.  The regular message
  buffer passed to recvmsg() will receive the decrypted payload.  This
  is similar to the interface used by Linux's KTLS RX except that
  Linux does not return the full TLS header in the control message.

- Add plumbing to the TOE KTLS interface to request either transmit
  or receive KTLS sessions.

- When a socket is using receive KTLS, redirect reads from
  soreceive_stream() into soreceive_generic().

- Note that this interface is currently only defined for TLS 1.1 and
  1.2, though I believe we will be able to reuse the same interface
  and structures for 1.3.
2020-04-27 23:17:19 +00:00
John Baldwin
ec1db6e13d Add the initial sequence number to the TLS enable socket option.
This will be needed for KTLS RX.

Reviewed by:	gallatin
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24451
2020-04-27 22:31:42 +00:00
Michael Tuexen
a357466592 sack_newdata and snd_recover hold the same value. Therefore, use only
a single instance: use snd_recover also where sack_newdata was used.

Submitted by:		Richard Scheffenegger
Differential Revision:	https://reviews.freebsd.org/D18811
2020-02-13 15:14:46 +00:00
Randall Stewart
481be5de9d White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by:	Netflix Inc.
2020-02-12 13:31:36 +00:00
Gleb Smirnoff
42ce79378d Fix missing NET_EPOCH_ENTER() when compiled with TCP_OFFLOAD.
Reported by:	Coverity
CID:		1413162
2020-01-29 22:48:18 +00:00
Bjoern A. Zeeb
7754e281c0 Fix NOINET kernels after r356983.
All gotos to the label are within the #ifdef INET section, which leaves
us with an unused label.  Cover the label under #ifdef INET as well to
avoid the warning and compile time error.
2020-01-22 15:06:59 +00:00
Gleb Smirnoff
c1604fe4d2 Make in_pcbladdr() require network epoch entered by its callers. Together
with this widen network epoch coverage up to tcp_connect() and udp_connect().

Revisions from r356974 and up to this revision cover D23187.

Differential Revision:	https://reviews.freebsd.org/D23187
2020-01-22 06:10:41 +00:00
Gleb Smirnoff
e2636f0a78 Remove extraneous NET_EPOCH_ASSERT - the full function is covered. 2020-01-22 06:07:27 +00:00
Gleb Smirnoff
3fed74e90f Re-absorb tcp_detach() back into tcp_usr_detach() as the comment suggests.
Not a functional change.
2020-01-22 06:06:27 +00:00
Gleb Smirnoff
5fc8df3c49 Don't enter network epoch in tcp_usr_detach. A PCB removal doesn't
require that.
2020-01-22 06:04:56 +00:00
Gleb Smirnoff
7669c586da tcp_usr_attach() doesn't need network epoch. in_pcbfree() and
in_pcbdetach() perform all necessary synchronization themselves.
2020-01-22 06:01:26 +00:00
Gleb Smirnoff
0f6385e705 Inline tcp_attach() into tcp_usr_attach(). Not a functional change. 2020-01-22 05:54:58 +00:00
Gleb Smirnoff
109eb549e1 Make tcp_output() require network epoch.
Enter the epoch before calling into tcp_output() from those
functions, that didn't do that before.

This eliminates a bunch of epoch recursions in TCP.
2020-01-22 05:53:16 +00:00
Edward Tomasz Napierala
adc56f5a38 Make use of the stats(3) framework in the TCP stack.
This makes it possible to retrieve per-connection statistical
information such as the receive window size, RTT, or goodput,
using a newly added TCP_STATS getsockopt(3) option, and extract
them using the stats_voistat_fetch(3) API.

See the net/tcprtt port for an example consumer of this API.

Compared to the existing TCP_INFO system, the main differences
are that this mechanism is easy to extend without breaking ABI,
and provides statistical information instead of raw "snapshots"
of values at a given point in time.  stats(3) is more generic
and can be used in both userland and the kernel.

Reviewed by:	thj
Tested by:	thj
Obtained from:	Netflix
Relnotes:	yes
Sponsored by:	Klara Inc, Netflix
Differential Revision:	https://reviews.freebsd.org/D20655
2019-12-02 20:58:04 +00:00