freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	4287aa5619	tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags While here move out one more erroneous condition out of the epoch and common return. The only functional change is that if we send control on a shut down socket we would get EINVAL instead of ECONNRESET. Reviewed by: tuexen Reported by: syzbot+8388cf7f401a7b6bece6@syzkaller.appspotmail.com Fixes: `f64dc2ab5b`	2021-12-28 08:50:02 -08:00
Gleb Smirnoff	0af4ce4547	tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags Fixes: `f64dc2ab5b`	2021-12-27 16:58:09 -08:00
Gleb Smirnoff	37a7f55737	tcp_usr_rcvd: don't cast inp_ppcb to tcpcb before checking inp_flags Fixes: `f64dc2ab5b`	2021-12-27 10:41:51 -08:00
Gleb Smirnoff	a370832bec	tcp: remove delayed drop KPI No longer needed after tcp_output() can ask caller to drop. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33371	2021-12-26 08:48:24 -08:00
Gleb Smirnoff	f64dc2ab5b	tcp: TCP output method can request tcp_drop The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection when they do output on it. The default stack never does this, thus existing framework expects tcp_output() always to return locked and valid tcpcb. Provide KPI extension to satisfy demands of advanced stacks. If the output method returns negative error code, it means that caller must call tcp_drop(). In tcp_var() provide three inline methods to call tcp_output(): - tcp_output() is a drop-in replacement for the default stack, so that default stack can continue using it internally without modifications. For advanced stacks it would perform tcp_drop() and unlock and report that with negative error code. - tcp_output_unlock() handles the negative code and always converts it to positive and always unlocks. - tcp_output_nodrop() just calls the method and leaves the responsibility to drop on the caller. Sweep over the advanced stacks and use new KPI instead of using HPTS delayed drop queue for that. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33370	2021-12-26 08:48:19 -08:00
Gleb Smirnoff	40fa3e40b5	tcp: mechanically substitute call to tfb_tcp_output to new method. Made with sed(1) execution: sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/) sed: s/tp->t_fb->tfb_tcp_output\(tp\)/tcp_output(tp)/ s/to tfb_tcp_output/to tcp_output()/ Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33366	2021-12-26 08:47:59 -08:00
Gleb Smirnoff	ef396441ce	tcp_usr_detach: revert debugging piece from `f5cf1e5f5a`. The code was probably useful during the problem being chased down, but for brevity makes sense just to return to the original KASSERT. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D32968	2021-11-13 08:33:32 -08:00
Michael Tuexen	df07bfda67	tcp: Fix a locking issue INP_WLOCK_RECHECK_CLEANUP() and INP_WLOCK_RECHECK() might return from the function, so any locks held must be released. Reported by: syzbot+b1a888df08efaa7b4bf1@syzkaller.appspotmail.com Reviewed by: markj Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D32975	2021-11-12 22:13:50 +01:00
Randall Stewart	b8d60729de	tcp: Congestion control cleanup. NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!! This patch does a bit of cleanup on TCP congestion control modules. There were some rather interesting surprises that one could get i.e. where you use a socket option to change from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get a memory failure and end up on cc_newreno. This is not what one would expect. The new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK and pass in to the init function preallocated memory. The CC init is expected in this case not to fail but if it does and a module does break the "no fail with memory given" contract we do fall back to the CC that was in place at the time. This also fixes up a set of common newreno utilities that can be shared amongst other CC modules instead of the other CC modules reaching into newreno and executing what they think is a "common and understood" function. Let's put these functions in cc.c and that way we have a common place that is easily findable by future developers or bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE and HYSTART++ without having to dance through hoops for other CC modules, instead both newreno and the other modules just call into the common functions if they desire that behavior or roll their own if that makes more sense. Note: This commit changes the kernel configuration!! If you are not using GENERIC in some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC, CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined as well if you desire. Note that if you create a kernel configuration that does not define a congestion control module and includes INET or INET6 the kernel compile will break. Also you need to define a default, GENERIC adds 'options CC_DEFAULT=\"newreno\"' but you can specify any string that represents the name of the CC module (same names that show up in the CC module list under net.inet.tcp.cc). If you fail to add the options CC_DEFAULT in your kernel configuration the kernel build will also break. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. RELNOTES:YES Differential Revision: https://reviews.freebsd.org/D32693	2021-11-11 06:28:18 -05:00
Gleb Smirnoff	f581a26e46	Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP. Pass control for IP/IP6 level options from generic tcp_ctloutput_set() down to per-stack ctloutput. Call tcp6_use_min_mtu() from tcp stack tcp_default_ctloutput(). Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655	2021-10-27 08:22:00 -07:00
Gleb Smirnoff	de156263a5	Several IP level socket options may affect TCP. After handling them in IP level ctloutput, pass them down to TCP ctloutput. We already have a hack to handle IPV6_USE_MIN_MTU. Leave it in place for now, but comment out how it should be handled. For IPv4 we are interested in IP_TOS and IP_TTL. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655	2021-10-27 08:21:59 -07:00
Gleb Smirnoff	fc4d53cc2e	Split tcp_ctloutput() into set/get parts. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655	2021-10-27 08:21:59 -07:00
Peter Lei	e28330832b	tcp: socket option to get stack alias name TCP stack sysctl nodes are currently inserted using the stack name alias. Allow the user to get the current stack's alias to allow for programmatic sysctl access. Obtained from: Netflix	2021-10-27 08:21:59 -07:00
Mark Johnston	bf25678226	ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers ktls_get_(rx|tx)_mode() can return an errno value or a TLS mode, so errors are effectively hidden. Fix this by using a separate output parameter. Convert to the new socket buffer locking macros while here. Note that the socket buffer lock is not needed to synchronize the SOLISTENING check here, we can rely on the PCB lock. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31977	2021-09-17 14:19:05 -04:00
Mark Johnston	bd4a39cc93	socket: Properly interlock when transitioning to a listening socket Currently, most protocols implement pru_listen with something like the following: SOCK_LOCK(so); error = solisten_proto_check(so); if (error) { SOCK_UNLOCK(so); return (error); } solisten_proto(so); SOCK_UNLOCK(so); solisten_proto_check() fails if the socket is connected or connecting. However, the socket lock is not used during I/O, so this pattern is racy. The change modifies solisten_proto_check() to additionally acquire socket buffer locks, and the calling thread holds them until solisten_proto() or solisten_proto_abort() is called. Now that the socket buffer locks are preserved across a listen(2), this change allows socket I/O paths to properly interlock with listen(2). This fixes a large number of syzbot reports, only one is listed below and the rest will be dup'ed to it. Reported by: syzbot+9fece8a63c0e27273821@syzkaller.appspotmail.com Reviewed by: tuexen, gallatin MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31659	2021-09-07 17:11:43 -04:00
Michael Tuexen	3f1f6b6ef7	tcp, udp: improve input validation in handling bind() Reported by: syzbot+24fcfd8057e9bc339295@syzkaller.appspotmail.com Reported by: syzbot+6e90ceb5c89285b2655b@syzkaller.appspotmail.com Reviewed by: markj, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D31422	2021-08-05 13:48:44 +02:00
Randall Stewart	4747500dea	tcp: A better fix for the previously attempted fix of the ack-war issue with tcp. So it turns out that my fix before was not correct. It ended with us failing some of the "improved" SYN tests, since we are not in the correct states. With more digging I have figured out the root of the problem is that when we receive a SYN|FIN the reassembly code made it so we create a segq entry to hold the FIN. In the established state where we were not in order this would be correct i.e. a 0 len with a FIN would need to be accepted. But if you are in a front state we need to strip the FIN so we correctly handle the ACK but ignore the FIN. This gets us into the proper states and avoids the previous ack war. I back out some of the previous changes but then add a new change here in tcp_reass() that fixes the root cause of the issue. We still leave the rack panic fixes in place however. Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30627	2021-06-04 05:26:43 -04:00
Mark Johnston	f96603b56f	tcp, udp: Permit binding with AF_UNSPEC if the address is INADDR_ANY Prior to commit `f161d294b` we only checked the sockaddr length, but now we verify the address family as well. This breaks at least ttcp. Relax the check to avoid breaking compatibility too much: permit AF_UNSPEC if the address is INADDR_ANY. Fixes: `f161d294b` Reported by: Bakul Shah <bakul@iitbombay.org> Reviewed by: tuexen MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30539	2021-05-31 18:53:34 -04:00
Andrew Gallatin	086a35562f	tcp: enter network epoch when calling tfb_tcp_fb_fini We need to enter the network epoch when calling into tfb_tcp_fb_fini. I noticed this when I hit an assert running the latest rack Differential Revision: https://reviews.freebsd.org/D30407 Reviewed by: rrs, tuexen Sponsored by: Netflix	2021-05-25 13:45:37 -04:00
Randall Stewart	13c0e198ca	tcp: Fix bugs related to the PUSH bit and rack and an ack war Michael's testing with UDP tunneling found an issue with the push bit, which was only partly fixed in the last commit. The problem is the left edge gets transmitted before the adjustments are done to the send_map, this means that right edge bits must be considered to be added only if the entire RSM is being retransmitted. Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself. After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically what happens is we go into the reassembly code and lose the FIN bit. The trick here is we should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything. That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no timers running. This is because the usrclosed function gets called and the FIN's and such have already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2 timer can speed this up in testing. Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30451	2021-05-25 13:23:31 -04:00
Mark Johnston	7d2608a5d2	tcp: Make error handling in tcp_usr_send() more consistent - Free the input mbuf in a single place instead of in every error path. - Handle PRUS_NOTREADY consistently. - Flush the socket's send buffer if an implicit connect fails. At that point the mbuf has already been enqueued but we don't want to keep it in the send buffer. Reviewed by: gallatin, tuexen Discussed with: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30349	2021-05-21 17:45:18 -04:00
Mark Johnston	d8acd2681b	Fix mbuf leaks in various pru_send implementations The various protocol implementations are not very consistent about freeing mbufs in error paths. In general, all protocols must free both "m" and "control" upon an error, except if PRUS_NOTREADY is specified (this is only implemented by TCP and unix(4) and requires further work not handled in this diff), in which case "control" still must be freed. This diff plugs various leaks in the pru_send implementations. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30151	2021-05-12 13:00:09 -04:00
Richard Scheffenegger	0471a8c734	tcp: SACK Lost Retransmission Detection (LRD) Recover from excessive losses without reverting to a retransmission timeout (RTO). Disabled by default, enable with sysctl net.inet.tcp.do_lrd=1 Reviewed By: #transport, rrs, tuexen, #manpages Sponsored by: Netapp, Inc. Differential Revision: https://reviews.freebsd.org/D28931	2021-05-10 19:06:20 +02:00
Mark Johnston	f161d294b9	Add missing sockaddr length and family validation to various protocols Several protocol methods take a sockaddr as input. In some cases the sockaddr lengths were not being validated, or were validated after some out-of-bounds accesses could occur. Add requisite checking to various protocol entry points, and convert some existing checks to assertions where appropriate. Reported by: syzkaller+KASAN Reviewed by: tuexen, melifaro MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29519	2021-05-03 13:35:19 -04:00
Michael Tuexen	9e644c2300	tcp: add support for TCP over UDP Adding support for TCP over UDP allows communication with TCP stacks which can be implemented in userspace without requiring special privileges or specific support by the OS. This is joint work with rrs. Reviewed by: rrs Sponsored by: Netflix, Inc. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29469	2021-04-18 16:16:42 +02:00
Kyle Evans	4c0bef07be	kern: net: remove TCP_LINGERTIME TCP_LINGERTIME can be traced back to BSD 4.4 Lite and perhaps beyond, in exactly the same form that it appears here modulo slightly different context. It used to be the case that there was a single pr_usrreq method with requests dispatched to it; these exact two lines appeared in tcp_usrreq's PRU_ATTACH handling. The only purpose of this that I can find is to cause surprising behavior on accepted connections. Newly-created sockets will never hit these paths as one cannot set SO_LINGER prior to socket(2). If SO_LINGER is set on a listening socket and inherited, one would expect the timeout to be inherited rather than changed arbitrarily like this -- noting that SO_LINGER is nonsense on a listening socket beyond inheritance, since they cannot be 'connected' by definition. Neither Illumos nor Linux reset the timer like this based on testing and inspection of Illumos, and testing of Linux. Reviewed by: rscheff, tuexen Differential Revision: https://reviews.freebsd.org/D28265	2021-02-18 22:36:01 -06:00
Andrew Gallatin	a034518ac8	Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain In order to efficiently serve web traffic on a NUMA machine, one must avoid as many NUMA domain crossings as possible. With SO_REUSEPORT_LB, a number of workers can share a listen socket. However, even if a worker sets affinity to a core or set of cores on a NUMA domain, it will receive connections associated with all NUMA domains in the system. This will lead to cross-domain traffic when the server writes to the socket or calls sendfile(), and memory is allocated on the server's local NUMA node, but transmitted on the NUMA node associated with the TCP connection. Similarly, when the server reads from the socket, he will likely be reading memory allocated on the NUMA domain associated with the TCP connection. This change provides a new socket ioctl, TCP_REUSPORT_LB_NUMA. A server can now tell the kernel to filter traffic so that only incoming connections associated with the desired NUMA domain are given to the server. (Of course, in the case where there are no servers sharing the listen socket on some domain, then as a fallback, traffic will be hashed as normal to all servers sharing the listen socket regardless of domain). This allows a server to deal only with traffic that is local to its NUMA domain, and avoids cross-domain traffic in most cases. This patch, and a corresponding small patch to nginx to use TCP_REUSPORT_LB_NUMA allows us to serve 190Gb/s of kTLS encrypted https media content from dual-socket Xeons with only 13% (as measured by pcm.x) cross domain traffic on the memory controller. Reviewed by: jhb, bz (earlier version), bcr (man page) Tested by: gonzo Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21636	2020-12-19 22:04:46 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Michael Tuexen	f903a308a1	(Re)-allow 0.0.0.0 to be used as an address in connect() for TCP In r361752 an error handling was introduced for using 0.0.0.0 or 255.255.255.255 as the address in connect() for TCP, since both addresses can't be used. However, the stack maps 0.0.0.0 implicitly to a local address and at least two regressions were reported. Therefore, re-allow the usage of 0.0.0.0. While there, change the error indicated when using 255.255.255.255 from EAFNOSUPPORT to EACCES as mentioned in the man-page of connect(). Reviewed by: rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25401	2020-07-16 16:46:24 +00:00
Randall Stewart	e854dd38ac	An important statistic in determining if a server process (or client) is being delayed is to know the time to first byte in and time to first byte out. Currently we have no way to know these all we have is t_starttime. That (t_starttime) tells us what time the 3 way handshake completed. We don't know when the first request came in or how quickly we responded. Nor from a client perspective do we know how long from when we sent out the first byte before the server responded. This small change adds the ability to track the TTFB's. This will show up in BB logging which then can be pulled for later analysis. Note that currently the tracking is via the ticks variable of all three variables. This provides a very rough estimate (hz=1000 its 1ms). A follow-on set of work will be to change all three of these values into something with a much finer resolution (either microseconds or nanoseconds), though we may want to make the resolution configurable so that on lower powered machines we could still use the much cheaper ticks variable. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24902	2020-06-08 11:48:07 +00:00
Randall Stewart	2cf21ae559	We should never allow either the broadcast or IN_ADDR_ANY to be connected to or sent to. This was found when working with Michael Tuexen and syzkaller. syzkaller seems to want to use either of these two addresses to connect to at times. And it really is an error to do so, so let's not allow that behavior. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24852	2020-06-03 14:16:40 +00:00
Michael Tuexen	d442a65733	Restrict enabling TCP-FASTOPEN to end-points in CLOSED or LISTEN state Enabling TCP-FASTOPEN on an end-point which is in a state other than CLOSED or LISTEN, is a bug in the application. So it should not work. Also the TCP code does not (and needs not to) handle this. While there, also simplify the setting of the TF_FASTOPEN flag. This issue was found by running syzkaller. Reviewed by: rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25115	2020-06-03 13:51:53 +00:00
Mike Karels	2510235150	Allow TCP to reuse local port with different destinations Previously, tcp_connect() would bind a local port before connecting, forcing the local port to be unique across all outgoing TCP connections for the address family. Instead, choose a local port after selecting the destination and the local address, requiring only that the tuple is unique and does not match a wildcard binding. Reviewed by: tuexen (rscheff, rrs previous version) MFC after: 1 month Sponsored by: Forcepoint LLC Differential Revision: https://reviews.freebsd.org/D24781	2020-05-18 22:53:12 +00:00
Michael Tuexen	e240ce42bf	Allow only IPv4 addresses in sendto() for TCP on AF_INET sockets. This problem was found by looking at syzkaller reproducers for some other problems. Reviewed by: rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24831	2020-05-15 14:06:37 +00:00
Michael Tuexen	9d176904ae	Remove trailing whitespace.	2020-05-10 17:43:42 +00:00
Randall Stewart	d3b6c96b7d	Adjust the fb to have a way to ask the underlying stack if it can support the PRUS option (OOB). And then have the new function call that to validate and give the correct error response if needed to the user (rack and bbr do not support obsoleted OOB data). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24574	2020-05-04 20:19:57 +00:00
John Baldwin	f1f9347546	Initial support for kernel offload of TLS receive. - Add a new TCP_RXTLS_ENABLE socket option to set the encryption and authentication algorithms and keys as well as the initial sequence number. - When reading from a socket using KTLS receive, applications must use recvmsg(). Each successful call to recvmsg() will return a single TLS record. A new TCP control message, TLS_GET_RECORD, will contain the TLS record header of the decrypted record. The regular message buffer passed to recvmsg() will receive the decrypted payload. This is similar to the interface used by Linux's KTLS RX except that Linux does not return the full TLS header in the control message. - Add plumbing to the TOE KTLS interface to request either transmit or receive KTLS sessions. - When a socket is using receive KTLS, redirect reads from soreceive_stream() into soreceive_generic(). - Note that this interface is currently only defined for TLS 1.1 and 1.2, though I believe we will be able to reuse the same interface and structures for 1.3.	2020-04-27 23:17:19 +00:00
John Baldwin	ec1db6e13d	Add the initial sequence number to the TLS enable socket option. This will be needed for KTLS RX. Reviewed by: gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24451	2020-04-27 22:31:42 +00:00
Michael Tuexen	a357466592	sack_newdata and snd_recover hold the same value. Therefore, use only a single instance: use snd_recover also where sack_newdata was used. Submitted by: Richard Scheffenegger Differential Revision: https://reviews.freebsd.org/D18811	2020-02-13 15:14:46 +00:00
Randall Stewart	481be5de9d	White space cleanup -- remove trailing tab's or spaces from any line. Sponsored by: Netflix Inc.	2020-02-12 13:31:36 +00:00
Gleb Smirnoff	42ce79378d	Fix missing NET_EPOCH_ENTER() when compiled with TCP_OFFLOAD. Reported by: Coverity CID: 1413162	2020-01-29 22:48:18 +00:00
Bjoern A. Zeeb	7754e281c0	Fix NOINET kernels after r356983. All gotos to the label are within the #ifdef INET section, which leaves us with an unused label. Cover the label under #ifdef INET as well to avoid the warning and compile time error.	2020-01-22 15:06:59 +00:00
Gleb Smirnoff	c1604fe4d2	Make in_pcbladdr() require network epoch entered by its callers. Together with this widen network epoch coverage up to tcp_connect() and udp_connect(). Revisions from r356974 and up to this revision cover D23187. Differential Revision: https://reviews.freebsd.org/D23187	2020-01-22 06:10:41 +00:00
Gleb Smirnoff	e2636f0a78	Remove extraneous NET_EPOCH_ASSERT - the full function is covered.	2020-01-22 06:07:27 +00:00
Gleb Smirnoff	3fed74e90f	Re-absorb tcp_detach() back into tcp_usr_detach() as the comment suggests. Not a functional change.	2020-01-22 06:06:27 +00:00
Gleb Smirnoff	5fc8df3c49	Don't enter network epoch in tcp_usr_detach. A PCB removal doesn't require that.	2020-01-22 06:04:56 +00:00
Gleb Smirnoff	7669c586da	tcp_usr_attach() doesn't need network epoch. in_pcbfree() and in_pcbdetach() perform all necessary synchronization themselves.	2020-01-22 06:01:26 +00:00
Gleb Smirnoff	0f6385e705	Inline tcp_attach() into tcp_usr_attach(). Not a functional change.	2020-01-22 05:54:58 +00:00
Gleb Smirnoff	109eb549e1	Make tcp_output() require network epoch. Enter the epoch before calling into tcp_output() from those functions, that didn't do that before. This eliminates a bunch of epoch recursions in TCP.	2020-01-22 05:53:16 +00:00
Edward Tomasz Napierala	adc56f5a38	Make use of the stats(3) framework in the TCP stack. This makes it possible to retrieve per-connection statistical information such as the receive window size, RTT, or goodput, using a newly added TCP_STATS getsockopt(3) option, and extract them using the stats_voistat_fetch(3) API. See the net/tcprtt port for an example consumer of this API. Compared to the existing TCP_INFO system, the main differences are that this mechanism is easy to extend without breaking ABI, and provides statistical information instead of raw "snapshots" of values at a given point in time. stats(3) is more generic and can be used in both userland and the kernel. Reviewed by: thj Tested by: thj Obtained from: Netflix Relnotes: yes Sponsored by: Klara Inc, Netflix Differential Revision: https://reviews.freebsd.org/D20655	2019-12-02 20:58:04 +00:00
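
The calling convention described in `f64dc2ab5b` (a negative return from the output method means the caller must drop the connection) can be illustrated with a small userland model. This is only a sketch under stated assumptions, not the kernel code: `struct mock_tcpcb`, `mock_tcp_drop()`, the two `*_stack_output()` functions and the `sketch_*` wrappers are all hypothetical names, and the `locked` flag merely stands in for the inpcb write lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's tcpcb; not the real structure. */
struct mock_tcpcb {
	bool locked;		/* models the inpcb write lock being held */
	bool dropped;		/* set once the connection has been dropped */
	int drop_errno;		/* positive errno handed to the drop routine */
	int (*output)(struct mock_tcpcb *);	/* models the stack's output method */
};

/* Models the default stack: output never asks for a drop. */
static int
default_stack_output(struct mock_tcpcb *tp)
{
	(void)tp;
	return (0);
}

/* Models an advanced stack that decides to drop while doing output. */
static int
advanced_stack_output(struct mock_tcpcb *tp)
{
	(void)tp;
	return (-ECONNRESET);
}

/* Assumption of this sketch: dropping also releases the lock. */
static void
mock_tcp_drop(struct mock_tcpcb *tp, int error)
{
	tp->dropped = true;
	tp->drop_errno = error;
	tp->locked = false;
}

/*
 * Drop-in flavour: on a negative code, drop and unlock, and report
 * that to the caller by keeping the code negative.
 */
static int
sketch_output(struct mock_tcpcb *tp)
{
	int rv;

	assert(tp->locked);
	rv = tp->output(tp);
	if (rv < 0)
		mock_tcp_drop(tp, -rv);
	return (rv);
}

/* Always unlocks and always returns a non-negative error code. */
static int
sketch_output_unlock(struct mock_tcpcb *tp)
{
	int rv;

	assert(tp->locked);
	rv = tp->output(tp);
	if (rv < 0) {
		rv = -rv;
		mock_tcp_drop(tp, rv);
	} else
		tp->locked = false;
	return (rv);
}

/* Just calls the method; dropping is left to the caller. */
static int
sketch_output_nodrop(struct mock_tcpcb *tp)
{
	assert(tp->locked);
	return (tp->output(tp));
}
```

With this model, a caller of `sketch_output()` must treat a negative return as "the control block is gone and unlocked", a caller of `sketch_output_unlock()` never holds the lock afterwards, and a caller of `sketch_output_nodrop()` still holds the lock and has to perform the drop itself.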


332 Commits