freebsd-dev

Author	SHA1	Message	Date
Hans Petter Selasky	c2a808b977	Fix kernel build after `fcb3f813f3` by adding missing ifdefs for INET6. Differential Revision: https://reviews.freebsd.org/D36731 Sponsored by: NVIDIA Networking	2022-10-04 15:55:36 +02:00
Randall Stewart	cd84e78f09	tcp idle reduce does not work for a server. TCP has an idle-reduce feature that allows a connection to reduce its cwnd after it has been idle more than an RTT. This feature only works for a sending side connection. It does this by checking the idle time (t_rcvtime vs ticks) at output to see if it's more than the RTO timeout. The problem comes if you are a web server. You get a request and then send out all the data, then go idle. The next time you would send is in response to a request from the peer asking for more data. But the thing is you updated t_rcvtime when the request came in so you never reduce. The fix is to do the idle reduce check also on inbound. Reviewed by: tuexen, rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D36721	2022-10-04 07:09:01 -04:00
Gleb Smirnoff	77198a945a	tcp_timers: provide tcp_timer_drop() and tcp_timer_close() Two functions to call tcp_drop() and tcp_close() from a callout context. Garbage collect tcp_inpinfo_lock_del(), it has a single use now. Differential revision: https://reviews.freebsd.org/D36397	2022-10-03 22:21:55 -07:00
Gleb Smirnoff	775e20c159	tcp: make tcp_drop_syn_sent() static	2022-10-03 21:11:17 -07:00
Gleb Smirnoff	fcb3f813f3	netinet: remove PRC_ constants and streamline ICMP processing In the original design of the network stack the protocol control input method pr_ctlinput was used to notify the protocols about two very different kinds of events: internal system events and receipt of ICMP messages from outside. These events were coded with PRC_ codes. Today these methods are removed from the protosw(9) and are isolated to IPv4 and IPv6 stacks and are called only from icmp_input(). The PRC_ codes now just create a shim layer between ICMP codes and errors or actions taken by protocols. - Change ipproto_ctlinput_t to pass just pointer to ICMP header. This allows protocols to not deduce it from the internal IP header. - Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer. It has all the information needed by the protocols. In the structure, change ip6c_finaldst fields to sockaddr_in6. The reason is that icmp6_input() already has this address wrapped in sockaddr, and the protocols want this address as sockaddr. - For UDP tunneling control input, as well as for IPSEC control input, change the prototypes to accept a transparent union of either ICMP header pointer or struct ip6ctlparam pointer. - In icmp_input() and icmp6_input() do only validation of ICMP header and count bad packets. The translation of ICMP codes to errors/actions is done by protocols. - Provide icmp_errmap() and icmp6_errmap() as substitutes for the inetctlerrmap, inet6ctlerrmap arrays. - In protocol ctlinput methods either trust what icmp_errmap() recommends, or do our own logic based on the ICMP header. Differential revision: https://reviews.freebsd.org/D36731	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	c0fc81e913	netinet: remove dead code from TCP, UDP, SCTP control input Now these functions are called only from icmp_input(). The pointer to the ICMP data is never NULL and cmd has a limited set of values. In the past the functions were demultiplexing control messages from ICMP layer, as well as internally generated events. In the latter case the pointer to IP would be NULL. Differential revision: https://reviews.freebsd.org/D36729	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	7f3b00a87a	netinet: filter out invalid ICMP responses in ip_icmp() instead of doing that in every ipproto_ctlinput_t method. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36728	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	53807a8a27	netinet: use sparse C99 initializer for inetctlerrmap and mark those PRC_ codes, that are used. The rest are dead code. This is not a functional change, but illustrative to make easier review of following changes.	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	43d39ca7e5	netinet*: de-void control input IP protocol methods After decoupling of protosw(9) and IP wire protocols in `78b1fc05b2` for IPv4 we got vector ip_ctlprotox[] that is executed only from icmp_input() and respectively for IPv6 we got ip6_ctlprotox[] executed only from icmp6_input(). This allows using protocol specific argument types in these methods instead of struct sockaddr and void. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36727	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	46ddeb6be8	netinet6: retire ip6protosw.h The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with the former removed in `f0ffb944d2` in 2001 and the latter survived until today. It has been reduced down to only one useful declaration that moves to ip6_var.h Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36726	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	0ab46f28dc	tcp: remove unnecessary include of tcp6_var.h Reviewed by: rscheff, melifaro Differential revision: https://reviews.freebsd.org/D36725	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	bb77f0c204	udp: typedef udp tunneling functions to functions, not pointers With this change one can make a forward declaration of a function that is of UDP tunneling type. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36724	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	24b96f35b9	netinet*: move ipproto_register() and co to ip_var.h and ip6_var.h This is a FreeBSD KPI and belongs to private header not netinet/in.h. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36723	2022-10-03 20:53:04 -07:00
Richard Scheffenegger	4edff766cb	tcp: correct simultaneous SYN ECN reaction in RFC3168 mode. Ensure that an RFC3168 ECN reaction only occurs on non-SYN segments. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36867	2022-10-04 00:24:28 +02:00
Richard Scheffenegger	0924ae8f47	tcp: allow window scale and timestamps to be toggled individually Simple change to allow for the individual toggling of RFC7323 window scaling and timestamp option. Reviewed By: rrs, tuexen, glebius, guest-ccui, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36863	2022-10-03 19:21:46 +02:00
Michael Tuexen	2515552e62	tcp: improve handling of SYN-ACK segments in TIMEWAIT state Only consider segments with the SYN bit set and the ACK bit cleared as "new connection attempts", which result in re-using a connection being in TIMEWAIT state. This results in consistent handling of SYN-ACK segments. Reviewed by: rscheff@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36864	2022-10-03 14:46:47 +02:00
Michael Tuexen	f8b5681094	tcp: honor drop_synfin sysctl variable in TIME-WAIT Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36862	2022-10-03 12:48:30 +02:00
Randall Stewart	08af8aac2a	Tcp progress timeout Rack has had the ability to timeout connections that just sit idle automatically. This feature of course is off by default and requires the user to turn it on (though the socket option has been missing in tcp_usrreq.c). Let's get the progress timeout fully supported in the base stack as well as rack. Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D36716	2022-09-27 13:38:20 -04:00
Randall Stewart	d1b07f36a2	TCP complete end status work. The ending of a connection can tell us a lot about what happened, i.e. did it fail to set up, did it time out, was it a normal close. This is often useful information to help analyze and debug issues. Rack has had end status for some time but the base stack has not. Let's go ahead and add in the missing bits to populate the end status. Reviewed by: tuexen, rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D36712	2022-09-26 15:20:18 -04:00
Randall Stewart	e5049a1733	TCP rack does not work properly with cubic. Right now if you use rack with cubic (the new default cc) you will have improper results. This is because rack uses different variables than the base stack (or bbr) and thus tcp_compute_pipe() always returns so that cubic will choose a 30% backoff not the 50% backoff it should when it is in newreno compatibility mode. The fix is to allow a stack (rack) to override its own compute_pipe. Reviewed by: tuexen, rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D36711	2022-09-26 15:12:03 -04:00
Alexander V. Chernikov	f375bf0e6f	netinet: pass cred instead of the curthread to ifaddr manipulation funcs. Pass the credentials directly to the functions, so non-ioctl kernel users can also perform address manipulations. MFC after: 2 weeks	2022-09-26 13:46:13 +00:00
Michael Tuexen	0fdc247274	tcp: make RACK loadable again using the default configuration Without this patch, loading the RACK stack required the newreno CC module to be compiled into the kernel. This is not the case anymore since CUBIC is the default now. Reviewed by: rscheff@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36707	2022-09-26 12:30:50 +02:00
Richard Scheffenegger	a743fc8826	tcp: fix cwnd restricted SACK retransmission loop While doing the initial SACK retransmission segment while heavily cwnd constrained, tcp_output can erroneously send out the entire sendbuffer again. This may happen after a retransmission timeout, which resets snd_nxt to snd_una while the SACK scoreboard is still populated. Reviewed By: tuexen, #transport PR: 264257 PR: 263445 PR: 260393 MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36637	2022-09-22 13:28:43 +02:00
Michael Tuexen	5ae83e0d87	tcp: send ACKs when requested When doing Limited Transmit send an ACK when needed by the protocol processing (like sending ACKs with a DSACK block). PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36631	2022-09-22 12:12:11 +02:00
Gleb Smirnoff	9453ec6619	tcp: increment tcpstats in tcp_respond() tcp_respond() crafts a packet and sends it directly to ip[6]output(), bypassing tcp_output(). Hence it must increment TCP send statistics. Reviewed by: rscheff, tuexen, rrs (implicitly) Differential revision: https://reviews.freebsd.org/D36641	2022-09-21 14:03:33 -07:00
Gleb Smirnoff	493105c2a8	tcp: fix simultaneous open and refine `e80062a2d4` - The soisconnected() call on transition from SYN_RCVD to ESTABLISHED is also necessary for a half-synchronized connection. Fix that by just setting the flag when we transfer SYN-SENT -> SYN-RECEIVED. - Provide a comment that explains under what conditions the call to soisconnected() is necessary. - Hence mechanically rename the TF_INCQUEUE flag to TF_SONOTCONN. - Extend the change to the BBR and RACK stacks. Note: the interaction between the accept_filter(9) and the socket layer is not fully consistent, yet. For most accept filters this call to soisconnected() will not move the connection from the incomplete queue to the complete. The move would happen only when the filter has received the desired data, and soisconnected() would be called once again from sorwakeup(). Ideally, we should mark socket as connected only there, and leave the soisconnected() from SYN_RCVD->ESTABLISHED only for the simultaneous open case. However, this doesn't yet work. Reviewed by: rscheff, tuexen, rrs Differential revision: https://reviews.freebsd.org/D36641	2022-09-21 14:02:49 -07:00
Gleb Smirnoff	0c7f3ae8c6	tcpcb: fix tabulation count in i4012ef7754c and abbreviate "packets" This lines up comments to the rest of the file. Abbreviation helps to fit in to 80 char terminal. Not a functional change.	2022-09-19 10:29:53 -07:00
Michael Tuexen	6d9e911fba	tcp: fix computation of offset Only update the offset if actually retransmitting from the scoreboard. If not done correctly, this may result in trying to (re)-transmit data not being in the socket buffer and therefore resulting in a panic. PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36626	2022-09-19 12:49:31 +02:00
Gleb Smirnoff	da6715bbb1	ip_output: always increase "cantfrag" stat if ip_fragment() fails While here, join two unlikely cases into one if clause. Submitted by: Ivan Rozhuk <rozhuk.im gmail.com> PR: 265718 Reviewed by: mjg, melifaro Differential revision: https://reviews.freebsd.org/D36584	2022-09-14 19:22:40 -07:00
Gleb Smirnoff	15b73a2a14	ip_reass: use correct comparison in ipreass_callout() Reported-by: syzbot+55415dc73f9b89b87fce@syzkaller.appspotmail.com	2022-09-14 08:32:07 -07:00
Richard Scheffenegger	bb1d472d79	tcp: make CUBIC the default congestion control mechanism. This changes the default TCP Congestion Control (CC) to CUBIC. For small, transactional exchanges (e.g. web objects <15kB), this will not have a material effect. However, for long duration data transfers, CUBIC allocates a slightly higher fraction of the available bandwidth, when competing against NewReno CC. Reviewed By: tuexen, mav, #transport, guest-ccui, emaste Relnotes: Yes Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36537	2022-09-13 12:09:21 +02:00
Richard Scheffenegger	ea6d0de299	tcp: Make all references to CUBIC uppercase Consistently refer to the CUBIC congestion control mechanism in uppercase throughout all comments. No functional change. Reviewed By: #transport, tuexen, mav, guest-ccui, emaste Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36547	2022-09-13 12:07:06 +02:00
Dag-Erling Smørgrav	c198adf394	siftr: spell PFIL_PASS correctly. Sponsored by: NetApp Sponsored by: Klara Inc. Differential Revision: https://reviews.freebsd.org/D36539	2022-09-12 19:20:10 +02:00
Mateusz Guzik	1760a6950a	Fixup build after recent getsock changes	2022-09-10 20:40:43 +00:00
Mateusz Guzik	3212ad15ab	Add getsock All but one consumers of getsock_cap only pass 4 arguments. Take advantage of it.	2022-09-10 19:47:47 +00:00
Gleb Smirnoff	29b4b63c59	ip_reass: optimize ipreass_drain_vnet() - Call ipreass_reschedule() only once per slot [1] - Aggregate stats and update them once Suggested by: jtl [1]	2022-09-10 02:17:15 -07:00
Gleb Smirnoff	13018bfae8	ip_reass: make stray callout assertion more verbose Syzkaller hits this assertion, but can't find a reproducer. I have also never seen it hit in my testing. Try to get more information via syzkaller.	2022-09-10 02:11:39 -07:00
Gleb Smirnoff	c8bc874172	ip_reass: fixup the just added tunable - Don't use hardcoded hash mask - free the memory on VNET destroy Fixes: `1494f4776a`	2022-09-09 09:19:39 -07:00
Randall Stewart	81560c5582	TCP: Rack ends up sending all that is outstanding every timeout. In doing some testing for a different problem, I have found rack retransmitting all outstanding data every time a timeout occurs. The outstanding is sent 1ms apart between each packet, and then the timeout runs off again. This causes extra retransmissions when we should be waiting for an ack after sending the very first segment. Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D36494	2022-09-09 08:59:21 -04:00
Gleb Smirnoff	1494f4776a	ip_reass: add loader tunable to tune the reassembly hash size	2022-09-08 13:49:58 -07:00
Gleb Smirnoff	a30cb31589	ip_reass: retire ipreass_slowtimo() in favor of per-slot callout o Retire global always running ipreass_slowtimo(). o Instead use one callout entry per hash slot. The per-slot callout would be scheduled only if a slot has entries, and would be driven by TTL of the very last entry. o Make net.inet.ip.fragttl read/write and document it. o Retire IPFRAGTTL, which used to be meaningful only with PR_SLOWTIMO. Differential revision: https://reviews.freebsd.org/D36275	2022-09-08 13:49:58 -07:00
Mateusz Guzik	dda6376b04	net: employ newly added pfil_mbuf_{in,out} where appropriate Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D36454	2022-09-08 16:21:08 +00:00
Gleb Smirnoff	e80062a2d4	tcp: avoid call to soisconnected() on transition to ESTABLISHED This call existed since pre-FreeBSD times, and it is hard to understand why it was there in the first place. After `6f3caa6d81` it definitely became necessary always and commit message from `f1ee30ccd6` confirms that. Now that `6f3caa6d81` is effectively backed out by `07285bb4c2`, the call appears to be useful only for sockets that landed on the incomplete queue, e.g. sockets that have accept_filter(9) enabled on them. Provide a new TCP flag to mark connections that are known to be on the incomplete queue, and call soisconnected() only for those connections. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36488	2022-09-08 09:16:04 -07:00
Mateusz Guzik	14c9a2dbfb	net: retire PFIL_FWD It is now unused and not having it allows further clean ups. Reviewed by: cy, glebius, kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D36452	2022-09-07 10:04:31 +00:00
Mateusz Guzik	223a73a1c4	net: remove stale altq_input reference Code setting it was removed in: commit `325fab802e` Author: Eric van Gyzen <vangyzen@FreeBSD.org> Date: Tue Dec 4 23:46:43 2018 +0000 altq: remove ALTQ3_COMPAT code Reviewed by: glebius, kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D36471	2022-09-07 10:03:12 +00:00
Gleb Smirnoff	aa74cc6d6f	divert(4): do not depend on ipfw(4) Although the socket was originally intended for use with ipfw(4) only, it can now also be used with pf(4). On a kernel without packet filters, it can still be used to inject traffic.	2022-09-06 20:54:57 -07:00
Gleb Smirnoff	999c9fd733	divert(4): don't check for CSUM_SCTP without INET This compiles, but is actually dead code. Noticed by: bz Fixes: `e72c522858`	2022-09-06 20:54:57 -07:00
Gleb Smirnoff	0773b44e82	tcp: tcp6_connect() requires net epoch PR: 262663 Reported & tested by: dch MFC after: 2 weeks	2022-09-05 10:19:11 -07:00
Gordon Bergling	347b1991b0	netdump(4): Correct a typo in source code comment - s/occured/occurred/ MFC after: 3 days	2022-09-04 12:59:29 +02:00
Gordon Bergling	c3679af313	tcp_rack: Correct some typos in source code comments - s/occured/occurred/ MFC after: 3 days	2022-09-04 12:58:13 +02:00
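
Commit `2515552e62` above describes treating only segments with the SYN bit set and the ACK bit cleared as new connection attempts in TIMEWAIT. The following is a minimal sketch of that flag test, not the code from the commit: the SK_ names and the helper function are hypothetical, and only the bit values (SYN = 0x02, ACK = 0x10) are taken from the TCP header layout.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the TCP header flag bits, so the sketch
 * builds outside the kernel. The values follow the TCP header layout.
 */
#define SK_TH_SYN	0x02u
#define SK_TH_ACK	0x10u

static_assert((SK_TH_SYN & SK_TH_ACK) == 0, "flag bits must not overlap");

/*
 * Return true only when SYN is set and ACK is clear. A SYN-ACK therefore
 * does not count as a new connection attempt, and neither does a pure ACK.
 * Other flag bits are ignored by this sketch.
 */
static bool
sk_is_new_connection_attempt(uint8_t th_flags)
{
	return ((th_flags & (SK_TH_SYN | SK_TH_ACK)) == SK_TH_SYN);
}
```

Masking both bits and comparing against SYN alone checks "SYN set" and "ACK clear" in a single comparison, which is why a SYN-ACK (0x12) is rejected while a bare SYN (0x02) is accepted.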

7509 Commits