freebsd-nq

Author	SHA1	Message	Date
Michael Tuexen	a92d501617	Improve the handling of cookie life times. The staleness reported in an error cause is in us, not ms. Enforce limits on the life time via sysct; and socket options consistently. Update the description of the sysctl variable to use the right unit. Also do some minor cleanups. This also fixes an interger overflow issue if the peer can modify the cookie. This was reported by Felix Weinrank by fuzz testing the userland stack and in https://oss-fuzz.com/testcase-detail/4800394024452096 MFC after: 3 days	2020-10-16 10:44:48 +00:00
Andrey V. Elsukov	6952c3e1ac	Implement SIOCGIFALIAS. It is lightweight way to check if an IPv4 address exists. Submitted by: Roy Marples Reviewed by: gnn, melifaro MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26636	2020-10-14 09:22:54 +00:00
Andrey V. Elsukov	3f740d4393	Join to AllHosts multicast group again when adding an existing IPv4 address. When SIOCAIFADDR ioctl configures an IPv4 address that is already exist, it removes old ifaddr. When this IPv4 address is only one configured on the interface, this also leads to leaving from AllHosts multicast group. Then an address is added again, but due to the bug, this doesn't lead to joining to AllHosts multicast group. Submitted by: yannis.planus_alstomgroup.com Reviewed by: gnn MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D26757	2020-10-13 19:34:36 +00:00
Bjoern A. Zeeb	506512b170	ip_mroute: fix the viftable export sysctl It seems that in r354857 I got more than one thing wrong. Convert the SYSCTL_OPAQUE to a SYSCTL_PROC to properly export the these days allocated and not longer static per-vnet viftable array. This fixes a problem with netstat -g which would show bogus information for the IPv4 Virtual Interface Table. PR: 246626 Reported by: Ozkan KIRIK (ozkan.kirik gmail.com) MFC after: 3 days	2020-10-11 00:01:00 +00:00
Richard Scheffenegger	4b72ae16ed	Stop sending tiny new data segments during SACK recovery Consider the currently in-use TCP options when calculating the amount of new data to be injected during SACK loss recovery. That addresses the effect that very small (new) segments could be injected on partial ACKs while still performing a SACK loss recovery. Reported by: Liang Tian Reviewed by: tuexen, chengc_netapp.com MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26446	2020-10-09 12:44:56 +00:00
Richard Scheffenegger	868aabb470	Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow. This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7. Note that for untagged traffic, explicitly adding a priority will insert a special 801.1Q vlan header with vlan ID = 0 to carry the priority setting Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26409	2020-10-09 12:06:43 +00:00
Richard Scheffenegger	5432120028	Extend netstat to display TCP stack and detailed congestion state (2) Extend netstat to display TCP stack and detailed congestion state Adding the "-c" option used to show detailed per-connection congestion control state for TCP sessions. This is one summary patch, which adds the relevant variables into xtcpcb. As previous "spare" space is used, these changes are ABI compatible. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26518	2020-10-09 10:55:19 +00:00
Michael Tuexen	e7a39b856a	Minor cleanups. MFC after: 3 days	2020-10-07 15:22:48 +00:00
John Baldwin	9aed26b906	Check if_capenable, not if_capabilities when enabling rate limiting. if_capabilities is a read-only mask of supported capabilities. if_capenable is a mask under administrative control via ifconfig(8). Reviewed by: gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26690	2020-10-06 18:02:33 +00:00
Michael Tuexen	6f155d690b	Reset delayed SACK state when restarting an SCTP association. MFC after: 3 days	2020-10-06 14:26:05 +00:00
Michael Tuexen	b954d81662	Ensure variables are initialized before used. MFC after: 3 days	2020-10-06 11:29:08 +00:00
Michael Tuexen	6176f9d6df	Remove dead stores reported by clang static code analysis MFC after: 3 days	2020-10-06 11:08:52 +00:00
Michael Tuexen	11daa73adc	Cleanup, no functional change intended. MFC after: 3 days	2020-10-06 10:41:04 +00:00
Michael Tuexen	c8e55b3c0c	Whitespace changes. MFC after: 3 days	2020-10-06 09:51:40 +00:00
Michael Tuexen	9f2d6263bb	Use __func__ instead of __FUNCTION__ for consistency. MFC after: 3 days	2020-10-04 15:37:34 +00:00
Michael Tuexen	d0ed75b3b1	Cleanup, no functional change intended. MFC after: 3 days	2020-10-04 15:22:14 +00:00
Alexander V. Chernikov	fedeb08b6a	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now comes with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449	2020-10-03 10:47:17 +00:00
Michael Tuexen	b15f541113	Improve the input validation and processing of cookies. This avoids setting the association in an inconsistent state, which could result in a use-after-free situation. This can be triggered by a malicious peer, if the peer can modify the cookie without the local endpoint recognizing it. Thanks to Ned Williamson for reporting the issue. MFC after: 3 days	2020-09-29 09:36:06 +00:00
Michael Tuexen	fbc6840bae	Minor cleanup. MFC after: 3 days	2020-09-28 14:11:53 +00:00
Michael Tuexen	1d1b4bce53	Cleanup, no functional change intended. MFC after: 3 days	2020-09-27 13:32:02 +00:00
Michael Tuexen	8f269b8242	Improve the handling of receiving unordered and unreliable user messages using DATA chunks. Don't use fsn_included when not being sure that it is set to an appropriate value. If the default is used, which is -1, this can result in SCTP associaitons not making any user visible progress. Thanks to Yutaka Takeda for reporting this issue for the the userland stack in https://github.com/pion/sctp/issues/138. MFC after: 3 days	2020-09-27 13:24:01 +00:00
Richard Scheffenegger	e399566123	TCP: send full initial window when timestamps are in use The fastpath in tcp_output tries to send out full segments, and avoid sending partial segments by comparing against the static t_maxseg variable. That value does not consider tcp options like timestamps, while the initial window calculation is using the correct dynamic tcp_maxseg() function. Due to this interaction, the last, full size segment is considered too short and not sent out immediately. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26478	2020-09-25 10:38:19 +00:00
Richard Scheffenegger	1567c937e2	TCP newreno: improve after_idle ssthresh Adjust ssthresh in after_idle to the maximum of the prior ssthresh, or 3/4 of the prior cwnd. See RFC2861 section 2 for an in depth explanation for the rationale around this. As newreno is the default "fall-through" reaction, most tcp variants will benefit from this. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D22438	2020-09-25 10:23:14 +00:00
Michael Tuexen	b6db274d1e	Whitespace changes. MFC after: 3 days	2020-09-24 12:26:06 +00:00
Alexander V. Chernikov	2259a03020	Rework part of routing code to reduce difference to D26449. * Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times. * Start filling route weight in route change notifications * Pass flowid to UDP/raw IP route lookups * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops. Differential Revision: https://reviews.freebsd.org/D26497	2020-09-21 20:02:26 +00:00
Alexander V. Chernikov	1440f62266	Remove unused nhop_ref_any() function. Remove "opt_mpath.h" header where not needed. No functional changes.	2020-09-20 21:32:52 +00:00
Mitchell Horne	374ce2488a	Initialize some local variables earlier Move the initialization of these variables to the beginning of their respective functions. On our end this creates a small amount of unneeded churn, as these variables are properly initialized before their first use in all cases. However, changing this benefits at least one downstream consumer (NetApp) by allowing local and future modifications to these functions to be made without worrying about where the initialization occurs. Reviewed by: melifaro, rscheff Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26454	2020-09-18 14:01:10 +00:00
Navdeep Parhar	b092fd6c97	if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS. This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable of performing these operations on inner VXLAN traffic. A VXLAN interface inherits the capabilities of its vxlandev interface if one is specified or of the interface that hosts the vxlanlocal address. If other interfaces will carry traffic for that VXLAN then they must have the same hardware capabilities. On transmit, if_vxlan verifies that the outbound interface has the required capabilities and then translates the CSUM_ flags to their inner equivalents. This tells the hardware ifnet that it needs to operate on the inner frame and not the outer VXLAN headers. An event is generated when a VXLAN ifnet starts. This allows hardware drivers to configure their devices to expect VXLAN traffic on the specified incoming port. On receive, the hardware does RSS and checksum verification on the inner frame. if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is not very clear why it didn't do this already. Future work: Rx: it should be possible to avoid the first trip up the protocol stack to get the frame to if_vxlan just so it can decapsulate and requeue for a second trip up the stack. The hardware NIC driver could directly call an if_vxlan receive routine for VXLAN traffic instead. Rx: LRO. depends on what happens with the previous item. There will have to to be a mechanism to indicate that it's time for if_vxlan to flush its LRO state. Reviewed by: kib@ Relnotes: Yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:37:57 +00:00
Navdeep Parhar	72cc43df17	Add a knob to allow zero UDP checksums for UDP/IPv6 traffic on the given UDP port. This will be used by some upcoming changes to if_vxlan(4). RFC 7348 (VXLAN) says that the UDP checksum "SHOULD be transmitted as zero. When a packet is received with a UDP checksum of zero, it MUST be accepted for decapsulation." But the original IPv6 RFCs did not allow zero UDP checksum. RFC 6935 attempts to resolve this. Reviewed by: kib@ Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:21:15 +00:00
Michael Tuexen	42d7560796	Export the name of the congestion control. This will be used by sockstat and netstat. Reviewed by: rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D26412	2020-09-13 09:06:50 +00:00
Richard Scheffenegger	e74e64a191	cc_mod: remove unused CCF_DELACK definition During the DCTCP improvements, use of CCF_DELACK was removed. This change is just to rename the unused flag bit to prevent use of it, without also re-implementing the tcp_input and tcp_output interfaces. No functional change. Reviewed by: chengc_netapp.com, tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26181	2020-09-10 00:46:38 +00:00
Randall Stewart	285385ba56	So it turns out that syzkaller hit another crash. It has to do with switching stacks with a SENT_FIN outstanding. Both rack and bbr will only send a FIN if all data is ack'd so this must be enforced. Also if the previous stack sent the FIN we need to make sure in rack that when we manufacture the "unknown" sends that we include the proper HAS_FIN bits. Note for BBR we take a simpler approach and just refuse to switch. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D26269	2020-09-09 11:11:50 +00:00
Bjoern A. Zeeb	67d224ef43	bbr: remove unused static function bbr_log_type_hrdwtso() is a file local static unused function. Remove it to avoid warnings on kernel compiles. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D26331	2020-09-05 00:20:32 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Alexander V. Chernikov	a624ca3dff	Move net/route/shared.h definitions to net/route/route_var.h. No functional changes. net/route/shared.h was created in the inital phases of nexthop conversion. It was intended to serve the same purpose as route_var.h - share definitions of functions and structures between the routing subsystem components. At that time route_var.h was included by many files external to the routing subsystem, which largerly defeats its purpose. As currently this is not the case anymore and amount of route_var.h includes is roughly the same as shared.h, retire the latter in favour of the former.	2020-08-28 22:50:20 +00:00
Michael Tuexen	404ff76bda	Fix a regression with the explicit EOR mode I introduced in r364268. A short MFC time as discussed with the secteam. Reported by: Taylor Brandstetter MFC after: 1 day	2020-08-28 20:05:18 +00:00
Michael Tuexen	1951fa791e	RFC 3465 defines a limit L used in TCP slow start for limiting the number of acked bytes as described in Section 2.2 of that document. This patch ensures that this limit is not also applied in congestion avoidance. Applying this limit also in congestion avoidance can result in using less bandwidth than allowed. Reported by: l.tian.email@gmail.com Reviewed by: rrs, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D26120	2020-08-25 09:42:03 +00:00
Warner Losh	773e541e8d	Use devctl.h instead of bus.h to reduce newbus pollution. There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*. Suggested by: kib@	2020-08-21 00:03:24 +00:00
Andrew Gallatin	b99781834f	TCP: remove special treatment for hardware (ifnet) TLS Remove most special treatment for ifnet TLS in the TCP stack, except for code to avoid mixing handshakes and bulk data. This code made heroic efforts to send down entire TLS records to NICs. It was added to improve the PCIe bus efficiency of older TLS offload NICs which did not keep state per-session, and so would need to re-DMA the first part(s) of a TLS record if a TLS record was sent in multiple TCP packets or TSOs. Newer TLS offload NICs do not need this feature. At Netflix, we've run extensive QoE tests which show that this feature reduces client quality metrics, presumably because the effort to send TLS records atomically causes the server to both wait too long to send data (leading to buffers running dry), and to send too much data at once (leading to packet loss). Reviewed by: hselasky, jhb, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26103	2020-08-19 17:59:06 +00:00
Richard Scheffenegger	ad7a0eb189	TCP Cubic: recalculate cwnd for every ACK. Since cubic calculates cwnd based on absolute time, retaining RFC3465 (ABC) once-per-window updates can lead to dramatic changes of cwnd in the convex region. Updating cwnd for each incoming ack minimizes this delta, preventing unintentional line-rate bursts. Reviewed by: chengc_netapp.com, tuexen (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26060	2020-08-18 19:34:31 +00:00
Michael Tuexen	d7351394da	Fix two bugs I introduced in r362563. Found by running syzkaller. MFC after: 3 days	2020-08-18 19:25:03 +00:00
Michael Tuexen	d59f3890c3	Remove a line which is needed and was added in https://svnweb.freebsd.org/changeset/base/364268 MFC after: 3 days	2020-08-16 13:31:14 +00:00
Michael Tuexen	f5d30f7f76	Improve the handling of concurrent send() calls for SCTP sockets, especially when having the explicit EOR mode enabled. Reported by: Megan2013678@protonmail.com Reported by: syzbot+bc02585076c3cc977f9b@syzkaller.appspotmail.com MFC after: 3 days	2020-08-16 11:50:37 +00:00
Michael Tuexen	04996cb74b	Enter epoch earlier. This is needed because we are exiting it also in error cases. MFC after: 1 week	2020-08-15 11:22:07 +00:00
Alexander V. Chernikov	2f23f45b20	Simplify dom_<rtattach\|rtdetach>. Remove unused arguments from dom_rtattach/dom_rtdetach functions and make them return/accept 'struct rib_head' instead of 'void **'. Declare inet/inet6 implementations in the relevant _var.h headers similar to domifattach / domifdetach. Add rib_subscribe_internal() function to accept subscriptions to the rnh directly. Differential Revision: https://reviews.freebsd.org/D26053	2020-08-14 21:29:56 +00:00
Richard Scheffenegger	a459638fc4	TCP Cubic: Have Fast Convergence Heuristic work for ECN, and align concave region The Cubic concave region was not aligned nicely for the very first exit from slow start, where a 50% cwnd reduction is done instead of the normal 30%. This addresses an issue, where a short line-rate burst could result from that sudden jump of cwnd. In addition, the Fast Convergence Heuristic has been expanded to work also with ECN induced congestion response. Submitted by: chengc_netapp.com Reported by: chengc_netapp.com Reviewed by: tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25976	2020-08-13 16:45:55 +00:00
Richard Scheffenegger	2bb6dfabbe	TCP Cubic: After leaving slowstart fix unintended cwnd jump. Initializing K to zero in D23655 introduced a miscalculation, where cwnd would suddenly jump to cwnd_max instead of gradually increasing, after leaving slow-start. Properly calculating K instead of resetting it to zero resolves this issue. Also making sure, that cwnd is recalculated at the earliest opportunity once slow-start is over. Reported by: chengc_netapp.com Reviewed by: chengc_netapp.com, tuexen (mentor), rgrimes (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D25746	2020-08-13 16:38:51 +00:00
Richard Scheffenegger	f359d6ebbc	Improve SACK support code for RFC6675 and PRR Adding proper accounting of sacked_bytes and (per-ACK) delivered data to the SACK scoreboard. This will allow more aspects of RFC6675 to be implemented as well as Proportional Rate Reduction (RFC6937). Prior to this change, the pipe calculation controlled with net.inet.tcp.rfc6675_pipe was also susceptible to incorrect results when more than 3 (or 4) holes in the sequence space were present, which can no longer all fit into a single ACK's SACK option. Reviewed by: kbowling, rgrimes (mentor) Approved by: rgrimes (mentor, blanket) MFC after: 3 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D18624	2020-08-13 16:30:09 +00:00
Hans Petter Selasky	b453d3d239	Use a static initializer for the multicast free tasks. This makes the SYSINIT() function updated in r364072 superfluous. Suggested by: glebius@ MFC after: 1 week Sponsored by: Mellanox Technologies	2020-08-11 08:31:40 +00:00
Michael Tuexen	cf8a49ab6e	Fix the following issues related to the TCP SYN-cache: * Let the accepted TCP/IPv4 socket inherit the configured TTL and TOS value. * Let the accepted TCP/IPv6 socket inherit the configured Hop Limit. * Use the configured Hop Limit and Traffic Class when sending IPv6 packets. Reviewed by: rrs, lutz_donnerhacke.de MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25909	2020-08-10 20:24:48 +00:00

1 2 3 4 5 ...

6740 Commits