Commit Graph

6772 Commits

Author SHA1 Message Date
Michael Tuexen
fbc6840bae Minor cleanup.
MFC after:		3 days
2020-09-28 14:11:53 +00:00
Michael Tuexen
1d1b4bce53 Cleanup, no functional change intended.
MFC after:		3 days
2020-09-27 13:32:02 +00:00
Michael Tuexen
8f269b8242 Improve the handling of receiving unordered and unreliable user
messages using DATA chunks. Don't use fsn_included when not being
sure that it is set to an appropriate value. If the default is
used, which is -1, this can result in SCTP associaitons not
making any user visible progress.

Thanks to Yutaka Takeda for reporting this issue for the the
userland stack in https://github.com/pion/sctp/issues/138.

MFC after:		3 days
2020-09-27 13:24:01 +00:00
Richard Scheffenegger
e399566123 TCP: send full initial window when timestamps are in use
The fastpath in tcp_output tries to send out
full segments, and avoid sending partial segments by
comparing against the static t_maxseg variable.
That value does not consider tcp options like timestamps,
while the initial window calculation is using
the correct dynamic tcp_maxseg() function.

Due to this interaction, the last, full size segment
is considered too short and not sent out immediately.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D26478
2020-09-25 10:38:19 +00:00
Richard Scheffenegger
1567c937e2 TCP newreno: improve after_idle ssthresh
Adjust ssthresh in after_idle to the maximum of
the prior ssthresh, or 3/4 of the prior cwnd. See
RFC2861 section 2 for an in depth explanation for
the rationale around this.

As newreno is the default "fall-through" reaction,
most tcp variants will benefit from this.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D22438
2020-09-25 10:23:14 +00:00
Michael Tuexen
b6db274d1e Whitespace changes.
MFC after:		3 days
2020-09-24 12:26:06 +00:00
Alexander V. Chernikov
2259a03020 Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(),
 as these two can be applied at different entities and at different times.
* Start filling route weight in route change notifications
* Pass flowid to UDP/raw IP route lookups
* Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact
 that rtentry can contain multiple nexthops.

Differential Revision:	https://reviews.freebsd.org/D26497
2020-09-21 20:02:26 +00:00
Alexander V. Chernikov
1440f62266 Remove unused nhop_ref_any() function.
Remove "opt_mpath.h" header where not needed.

No functional changes.
2020-09-20 21:32:52 +00:00
Mitchell Horne
374ce2488a Initialize some local variables earlier
Move the initialization of these variables to the beginning of their
respective functions.

On our end this creates a small amount of unneeded churn, as these
variables are properly initialized before their first use in all cases.
However, changing this benefits at least one downstream consumer
(NetApp) by allowing local and future modifications to these functions
to be made without worrying about where the initialization occurs.

Reviewed by:	melifaro, rscheff
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26454
2020-09-18 14:01:10 +00:00
Navdeep Parhar
b092fd6c97 if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS.
This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx
and rx), TSO, and RSS if the NIC is capable of performing these operations on
inner VXLAN traffic.

A VXLAN interface inherits the capabilities of its vxlandev interface if one is
specified or of the interface that hosts the vxlanlocal address. If other
interfaces will carry traffic for that VXLAN then they must have the same
hardware capabilities.

On transmit, if_vxlan verifies that the outbound interface has the required
capabilities and then translates the CSUM_ flags to their inner equivalents.
This tells the hardware ifnet that it needs to operate on the inner frame and
not the outer VXLAN headers.

An event is generated when a VXLAN ifnet starts. This allows hardware drivers to
configure their devices to expect VXLAN traffic on the specified incoming port.

On receive, the hardware does RSS and checksum verification on the inner frame.
if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is
not very clear why it didn't do this already.

Future work:
Rx: it should be possible to avoid the first trip up the protocol stack to get
the frame to if_vxlan just so it can decapsulate and requeue for a second trip
up the stack. The hardware NIC driver could directly call an if_vxlan receive
routine for VXLAN traffic instead.

Rx: LRO. depends on what happens with the previous item. There will have to to
be a mechanism to indicate that it's time for if_vxlan to flush its LRO state.

Reviewed by:	kib@
Relnotes:	Yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25873
2020-09-18 02:37:57 +00:00
Navdeep Parhar
72cc43df17 Add a knob to allow zero UDP checksums for UDP/IPv6 traffic on the given UDP port.
This will be used by some upcoming changes to if_vxlan(4).  RFC 7348 (VXLAN)
says that the UDP checksum "SHOULD be transmitted as zero.  When a packet is
received with a UDP checksum of zero, it MUST be accepted for decapsulation."
But the original IPv6 RFCs did not allow zero UDP checksum.  RFC 6935 attempts
to resolve this.

Reviewed by:	kib@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25873
2020-09-18 02:21:15 +00:00
Michael Tuexen
42d7560796 Export the name of the congestion control. This will be used by sockstat
and netstat.

Reviewed by:		rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D26412
2020-09-13 09:06:50 +00:00
Richard Scheffenegger
e74e64a191 cc_mod: remove unused CCF_DELACK definition
During the DCTCP improvements, use of CCF_DELACK was
removed. This change is just to rename the unused flag
bit to prevent use of it, without also re-implementing
the tcp_input and tcp_output interfaces.

No functional change.

Reviewed by:	chengc_netapp.com, tuexen
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D26181
2020-09-10 00:46:38 +00:00
Randall Stewart
285385ba56 So it turns out that syzkaller hit another crash. It has to do with switching
stacks with a SENT_FIN outstanding. Both rack and bbr will only send a
FIN if all data is ack'd so this must be enforced. Also if the previous stack
sent the FIN we need to make sure in rack that when we manufacture the
"unknown" sends that we include the proper HAS_FIN bits.

Note for BBR we take a simpler approach and just refuse to switch.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D26269
2020-09-09 11:11:50 +00:00
Bjoern A. Zeeb
67d224ef43 bbr: remove unused static function
bbr_log_type_hrdwtso() is a file local static unused function.
Remove it to avoid warnings on kernel compiles.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D26331
2020-09-05 00:20:32 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
Alexander V. Chernikov
a624ca3dff Move net/route/shared.h definitions to net/route/route_var.h.
No functional changes.

net/route/shared.h was created in the inital phases of nexthop conversion.
It was intended to serve the same purpose as route_var.h - share definitions
 of functions and structures between the routing subsystem components. At
 that time route_var.h was included by many files external to the routing
 subsystem, which largerly defeats its purpose.

As currently this is not the case anymore and amount of route_var.h includes
 is roughly the same as shared.h, retire the latter in favour of the former.
2020-08-28 22:50:20 +00:00
Michael Tuexen
404ff76bda Fix a regression with the explicit EOR mode I introduced in r364268.
A short MFC time as discussed with the secteam.

Reported by:		Taylor Brandstetter
MFC after:		1 day
2020-08-28 20:05:18 +00:00
Michael Tuexen
1951fa791e RFC 3465 defines a limit L used in TCP slow start for limiting the number
of acked bytes as described in Section 2.2 of that document.
This patch ensures that this limit is not also applied in congestion
avoidance. Applying this limit also in congestion avoidance can result in
using less bandwidth than allowed.

Reported by:		l.tian.email@gmail.com
Reviewed by:		rrs, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D26120
2020-08-25 09:42:03 +00:00
Warner Losh
773e541e8d Use devctl.h instead of bus.h to reduce newbus pollution.
There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@
2020-08-21 00:03:24 +00:00
Andrew Gallatin
b99781834f TCP: remove special treatment for hardware (ifnet) TLS
Remove most special treatment for ifnet TLS in the TCP stack, except
for code to avoid mixing handshakes and bulk data.

This code made heroic efforts to send down entire TLS records to
NICs. It was added to improve the PCIe bus efficiency of older TLS
offload NICs which did not keep state per-session, and so would need
to re-DMA the first part(s) of a TLS record if a TLS record was sent
in multiple TCP packets or TSOs. Newer TLS offload NICs do not need
this feature.

At Netflix, we've run extensive QoE tests which show that this feature
reduces client quality metrics, presumably because the effort to send
TLS records atomically causes the server to both wait too long to send
data (leading to buffers running dry), and to send too much data at
once (leading to packet loss).

Reviewed by:	hselasky,  jhb, rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26103
2020-08-19 17:59:06 +00:00
Richard Scheffenegger
ad7a0eb189 TCP Cubic: recalculate cwnd for every ACK.
Since cubic calculates cwnd based on absolute
time, retaining RFC3465 (ABC) once-per-window updates
can lead to dramatic changes of cwnd in the convex
region. Updating cwnd for each incoming ack minimizes
this delta, preventing unintentional line-rate bursts.

Reviewed by:	chengc_netapp.com, tuexen (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D26060
2020-08-18 19:34:31 +00:00
Michael Tuexen
d7351394da Fix two bugs I introduced in r362563.
Found by running syzkaller.

MFC after:	3 days
2020-08-18 19:25:03 +00:00
Michael Tuexen
d59f3890c3 Remove a line which is needed and was added in
https://svnweb.freebsd.org/changeset/base/364268

MFC after:		3 days
2020-08-16 13:31:14 +00:00
Michael Tuexen
f5d30f7f76 Improve the handling of concurrent send() calls for SCTP sockets,
especially when having the explicit EOR mode enabled.

Reported by:		Megan2013678@protonmail.com
Reported by:		syzbot+bc02585076c3cc977f9b@syzkaller.appspotmail.com
MFC after:		3 days
2020-08-16 11:50:37 +00:00
Michael Tuexen
04996cb74b Enter epoch earlier. This is needed because we are exiting it also
in error cases.

MFC after:	1 week
2020-08-15 11:22:07 +00:00
Alexander V. Chernikov
2f23f45b20 Simplify dom_<rtattach|rtdetach>.
Remove unused arguments from dom_rtattach/dom_rtdetach functions and make
  them return/accept 'struct rib_head' instead of 'void **'.
Declare inet/inet6 implementations in the relevant _var.h headers similar
  to domifattach / domifdetach.
Add rib_subscribe_internal() function to accept subscriptions to the rnh
  directly.

Differential Revision:	https://reviews.freebsd.org/D26053
2020-08-14 21:29:56 +00:00
Richard Scheffenegger
a459638fc4 TCP Cubic: Have Fast Convergence Heuristic work for ECN, and align concave region
The Cubic concave region was not aligned nicely for the very first exit from
slow start, where a 50% cwnd reduction is done instead of the normal 30%.

This addresses an issue, where a short line-rate burst could result from that
sudden jump of cwnd.

In addition, the Fast Convergence Heuristic has been expanded to work also
with ECN induced congestion response.

Submitted by:	chengc_netapp.com
Reported by:	chengc_netapp.com
Reviewed by:	tuexen (mentor), rgrimes (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25976
2020-08-13 16:45:55 +00:00
Richard Scheffenegger
2bb6dfabbe TCP Cubic: After leaving slowstart fix unintended cwnd jump.
Initializing K to zero in D23655 introduced a miscalculation,
where cwnd would suddenly jump to cwnd_max instead of gradually
increasing, after leaving slow-start.

Properly calculating K instead of resetting it to zero resolves
this issue. Also making sure, that cwnd is recalculated at the
earliest opportunity once slow-start is over.

Reported by:	chengc_netapp.com
Reviewed by:	chengc_netapp.com, tuexen (mentor), rgrimes (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25746
2020-08-13 16:38:51 +00:00
Richard Scheffenegger
f359d6ebbc Improve SACK support code for RFC6675 and PRR
Adding proper accounting of sacked_bytes and (per-ACK)
delivered data to the SACK scoreboard. This will
allow more aspects of RFC6675 to be implemented as well
as Proportional Rate Reduction (RFC6937).

Prior to this change, the pipe calculation controlled with
net.inet.tcp.rfc6675_pipe was also susceptible to incorrect
results when more than 3 (or 4) holes in the sequence space
were present, which can no longer all fit into a single
ACK's SACK option.

Reviewed by:	kbowling, rgrimes (mentor)
Approved by:	rgrimes (mentor, blanket)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D18624
2020-08-13 16:30:09 +00:00
Hans Petter Selasky
b453d3d239 Use a static initializer for the multicast free tasks.
This makes the SYSINIT() function updated in r364072 superfluous.

Suggested by:	glebius@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-08-11 08:31:40 +00:00
Michael Tuexen
cf8a49ab6e Fix the following issues related to the TCP SYN-cache:
* Let the accepted TCP/IPv4 socket inherit the configured TTL and
  TOS value.
* Let the accepted TCP/IPv6 socket inherit the configured Hop Limit.
* Use the configured Hop Limit and Traffic Class when sending
  IPv6 packets.

Reviewed by:		rrs, lutz_donnerhacke.de
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D25909
2020-08-10 20:24:48 +00:00
Bjoern A. Zeeb
f9461246a2 MC: add a note with reference to the discussion and history as-to why we
are where we are now.  The main thing is to try to get rid of the delayed
freeing to avoid blocking on the taskq when shutting down vnets.

X-Timeout:	if you still see this before 14-RELEASE remove it.
2020-08-10 10:58:43 +00:00
Hans Petter Selasky
3689652c65 Make sure the multicast release tasks are properly drained when
destroying a VNET or a network interface.

Else the inm release tasks, both IPv4 and IPv6 may cause a panic
accessing a freed VNET or network interface.

Reviewed by:		jmg@
Discussed with:		bz@
Differential Revision:	https://reviews.freebsd.org/D24914
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-08-10 10:46:08 +00:00
Hans Petter Selasky
a95ef9d38d Use proper prototype for SYSINIT() functions.
Mark the unused argument using the __unused macro.

Discussed with:		kib@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-08-10 10:40:19 +00:00
Michael Tuexen
1bea15e601 Improve the ECN negotiation when the TCP SYN-cache is used by making
sure that
* ECN is disabled if the client sends an non-ECN-setup SYN segment.
* ECN is disabled is the ECN-setup SYN-ACK segment is retransmitted more
  than net.inet.tcp.ecn.maxretries times.

Reviewed by:		rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D26008
2020-08-08 19:39:38 +00:00
Bjoern A. Zeeb
a9839c4aee IPV6_PKTINFO support for v4-mapped IPv6 sockets
When using v4-mapped IPv6 sockets with IPV6_PKTINFO we do not
respect the given v4-mapped src address on the IPv4 socket.
Implement the needed functionality. This allows single-socket
UDP applications (such as OpenVPN) to work better on FreeBSD.

Requested by:	Gert Doering (gert greenie.net), pfsense
Tested by:	Gert Doering (gert greenie.net)
Reviewed by:	melifaro
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24135
2020-08-07 15:13:53 +00:00
Randall Stewart
8315f1ea26 The recent changes to move the ref count increment
back from the end of the function created an issue.
If one of the routines returns NULL during setup
we have inp's with extra references (which is why
the increment was at the end).

Also the stack switch return code was being ignored
and actually has meaning if the stack cannot take over
it should return NULL.

Fix both of these situation by being sure to test the
return code and of course in any case of return NULL (there
are 3) make sure we properly reduce the ref count.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D25903
2020-07-31 10:03:32 +00:00
Michael Tuexen
205f3e1597 Clear the pointer to the socket when closing it also in case of
an ungraceful operation.
This fixes a use-after-free bug found and reported by Taylor
Brandstetter of Google by testing the userland stack.

MFC after:		1 week
2020-07-23 19:43:49 +00:00
Michael Tuexen
91e04f9e7a Detect and handle an invalid reassembly constellation, which results in
a memory leak.

Thanks to Felix Weinrank for finding this issue using fuzz testing the
userland stack.

MFC after:		1 week
2020-07-23 01:35:24 +00:00
Richard Scheffenegger
cce999b38f Fix style and comment around concave/convex regions in TCP cubic.
In cubic, the concave region is when snd_cwnd starts growing slower
towards max_cwnd (cwnd at the time of the congestion event), and
the convex region is when snd_cwnd starts to grow faster and
eventually appearing like slow-start like growth.

PR:		238478
Reviewed by:	tuexen (mentor), rgrimes (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D24657
2020-07-21 16:21:52 +00:00
Richard Scheffenegger
66ba9aafcf Add MODULE_VERSION to TCP loadable congestion control modules.
Without versioning information, using preexisting loader /
linker code is not easily possible when another module may
have dependencies on pre-loaded modules, and also doesn't
allow the automatic loading of dependent modules.

No functional change of the actual modules.

Reviewed by:	tuexen (mentor), rgrimes (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25744
2020-07-20 23:47:27 +00:00
Michael Tuexen
8745f898c4 Add reference counts for inp/stcb/net when timers are running.
This avoids a use-after-free reported for the userland stack.
Thanks to Taylor Brandstetter for suggesting a patch for
the userland stack.

MFC after:		1 week
2020-07-19 12:34:19 +00:00
Michael Tuexen
05bceec68e Remove code which is not needed.
MFC after:		1 week
2020-07-18 13:10:02 +00:00
Michael Tuexen
7f0ad2274b Improve the locking of address lists by adding some asserts and
rearranging the addition of address such that the lock is not
given up during checking and adding.

MFC after:		1 week
2020-07-17 15:09:49 +00:00
Michael Tuexen
f903a308a1 (Re)-allow 0.0.0.0 to be used as an address in connect() for TCP
In r361752 an error handling was introduced for using 0.0.0.0 or
255.255.255.255 as the address in connect() for TCP, since both
addresses can't be used. However, the stack maps 0.0.0.0 implicitly
to a local address and at least two regressions were reported.
Therefore, re-allow the usage of 0.0.0.0.
While there, change the error indicated when using 255.255.255.255
from EAFNOSUPPORT to EACCES as mentioned in the man-page of connect().

Reviewed by:		rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D25401
2020-07-16 16:46:24 +00:00
Michael Tuexen
504ee6a001 Improve the error handling in generating ASCONF chunks.
In case of errors, the cleanup was not consistent.
Thanks to Felix Weinrank for fuzzing the userland stack and making
me aware of the issue.

MFC after:		1 week
2020-07-14 20:32:50 +00:00
Michael Tuexen
cd7518203d Cleanup, no functional change intended.
This file is only compiled if INET or INET6 is defined. So there
is no need for checking that.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D25635
2020-07-12 18:34:09 +00:00
Michael Tuexen
83b8204f61 (Re)activate SCTP system calls when compiling SCTP support into the kernel
r363079 introduced the possibility of loading the SCTP stack as a module in
addition to compiling it into the kernel. As part of this, the registration
of the system calls was removed and put into the loading of the module.
Therefore, the system calls are not registered anymore when compiling the
SCTP into the kernel. This patch addresses that.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D25632
2020-07-12 14:50:12 +00:00
Michael Tuexen
4ef7c2f28f Whitespace changes due to upstreaming r363079. 2020-07-10 16:59:06 +00:00
Mark Johnston
052c5ec4d0 Provide support for building SCTP as a loadable module.
With this change, a kernel compiled with "options SCTP_SUPPORT" and
without "options SCTP" supports dynamic loading of the SCTP stack.

Currently sctp.ko cannot be unloaded since some prerequisite teardown
logic is not yet implemented.  Attempts to unload the module will return
EOPNOTSUPP.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21997
2020-07-10 14:56:05 +00:00
Michael Tuexen
6ddc843832 Fix a use-after-free bug for the userland stack. The kernel
stack is not affected.
Thanks to Mark Wodrich from Google for finding and reporting the
bug.

MFC after:		1 week
2020-07-10 11:15:10 +00:00
Michael Tuexen
b6734d8f4a Optimize flushing of receive queues.
This addresses an issue found and reported for the userland stack in
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21243

MFC after:		1 week
2020-07-09 16:18:42 +00:00
Michael Tuexen
fcbfdc0ab6 Improve consistency.
MFC after:		1 week
2020-07-08 16:23:40 +00:00
Michael Tuexen
ef9095c72a Fix error description.
MFC after:		1 week
2020-07-08 16:04:06 +00:00
Michael Tuexen
c96d7c373e Don't accept FORWARD-TSN chunks when I-FORWARD-TSN was negotiated
and vice versa.

MFC after:		1 week
2020-07-08 15:49:30 +00:00
Michael Tuexen
32df1c9ebb Improve handling of PKTDROP chunks. This includes the input validation
to address two issues found by ossfuzz testing the userland stack:
* https://oss-fuzz.com/testcase-detail/5387560242380800
* https://oss-fuzz.com/testcase-detail/4887954068865024
and adding support for I-DATA chunks in addition to DATA chunks.
2020-07-08 12:25:19 +00:00
Richard Scheffenegger
c201ce0b4a Fix KASSERT during tcp_newtcpcb when low on memory
While testing with system default cc set to cubic, and
running a memory exhaustion validation, FreeBSD panics for a
missing inpcb reference / lock.

Reviewed by:	rgrimes (mentor), tuexen (mentor)
Approved by:	rgrimes (mentor), tuexen (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25583
2020-07-07 12:10:59 +00:00
Alexander V. Chernikov
6ad7446c6f Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup().
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
 over pointer validness and requiring on-stack data copying.

With no callers remaining, remove fib[46]_lookup_nh_ functions.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25445
2020-07-02 21:04:08 +00:00
Michael Tuexen
e54b7cd007 Fix the cleanup handling in a error path for TCP BBR.
Reported by:		syzbot+df7899c55c4cc52f5447@syzkaller.appspotmail.com
Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D25486
2020-07-01 17:17:06 +00:00
Mark Johnston
d16a2e4784 Fix a possible next-hop refcount leak when handling IPSec traffic.
It may be possible to fix this by deferring the lookup, but let's
keep the initial change simple to make MFCs easier.

PR:		246951
Reviewed by:	melifaro
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25519
2020-07-01 15:42:48 +00:00
Michael Tuexen
7a3f60e7f5 Fix a bug introduced in https://svnweb.freebsd.org/changeset/base/362173
Reported by:		syzbot+f3a6fccfa6ae9d3ded29@syzkaller.appspotmail.com
MFC after:		1 week
2020-06-30 21:50:05 +00:00
Michael Tuexen
e99ce3eac5 Don't send packets containing ERROR chunks in response to unknown
chunks when being in a state where the verification tag to be used
is not known yet.

MFC after:		1 week
2020-06-28 14:11:36 +00:00
Michael Tuexen
f2f66ef6d2 Don't check ch for not being NULL, since that is true.
MFC after:		1 week
2020-06-28 11:12:03 +00:00
John Baldwin
4a711b8d04 Use zfree() instead of explicit_bzero() and free().
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by:	cem
Reviewed by:	cem, delphij
Approved by:	csprng (cem)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25435
2020-06-25 20:17:34 +00:00
Michael Tuexen
132c073866 Fix the acconting for fragmented unordered messages when using
interleaving.
This was reported for the userland stack in
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=19321

MFC after:		1 week
2020-06-24 14:47:51 +00:00
Richard Scheffenegger
6e26dd0dbe TCP: fix cubic RTO reaction.
Proper TCP Cubic operation requires the knowledge
of the maximum congestion window prior to the
last congestion event.

This restores and improves a bugfix previously added
by jtl@ but subsequently removed due to a revert.

Reported by:	chengc_netapp.com
Reviewed by:	chengc_netapp.com, tuexen (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25133
2020-06-24 13:52:53 +00:00
Richard Scheffenegger
9dc7d8a246 TCP: make after-idle work for transactional sessions.
The use of t_rcvtime as proxy for the last transmission
fails for transactional IO, where the client requests
data before the server can respond with a bulk transfer.

Set aside a dedicated variable to actually track the last
locally sent segment going forward.

Reported by:	rrs
Reviewed by:	rrs, tuexen (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25016
2020-06-24 13:42:42 +00:00
Michael Tuexen
87c0bf77d9 Fix alignment issue manifesting in the userland stack.
MFC after:		1 wwek
2020-06-23 23:05:05 +00:00
Michael Tuexen
b88082dd39 No need to include netinet/sctp_crc32.h twice. 2020-06-22 14:36:14 +00:00
Mark Johnston
e6db509d10 Move the definition of SCTP's system_base_info into sctp_crc32.c.
This file is the only SCTP source file compiled into the kernel when
SCTP_SUPPORT is configured.  sctp_delayed_checksum() references a couple
of counters defined in system_base_info, so the change allows these
counters to be referenced in a kernel compiled without "options SCTP".

Submitted by:	tuexen
MFC with:	r362338
2020-06-22 14:01:31 +00:00
Michael Tuexen
c5d9e5c99e Cleanup the defintion of struct sctp_getaddresses. This stucture
is used by the IPPROTO_SCTP level socket options SCTP_GET_PEER_ADDRESSES
and SCTP_GET_LOCAL_ADDRESSES, which are used by libc to implement
sctp_getladdrs() and sctp_getpaddrs().
These changes allow an old libc to work on a newer kernel.
2020-06-21 23:12:56 +00:00
Bjoern A. Zeeb
e387af1fa8 Rather than zeroing MAXVIFS times size of pointer [r362289] (still better than
sizeof pointer before [r354857]), we need to zero MAXVIFS times the size of
the struct.  All good things come in threes; I hope this is it on this one.

PR:		246629, 206583
Reported by:	kib
MFC after:	ASAP
2020-06-21 22:09:30 +00:00
Michael Tuexen
171edd2110 Fix the build for an INET6 only configuration.
The fix from the last commit is actually needed twice...

MFC after:		1 week
2020-06-21 09:56:09 +00:00
Michael Tuexen
5087b6e732 Set a variable also in the case of an INET6 only kernel
MFC after:		1 week
2020-06-20 23:48:57 +00:00
Michael Tuexen
ed82c2edd6 Use a struct sockaddr_in pr struct sockaddr_in6 as the option value
for the IPPROTO_SCTP level socket options SCTP_BINDX_ADD_ADDR and
SCTP_BINDX_REM_ADDR. These socket option are intended for internal
use only to implement sctp_bindx().
This is one user of struct sctp_getaddresses less.
struct sctp_getaddresses is strange and will be changed shortly.
2020-06-20 21:06:02 +00:00
Michael Tuexen
7621bd5ead Cleanup the adding and deleting of addresses via sctp_bindx().
There is no need to use the association identifier, so remove it.
While there, cleanup the code a bit.

MFC after:		1 week
2020-06-20 20:20:16 +00:00
Michael Tuexen
7a9dbc33f9 Remove last argument of sctp_addr_mgmt_ep_sa(), since it is not used.
MFC after:		1 week
2020-06-19 12:35:29 +00:00
Mark Johnston
95033af923 Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack.  Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-18 19:32:34 +00:00
Bjoern A. Zeeb
ce19cceb8d When converting the static arrays to mallocarray() in r356621 I missed
one place where we now need to multiply the size of the struct with the
number of entries.  This lead to problems when restarting user space
daemons, as the cleanup was never properly done, resulting in MRT_ADD_VIF
EADDRINUSE.
Properly zero all array elements to avoid this problem.

PR:		246629, 206583
Reported by:	(many)
MFC after:	4 days
Sponsored by:	Rubicon Communications, LLC (d/b/a "Netgate")
2020-06-17 21:04:38 +00:00
Bjoern A. Zeeb
b7b3d237e7 The call into ifa_ifwithaddr() needs to be epoch protected; ortherwise
we'll panic on an assertion.
While here, leave a comment that the ifp was never protected and stable
(as glebius pointed out) and this needs to be fixed properly.

Discovered while working on:	PR 246629
Reviewed by:	glebius
MFC after:	4 days
Sponsored by:	Rubicon Communications, LLC (d/b/a "Netgate")
2020-06-17 20:58:37 +00:00
Michael Tuexen
2d87bacde4 Allow the self reference to be NULL in case the timer was stopped.
Submitted by:		Timo Voelker
MFC after:		1 week
2020-06-17 15:27:45 +00:00
Tom Jones
d88fe3d964 Add header definition for RFC4340, Datagram Congestion Control Protocol
Add a header definition for DCCP as defined in RFC4340. This header definition
is required to perform validation when receiving and forwarding DCCP packets.
We do not currently support DCCP.

Reviewed by:	gallatin, bz
Approved by:	bz (co-mentor)
MFC after:	1 week
MFC with:	350749
Differential Revision:	https://reviews.freebsd.org/D21179
2020-06-17 13:27:13 +00:00
Randall Stewart
95ef69c63c iSo in doing final checks on OCA firmware with all the latest tweaks the dup-ack checking
packet drill script was failing with a number of unexpected acks. So it turns
out if you have the default recvwin set up to 1Meg (like OCA's do) and you
have no window scaling (like the dupack checking code) then we have another
case where we are always trying to update the rwnd and sending an
ack when we should not.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D25298
2020-06-16 18:16:45 +00:00
Randall Stewart
4d418f8da8 So it turns out rack has a shortcoming in dup-ack counting. It counts the dupacks but
then does not properly respond to them. This is because a few missing bits are not present.
BBR actually does properly respond (though it also sends a TLP which is interesting and
maybe something to fix)..

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D25294
2020-06-16 12:26:23 +00:00
Michael Tuexen
b231bff8b2 Allocate the mbuf for the signature in the COOKIE or the correct size.
While there, do also do some cleanups.

MFC after:		1 week
2020-06-14 16:05:08 +00:00
Michael Tuexen
4471043177 Cleanups, no functional change.
MFC after:		1 week
2020-06-14 09:50:00 +00:00
Michael Tuexen
d60bdf8569 Remove usage of empty macro.
MFC after:		1 week
2020-06-13 21:23:26 +00:00
Michael Tuexen
64c8fc5de8 Simpify a condition, no functional change.
MFC after:		1 week
2020-06-13 18:38:59 +00:00
Randall Stewart
f092a3c71c So it turns out with the right window scaling you can get the code in all stacks to
always want to do a window update, even when no data can be sent. Now in
cases where you are not pacing thats probably ok, you just send an extra
window update or two. However with bbr (and rack if its paced) every time
the pacer goes off its going to send a "window update".

Also in testing bbr I have found that if we are not responding to
data right away we end up staying in startup but incorrectly holding
a pacing gain of 192 (a loss). This is because the idle window code
does not restict itself to only work with PROBE_BW. In all other
states you dont want it doing a PROBE_BW state change.

Sponsored by:	Netflix Inc.
Differential Revision: 	https://reviews.freebsd.org/D25247
2020-06-12 19:56:19 +00:00
Michael Tuexen
3ee11586b2 Whitespace change due to upstream cleanup.
MFC after:		1 week
2020-06-12 16:40:10 +00:00
Michael Tuexen
2f9e6db0be More cleanups due to ifdef cleanup done upstream
MFC after:		1 week
2020-06-12 16:31:13 +00:00
Michael Tuexen
306c2ba375 Small cleanup due to upstream ifdef cleanups.
MFC after:		1 week
2020-06-12 10:13:23 +00:00
Michael Tuexen
28397ac1ed Non-functional changes due to upstream cleanup.
MFC after:		1 week
2020-06-11 13:34:09 +00:00
Richard Scheffenegger
2fda0a6f3a Prevent TCP Cubic to abruptly increase cwnd after app-limited
Cubic calculates the new cwnd based on absolute time
elapsed since the start of an epoch. A cubic epoch is
started on congestion events, or once the congestion
avoidance phase is started, after slow-start has
completed.

When a sender is application limited for an extended
amount of time and subsequently a larger volume of data
becomes ready for sending, Cubic recalculates cwnd
with a lingering cubic epoch. This recalculation
of the cwnd can induce a massive increase in cwnd,
causing a burst of data to be sent at line rate by
the sender.

This adds a flag to reset the cubic epoch once a
session transitions from app-limited to cwnd-limited
to prevent the above effect.

Reviewed by:	chengc_netapp.com, tuexen (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25065
2020-06-10 07:32:02 +00:00
Richard Scheffenegger
6907bbae18 Prevent TCP Cubic to abruptly increase cwnd after slow-start
Introducing flags to track the initial Wmax dragging and exit
from slow-start in TCP Cubic. This prevents sudden jumps in the
caluclated cwnd by cubic, especially when the flow is application
limited during slow start (cwnd can not grow as fast as expected).
The downside is that cubic may remain slightly longer in the
concave region before starting the convex region beyond Wmax again.

Reviewed by:	chengc_netapp.com, tuexen (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor, blanket)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D23655
2020-06-09 21:07:58 +00:00
Michael Tuexen
5fb132abbb Whitespace cleanups and removal of a stale comment.
MFC after:		1 week
2020-06-08 20:23:20 +00:00
Randall Stewart
e854dd38ac An important statistic in determining if a server process (or client) is being delayed
is to know the time to first byte in and time to first byte out. Currently we
have no way to know these all we have is t_starttime. That (t_starttime) tells us
what time the 3 way handshake completed. We don't know when the first
request came in or how quickly we responded. Nor from a client perspective
do we know how long from when we sent out the first byte before the
server responded.

This small change adds the ability to track the TTFB's. This will show up in
BB logging which then can be pulled for later analysis. Note that currently
the tracking is via the ticks variable of all three variables. This provides
a very rough estimate (hz=1000 its 1ms). A follow-on set of work will be
to change all three of these values into something with a much finer resolution
(either microseconds or nanoseconds), though we may want to make the resolution
configurable so that on lower powered machines we could still use the much
cheaper ticks variable.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D24902
2020-06-08 11:48:07 +00:00
Michael Tuexen
70486b27ae Retire SCTP_SO_LOCK_TESTING.
This was intended to test the locking used in the MacOS X kernel on a
FreeBSD system, to make use of WITNESS and other debugging infrastructure.
This hasn't been used for ages, to take it out to reduce the #ifdef
complexity.

MFC after:		1 week
2020-06-07 14:39:20 +00:00
Michael Tuexen
3f53d62236 Fix typo in comment.
Submitted by Orgad Shaneh for the userland stack.
MFC after:		1 week
2020-06-06 21:26:34 +00:00