Commit Graph

6569 Commits

Author SHA1 Message Date
tuexen
c1a251abe4 Allocate the mbuf for the signature in the COOKIE or the correct size.
While there, do also do some cleanups.

MFC after:		1 week
2020-06-14 16:05:08 +00:00
tuexen
f47ef49e59 Cleanups, no functional change.
MFC after:		1 week
2020-06-14 09:50:00 +00:00
tuexen
f6fa051646 Remove usage of empty macro.
MFC after:		1 week
2020-06-13 21:23:26 +00:00
tuexen
18a7e41cfe Simpify a condition, no functional change.
MFC after:		1 week
2020-06-13 18:38:59 +00:00
rrs
255d51414d So it turns out with the right window scaling you can get the code in all stacks to
always want to do a window update, even when no data can be sent. Now in
cases where you are not pacing thats probably ok, you just send an extra
window update or two. However with bbr (and rack if its paced) every time
the pacer goes off its going to send a "window update".

Also in testing bbr I have found that if we are not responding to
data right away we end up staying in startup but incorrectly holding
a pacing gain of 192 (a loss). This is because the idle window code
does not restict itself to only work with PROBE_BW. In all other
states you dont want it doing a PROBE_BW state change.

Sponsored by:	Netflix Inc.
Differential Revision: 	https://reviews.freebsd.org/D25247
2020-06-12 19:56:19 +00:00
tuexen
b95ee8b957 Whitespace change due to upstream cleanup.
MFC after:		1 week
2020-06-12 16:40:10 +00:00
tuexen
74d569664c More cleanups due to ifdef cleanup done upstream
MFC after:		1 week
2020-06-12 16:31:13 +00:00
tuexen
912ac9f062 Small cleanup due to upstream ifdef cleanups.
MFC after:		1 week
2020-06-12 10:13:23 +00:00
tuexen
08abaa63c4 Non-functional changes due to upstream cleanup.
MFC after:		1 week
2020-06-11 13:34:09 +00:00
rscheff
649fb27a96 Prevent TCP Cubic to abruptly increase cwnd after app-limited
Cubic calculates the new cwnd based on absolute time
elapsed since the start of an epoch. A cubic epoch is
started on congestion events, or once the congestion
avoidance phase is started, after slow-start has
completed.

When a sender is application limited for an extended
amount of time and subsequently a larger volume of data
becomes ready for sending, Cubic recalculates cwnd
with a lingering cubic epoch. This recalculation
of the cwnd can induce a massive increase in cwnd,
causing a burst of data to be sent at line rate by
the sender.

This adds a flag to reset the cubic epoch once a
session transitions from app-limited to cwnd-limited
to prevent the above effect.

Reviewed by:	chengc_netapp.com, tuexen (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25065
2020-06-10 07:32:02 +00:00
rscheff
16beac848f Prevent TCP Cubic to abruptly increase cwnd after slow-start
Introducing flags to track the initial Wmax dragging and exit
from slow-start in TCP Cubic. This prevents sudden jumps in the
caluclated cwnd by cubic, especially when the flow is application
limited during slow start (cwnd can not grow as fast as expected).
The downside is that cubic may remain slightly longer in the
concave region before starting the convex region beyond Wmax again.

Reviewed by:	chengc_netapp.com, tuexen (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor, blanket)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D23655
2020-06-09 21:07:58 +00:00
tuexen
c20c9cac4f Whitespace cleanups and removal of a stale comment.
MFC after:		1 week
2020-06-08 20:23:20 +00:00
rrs
0b7e58eb67 An important statistic in determining if a server process (or client) is being delayed
is to know the time to first byte in and time to first byte out. Currently we
have no way to know these all we have is t_starttime. That (t_starttime) tells us
what time the 3 way handshake completed. We don't know when the first
request came in or how quickly we responded. Nor from a client perspective
do we know how long from when we sent out the first byte before the
server responded.

This small change adds the ability to track the TTFB's. This will show up in
BB logging which then can be pulled for later analysis. Note that currently
the tracking is via the ticks variable of all three variables. This provides
a very rough estimate (hz=1000 its 1ms). A follow-on set of work will be
to change all three of these values into something with a much finer resolution
(either microseconds or nanoseconds), though we may want to make the resolution
configurable so that on lower powered machines we could still use the much
cheaper ticks variable.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D24902
2020-06-08 11:48:07 +00:00
tuexen
55b0f3c9b6 Retire SCTP_SO_LOCK_TESTING.
This was intended to test the locking used in the MacOS X kernel on a
FreeBSD system, to make use of WITNESS and other debugging infrastructure.
This hasn't been used for ages, to take it out to reduce the #ifdef
complexity.

MFC after:		1 week
2020-06-07 14:39:20 +00:00
tuexen
8bd79face7 Fix typo in comment.
Submitted by Orgad Shaneh for the userland stack.
MFC after:		1 week
2020-06-06 21:26:34 +00:00
tuexen
5d90c7b06d Non-functional changes due to cleanup (upstream removing of Panda support)
of the code

MFC after:		1 week
2020-06-06 18:20:09 +00:00
rrs
f747b279f1 We should never allow either the broadcast or IN_ADDR_ANY to be
connected to or sent to. This was fond when working with Michael
Tuexen and Skyzaller. Skyzaller seems to want to use either of
these two addresses to connect to at times. And it really is
an error to do so, so lets not allow that behavior.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D24852
2020-06-03 14:16:40 +00:00
rrs
452e9158b0 This fixes a couple of skyzaller crashes. Most
of them have to do with TFO. Even the default stack
had one of the issues:

1) We need to make sure for rack that we don't advance
   snd_nxt beyond iss when we are not doing fast open. We
   otherwise can get a bunch of SYN's sent out incorrectly
   with the seq number advancing.
2) When we complete the 3-way handshake we should not ever
   append to reassembly if the tlen is 0, if TFO is enabled
   prior to this fix we could still call the reasemmbly. Note
   this effects all three stacks.
3) Rack like its cousin BBR should track if a SYN is on a
   send map entry.
4) Both bbr and rack need to only consider len incremented on a SYN
   if the starting seq is iss, otherwise we don't increment len which
   may mean we return without adding a sendmap entry.

This work was done in collaberation with Michael Tuexen, thanks for
all the testing!
Sponsored by:	Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D25000
2020-06-03 14:07:31 +00:00
tuexen
831d2ffcb9 Restrict enabling TCP-FASTOPEN to end-points in CLOSED or LISTEN state
Enabling TCP-FASTOPEN on an end-point which is in a state other than
CLOSED or LISTEN, is a bug in the application. So it should not work.
Also the TCP code does not (and needs not to) handle this.
While there, also simplify the setting of the TF_FASTOPEN flag.

This issue was found by running syzkaller.

Reviewed by:		rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D25115
2020-06-03 13:51:53 +00:00
melifaro
e3a794daf5 * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D25067
2020-06-01 20:49:42 +00:00
melifaro
c5efadf64f Revert r361704, it accidentally committed merged D25067 and D25070. 2020-06-01 20:40:40 +00:00
melifaro
e46bbfddee * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:	ae
Differential Revision: https://reviews.freebsd.org/D25067
2020-06-01 20:32:02 +00:00
melifaro
ebd6d5fb37 Use fib[46]_lookup() in mtu calculations.
fib[46]_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.

Conversion is straight-forwarded, as the only 2 differences are
requirement of running in network epoch and the need to handle
RTF_GATEWAY case in the caller code.

Differential Revision:	https://reviews.freebsd.org/D24974
2020-05-28 08:00:08 +00:00
melifaro
5f29c9b8e0 Switch ip_output/icmp_reflect rt lookup calls with fib4_lookup.
fib4_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.

Conversion is straight-forwarded, as the only 2 differences are
requirement of running in network epoch and the need to handle
RTF_GATEWAY case in the caller code.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24976
2020-05-28 07:31:53 +00:00
melifaro
4b5a6f4255 Switch gif(4) path verification to fib[46]_check_urfp().
fibX_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Use specialized fib[46]_check_urpf() from newer KPI instead,
to allow removal of older KPI.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24978
2020-05-28 07:26:18 +00:00
manu
9fa4191791 bbr: Use arc4random_uniform from libkern.
This unbreak LINT build

Reported by:	jenkins, melifaro
2020-05-23 19:52:20 +00:00
melifaro
29b95a0fda Move <add|del|change>_route() functions to route_ctl.c in preparation of
multipath control plane changed described in D24141.

Currently route.c contains core routing init/teardown functions, route table
 manipulation functions and various helper functions, resulting in >2KLOC
 file in total. This change moves most of the route table manipulation parts
 to a dedicated file, simplifying planned multipath changes and making
 route.c more manageable.

Differential Revision:	https://reviews.freebsd.org/D24870
2020-05-23 19:06:57 +00:00
rscheff
1dc7f19b37 DCTCP: update alpha only once after loss recovery.
In mixed ECN marking and loss scenarios it was found, that
the alpha value of DCTCP is updated two times. The second
update happens with freshly initialized counters indicating
to ECN loss. Overall this leads to alpha not adjusting as
quickly as expected to ECN markings, and therefore lead to
excessive loss.

Reported by:	Cheng Cui
Reviewed by:	chengc_netapp.com, rrs, tuexen (mentor)
Approved by:	tuexen (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D24817
2020-05-21 21:42:49 +00:00
rscheff
6a15b73ab5 With RFC3168 ECN, CWR SHOULD only be sent with new data
Overly conservative data receivers may ignore the CWR flag
on other packets, and keep ECE latched. This can result in
continous reduction of the congestion window, and very poor
performance when ECN is enabled.

Reviewed by:	rgrimes (mentor), rrs
Approved by:	rgrimes (mentor), tuexen (mentor)
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D23364
2020-05-21 21:33:15 +00:00
rscheff
c201e08228 Retain only mutually supported TCP options after simultaneous SYN
When receiving a parallel SYN in SYN-SENT state, remove all the
options only we supported locally before sending the SYN,ACK.

This addresses a consistency issue on parallel opens.

Also, on such a parallel open, the stack could be coaxed into
running with timestamps enabled, even if administratively disabled.

Reviewed by:	tuexen (mentor)
Approved by:	tuexen (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D23371
2020-05-21 21:26:21 +00:00
rscheff
55bf91efe4 Handle ECN handshake in simultaneous open
While testing simultaneous open TCP with ECN, found that
negotiation fails to arrive at the expected final state.

Reviewed by:	tuexen (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D23373
2020-05-21 21:15:25 +00:00
markj
b20c56f366 Define a module version for accept filter modules.
Otherwise accept filters compiled into the kernel do not preempt
preloaded accept filter modules.  Then, the preloaded file registers its
accept filter module before the kernel, and the kernel's attempt fails
since duplicate accept filter list entries are not permitted.  This
causes the preloaded file's module to be released, since
module_register_init() does a lookup by name, so the preloaded file is
unloaded, and the accept filter's callback points to random memory since
preload_delete_name() unmaps the file on x86 as of r336505.

Add a new ACCEPT_FILTER_DEFINE macro which wraps the accept filter and
module definitions, and ensures that a module version is defined.

PR:		245870
Reported by:	Thomas von Dein <freebsd@daemon.de>
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-05-19 18:35:08 +00:00
tuexen
b7cd8817dc Replace snprintf() by SCTP_SNPRINTF() and let SCTP_SNPRINTF() map
to snprintf() on FreeBSD. This allows to check for failures of snprintf()
on platforms other than FreeBSD kernel.
2020-05-19 07:23:35 +00:00
tuexen
879bc02020 Revert r361209:
cem noted that on FreeBSD snprintf() can not fail and code should not
check for that.

A followup commit will replace the usage of snprintf() in the SCTP
sources with a variadic macro SCTP_SNPRINTF, which will simply map to
snprintf() on FreeBSD and do a checking similar to r361209 on
other platforms.
2020-05-19 07:21:11 +00:00
karels
1ec42007fe Fix NULL-pointer bug from r361228.
Note that in_pcb_lport and in_pcb_lport_dest can be called with a NULL
local address for IPv6 sockets; handle it.  Found by syzkaller.

Reported by:	cem
MFC after:	1 month
2020-05-19 01:05:13 +00:00
karels
738968b552 Allow TCP to reuse local port with different destinations
Previously, tcp_connect() would bind a local port before connecting,
forcing the local port to be unique across all outgoing TCP connections
for the address family. Instead, choose a local port after selecting
the destination and the local address, requiring only that the tuple
is unique and does not match a wildcard binding.

Reviewed by:	tuexen (rscheff, rrs previous version)
MFC after:	1 month
Sponsored by:	Forcepoint LLC
Differential Revision:	https://reviews.freebsd.org/D24781
2020-05-18 22:53:12 +00:00
tuexen
d7c7261fd4 Remove assignment without effect.
MFC after:	3 days
2020-05-18 19:48:38 +00:00
tuexen
ff9ac4fb92 Don't check an unsigned variable for being negative.
MFC after:		3 days.
2020-05-18 19:35:46 +00:00
tuexen
430f5449e3 Remove redundant assignment.
MFC after:		3 days
2020-05-18 19:23:01 +00:00
tuexen
3d1108ce55 Cleanup, no functional change intended.
MFC after:		3 days
2020-05-18 18:42:43 +00:00
tuexen
9a8ba47e7d Avoid an integer underflow.
MFC after:		3 days
2020-05-18 18:32:58 +00:00
tuexen
b144456332 Remove redundant check.
MFC after:		3 days
2020-05-18 18:27:10 +00:00
tuexen
9b1fe6cb48 Fix logical condition by looking at usecs.
This issue was found by cpp-check running on the userland stack.

MFC after:		3 days
2020-05-18 15:02:15 +00:00
tuexen
34d8ce7bb2 Whitespace change.
MFC after:		3 days
2020-05-18 15:00:18 +00:00
tuexen
e01d246218 Handle failures of snprintf().
MFC after:		3 days
2020-05-18 10:07:01 +00:00
tuexen
f93ce78c07 Non-functional changes, cleanups.
MFC after:		3 days
2020-05-17 22:31:38 +00:00
melifaro
b8c57b4bc0 Remove redundant checks for nhop validity.
Currently NH_IS_VALID() simly aliases to RT_LINK_IS_UP(), so we're
 checking the same thing twice.

In the near future the implementation of this check will be simpler,
 as there are plans to introduce control-plane interface status monitoring
 similar to ipfw interface tracker.
2020-05-17 15:32:36 +00:00
tuexen
2471743629 Ensure that an stcb is not dereferenced when it is about to be
freed.
This issue was found by SYZKALLER.

MFC after:		3 days
2020-05-16 19:26:39 +00:00
emaste
9c888ca65d libalias: retire cuseeme support
The CU-SeeMe videoconferencing client and associated protocol is at this
point a historical artifact; there is no need to retain support for this
protocol today.

Reviewed by:	philip, markj, allanjude
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24790
2020-05-16 02:29:10 +00:00
tuexen
17476d7506 Allow only IPv4 addresses in sendto() for TCP on AF_INET sockets.
This problem was found by looking at syzkaller reproducers for some other
problems.

Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24831
2020-05-15 14:06:37 +00:00