6470 Commits

Author SHA1 Message Date
Michael Tuexen
f799ff82fb Remove unused timer.
Submitted by:		Taylor Brandstetter
2020-02-04 14:01:07 +00:00
Michael Tuexen
bbf9f080e9 Improve numbering of debug information.
Submitted by:		Taylor Brandstetter
MFC after:		1 week
2020-02-04 12:34:16 +00:00
Conrad Meyer
8e6b06be14 netinet/libalias: Fix typo in debug message
No functional change.

PR:		243831
Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D23365
2020-02-03 05:19:44 +00:00
Gleb Smirnoff
42ce79378d Fix missing NET_EPOCH_ENTER() when compiled with TCP_OFFLOAD.
Reported by:	Coverity
CID:		1413162
2020-01-29 22:48:18 +00:00
Michael Tuexen
dc13edbc7d Fix build issues for the userland stack on 32-bit platforms.
Reported by:		Felix Weinrank
MFC after:		1 week
2020-01-28 10:09:05 +00:00
Alexander V. Chernikov
75831a1c95 Fix NOINET6 build after r357038.
Reported by:	AN <andy at neu.net>
2020-01-26 11:54:21 +00:00
Michael Tuexen
9cc711c9ff According to RFC 3168, sending CWR after an RTO is generally
required, not only for the DCTCP congestion control.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes, tuexen@, Cheng Cui
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23119
2020-01-25 13:45:10 +00:00
Michael Tuexen
47e2c17c12 Don't set the ECT codepoint on retransmitted packets during SACK loss
recovery. This is required by RFC 3168.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, tuexen@, Cheng Cui
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23118
2020-01-25 13:34:29 +00:00
Michael Tuexen
a2d59694be As a TCP client only enable ECN when the corresponding sysctl variable
indicates that ECN should be negotiated for the client side.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, tuexen@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23228
2020-01-25 13:11:14 +00:00
Michael Tuexen
ee97681e5c Don't delay the ACK for a TCP segment with the CWR flag set.
This allows the data sender to increase the CWND faster.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, tuexen@, Cheng Cui
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D22670
2020-01-24 22:50:23 +00:00
Michael Tuexen
8f63a52bdb The server side of TCP fast open relies on the delayed ACK timer to allow
including user data in the SYN-ACK. When DSACK support was added in
r347382, an immediate ACK was sent even for the received SYN with
user data. This patch fixes that and again allows sending user data with
the SYN-ACK.

Reported by:		Jeremy Harris
Reviewed by:		Richard Scheffenegger, rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D23212
2020-01-24 22:37:53 +00:00
Gleb Smirnoff
e1d2b46953 Enter the network epoch when rack_output() is called in setsockopt(2).
2020-01-24 21:56:10 +00:00
Alexander V. Chernikov
75b893375f Add support for RFC 6598/Carrier Grade NAT subnets to libalias and ipfw.
In libalias, a new flag PKT_ALIAS_UNREGISTERED_RFC6598 is added.
 This is like PKT_ALIAS_UNREGISTERED_ONLY, but also is RFC 6598 aware.
Also, we add a new NAT option to ipfw called unreg_cgn, which is like
 unreg_only, but also is RFC 6598-aware.  The reason for the new
 flags/options is to avoid breaking existing networks, especially those
 which rely on RFC 6598 as an external address.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22877
2020-01-24 20:35:41 +00:00
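
The address test behind the RFC 6598 change above (75b893375f) can be sketched as follows. This is a minimal illustration, not the libalias code: the helper names (ipv4, is_rfc1918, is_rfc6598, should_alias) and the cgn_aware parameter are hypothetical, and it assumes that the existing "unregistered only" mode matches the RFC 1918 private ranges. RFC 6598 reserves 100.64.0.0/10 as shared address space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets (hypothetical helper). */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* RFC 1918 private ranges: 10/8, 172.16/12, 192.168/16. */
static bool is_rfc1918(uint32_t addr)
{
    return (addr & 0xFF000000u) == ipv4(10, 0, 0, 0) ||
           (addr & 0xFFF00000u) == ipv4(172, 16, 0, 0) ||
           (addr & 0xFFFF0000u) == ipv4(192, 168, 0, 0);
}

/* RFC 6598 shared address space: 100.64.0.0/10. */
static bool is_rfc6598(uint32_t addr)
{
    return (addr & 0xFFC00000u) == ipv4(100, 64, 0, 0);
}

/*
 * Decide whether a source address should be aliased.
 * cgn_aware == false models the older "unregistered only" behaviour;
 * cgn_aware == true additionally treats RFC 6598 space as unregistered.
 */
static bool should_alias(uint32_t src, bool cgn_aware)
{
    return is_rfc1918(src) || (cgn_aware && is_rfc6598(src));
}
```

Keeping the RFC 6598 check behind a separate switch mirrors the rationale in the commit message: a network that uses 100.64.0.0/10 as its external address keeps its current behaviour unless the new option is selected.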
Alexander V. Chernikov
ab15488f12 Bring indentation back to normal after r357038.
No functional changes.

MFC after:	3 weeks
2020-01-23 09:46:45 +00:00
Alexander V. Chernikov
5533ec4806 Fix epoch-related panic in ipdivert, ensuring in_broadcast() is called
within epoch.

Simplify gigantic div_output() by splitting it into 3 functions,
 handling preliminary setup, remote "ip[6]_output" case and
 local "netisr" case. Leave original indenting in most parts to ease
 diff comparison.  Indentation will be fixed by a followup commit.

Reported by:	Nick Hibma <nick at van-laarhoven.org>
Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D23317
2020-01-23 09:14:28 +00:00
Gleb Smirnoff
a3b0db5b0a Plug possible calls into ip6?_output() without network epoch from SCTP
by bluntly adding epoch entrance into the macro that SCTP uses to call
ip6?_output().  This definitely will introduce several epoch recursions.

Reported by:	https://syzkaller.appspot.com/bug?id=79f03f574594a5be464997310896765c458ed80a
Reported by:	https://syzkaller.appspot.com/bug?id=07c6f52106cddbe356cc2b2f3664a1c51cc0dadf
2020-01-22 17:19:53 +00:00
Bjoern A. Zeeb
7754e281c0 Fix NOINET kernels after r356983.
All gotos to the label are within the #ifdef INET section, which leaves
us with an unused label.  Cover the label under #ifdef INET as well to
avoid the warning and compile time error.
2020-01-22 15:06:59 +00:00
Alexander V. Chernikov
34a5582c47 Bring back redirect route expiration.
Redirect (and temporal) route expiration was broken a while ago.
This change brings route expiration back, with unified IPv4/IPv6 handling code.

It introduces the net.inet.icmp.redirtimeout sysctl, which allows setting
 an expiration time for redirected routes. It defaults to 10 minutes,
 analogous to net.inet6.icmp6.redirtimeout.

The implementation uses a separate file, route_temporal.c, as route.c is already
 bloated with tons of different functions.
Internally, expiration is implemented as a per-rnh callout scheduled when
 a route with a non-zero rt_expire time is added or rt_expire is changed.
 It does not add any overhead when no temporal routes are present.

The callout traverses the entire routing tree under wlock, scheduling expired routes
 for deletion and calculating the next time it needs to be run. The rationale
 for such an implementation is the following: typically, workloads requiring a large
 number of routes have redirects turned off already, while systems with
 a small number of routes will not incur a large overhead during tree traversal.

This change also fixes netstat -rn display of route expiration time, which
 has been broken since the conversion from kread() to sysctl.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23075
2020-01-22 13:53:18 +00:00
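
The expiration pass described in the commit above (34a5582c47) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the route_temporal.c code: a flat array stands in for the routing tree, locking is omitted, and the names struct route, deleted, and expire_pass are hypothetical. An expire value of 0 means the route never expires.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a routing table entry. */
struct route {
    time_t expire;   /* absolute expiration time, 0 = permanent */
    int    deleted;  /* set once the route has been scheduled for deletion */
};

/*
 * Walk every route once. Routes whose expiration time has passed are
 * marked for deletion; among the remaining temporal routes, the earliest
 * expiration time is returned so the caller can reschedule the pass.
 * A return value of 0 means no temporal routes remain, so no further
 * pass needs to be scheduled.
 */
static time_t expire_pass(struct route *rt, size_t n, time_t now)
{
    time_t next = 0;

    for (size_t i = 0; i < n; i++) {
        if (rt[i].deleted || rt[i].expire == 0)
            continue;                    /* permanent or already gone */
        if (rt[i].expire <= now)
            rt[i].deleted = 1;           /* expired: schedule deletion */
        else if (next == 0 || rt[i].expire < next)
            next = rt[i].expire;         /* track the earliest deadline */
    }
    return next;
}
```

The cost of one pass is linear in the number of routes, which matches the trade-off stated in the commit message: it is cheap on small tables, and no pass is scheduled at all when no temporal routes exist.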
Gleb Smirnoff
c1604fe4d2 Make in_pcbladdr() require network epoch entered by its callers. Together
with this widen network epoch coverage up to tcp_connect() and udp_connect().

Revisions from r356974 and up to this revision cover D23187.

Differential Revision:	https://reviews.freebsd.org/D23187
2020-01-22 06:10:41 +00:00
Gleb Smirnoff
e2636f0a78 Remove extraneous NET_EPOCH_ASSERT - the full function is covered.
2020-01-22 06:07:27 +00:00
Gleb Smirnoff
3fed74e90f Re-absorb tcp_detach() back into tcp_usr_detach() as the comment suggests.
Not a functional change.
2020-01-22 06:06:27 +00:00
Gleb Smirnoff
5fc8df3c49 Don't enter network epoch in tcp_usr_detach. A PCB removal doesn't
require that.
2020-01-22 06:04:56 +00:00
Gleb Smirnoff
5c722e2ad3 The network epoch changes in the TCP stack, combined with old r286227,
mean that removal of a PCB no longer needs ipi_lock in any form.  The
ipi_list_lock is sufficient.
2020-01-22 06:03:45 +00:00
Gleb Smirnoff
7669c586da tcp_usr_attach() doesn't need network epoch. in_pcbfree() and
in_pcbdetach() perform all necessary synchronization themselves.
2020-01-22 06:01:26 +00:00
Gleb Smirnoff
6a2954a17d Relax locking requirements for in_pcballoc(). All pcbinfo fields
modified by this function are protected by the PCB list lock that is
acquired inside the function.

This could have been done even before epoch changes, after r286227.
2020-01-22 05:58:29 +00:00
Gleb Smirnoff
0f6385e705 Inline tcp_attach() into tcp_usr_attach(). Not a functional change.
2020-01-22 05:54:58 +00:00
Gleb Smirnoff
109eb549e1 Make tcp_output() require network epoch.
Enter the epoch before calling into tcp_output() from those
functions that didn't do so before.

This eliminates a bunch of epoch recursions in TCP.
2020-01-22 05:53:16 +00:00
Gleb Smirnoff
b955545386 Make ip6_output() and ip_output() require network epoch.
All callers that previously may have called into these functions
without the network epoch must now enter it.
2020-01-22 05:51:22 +00:00
Gleb Smirnoff
0452a1f3ef Add documenting NET_EPOCH_ASSERT() to tcp_drop().
2020-01-22 02:38:46 +00:00
Gleb Smirnoff
bab98355f9 Add some documenting NET_EPOCH_ASSERTs.
2020-01-22 02:37:47 +00:00
Michael Tuexen
6745815d25 Remove debug code not needed anymore.
Submitted by:		Richard Scheffenegger
Reviewed by:		tuexen@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23208
2020-01-16 17:15:06 +00:00
Gleb Smirnoff
ed0282f46a A miss from r356754.
2020-01-15 06:12:39 +00:00
Gleb Smirnoff
2a4bd982d0 Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch.   The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC
2020-01-15 06:05:20 +00:00
Gleb Smirnoff
b1328235b4 Use official macro to enter/exit the network epoch. NFC
2020-01-15 05:48:36 +00:00
Gleb Smirnoff
97168be809 Mechanically replace assertions of in_epoch(net_epoch_preempt) with
NET_EPOCH_ASSERT(). NFC
2020-01-15 05:45:27 +00:00
Gleb Smirnoff
fae994f636 Stop header pollution and don't include if_var.h via in_pcb.h.
2020-01-15 03:41:15 +00:00
Gleb Smirnoff
8fd73e9160 Since this code dereferences struct ifnet, it must include if_var.h
explicitly, not via header pollution.  While here move TCPSTATES
declaration right above the include that is going to make use of it.
2020-01-15 03:40:32 +00:00
Gleb Smirnoff
9cdc43b16e The non-preemptible network epoch identified by net_epoch isn't used.
This code definitely meant net_epoch_preempt.
2020-01-15 03:30:33 +00:00
Gleb Smirnoff
4c69f60a8e Fix yet another regression from r354484. Error code from cr_cansee()
aliases with hard error from other operations.

Reported by:	flo
2020-01-13 21:12:10 +00:00
Michael Tuexen
fe1274ee39 Fix race when accepting TCP connections.
When expanding a SYN-cache entry to a socket/inp, a two-step approach was
taken:
1) The local address was filled in, then the inp was added to the hash
   table.
2) The remote address was filled in and the inp was relocated in the
   hash table.
Before the epoch changes, a write lock was held while this happened and
the code looking up entries was holding a corresponding read lock.
Since the read lock went away with the introduction of the
epochs, the half-populated inp could be found during lookup.
This resulted in processing TCP segments in the context of the wrong
TCP connection.
This patch changes the above procedure so that the inp is fully
populated before it is inserted into the hash table.

Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@
mailing list and for testing the patch!

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D22971
2020-01-12 17:52:32 +00:00
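
The "populate fully, then publish" ordering from the race fix above (fe1274ee39) can be illustrated with a small sketch. It is not the in_pcb code: struct conn, buckets, hash4, conn_insert, and conn_lookup are hypothetical names, a C11 release/acquire pair stands in for the kernel's epoch-protected lists, and writers are assumed to be serialized by a lock that is not shown.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64

/* Hypothetical stand-in for a connection control block. */
struct conn {
    uint32_t laddr, faddr;   /* local and foreign IPv4 address */
    uint16_t lport, fport;   /* local and foreign port */
    struct conn *next;       /* hash chain link */
};

static _Atomic(struct conn *) buckets[NBUCKETS];

static unsigned hash4(uint32_t laddr, uint16_t lport,
                      uint32_t faddr, uint16_t fport)
{
    return (laddr ^ faddr ^ lport ^ ((uint32_t)fport << 16)) % NBUCKETS;
}

/*
 * Fill in the complete 4-tuple first, and only then make the entry
 * reachable. The release store pairs with the acquire load in
 * conn_lookup(), so a lock-free reader that can see the entry also sees
 * all of its fields. Writers are assumed to be serialized elsewhere.
 */
static void conn_insert(struct conn *c, uint32_t laddr, uint16_t lport,
                        uint32_t faddr, uint16_t fport)
{
    unsigned h = hash4(laddr, lport, faddr, fport);

    c->laddr = laddr;
    c->lport = lport;
    c->faddr = faddr;
    c->fport = fport;
    c->next = atomic_load_explicit(&buckets[h], memory_order_relaxed);
    atomic_store_explicit(&buckets[h], c, memory_order_release);
}

/* Lock-free lookup: it can only ever find fully populated entries. */
static struct conn *conn_lookup(uint32_t laddr, uint16_t lport,
                                uint32_t faddr, uint16_t fport)
{
    unsigned h = hash4(laddr, lport, faddr, fport);
    struct conn *c = atomic_load_explicit(&buckets[h], memory_order_acquire);

    for (; c != NULL; c = c->next)
        if (c->laddr == laddr && c->lport == lport &&
            c->faddr == faddr && c->fport == fport)
            return c;
    return NULL;
}
```

With the old two-step scheme, the entry would have been reachable while faddr and fport were still unset, so a concurrent lookup could match it against the wrong connection. Inserting once, after all fields are set, removes that window.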
Michael Tuexen
fc0eb7637c Fix division by zero issue.
Thanks to Stas Denisov for reporting the issue for the userland stack
and providing a fix.

MFC after:		3 days
2020-01-12 15:45:27 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests
2020-01-12 06:07:54 +00:00
Alexander V. Chernikov
ead85fe415 Add fibnum, family and vnet pointer to each rib head.
Having metadata such as fibnum or vnet in the struct rib_head
 is handy as it eases building functionality in the routing space.
This change is required to properly bring back route redirect support.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23047
2020-01-09 17:21:00 +00:00
Bjoern A. Zeeb
334fc5822b vnet: virtualise more network stack sysctls.
Virtualise tcp_always_keepalive, TCP and UDP log_in_vain.  All three are
set in the netoptions startup script, which we would love to run for VNETs
as well [1].

While virtualising the log_in_vain sysctls seems pointless at first for as
long as the kernel message buffer is not virtualised, it at least allows
an administrator to debug the base system or an individual jail if needed
without turning the logging on for all jails running on a system.

PR:		243193 [1]
MFC after:	2 weeks
2020-01-08 23:30:26 +00:00
Ed Maste
ee92463aca Do not define TCPOUTFLAGS in rack_bbr_common
tcp_outflags isn't used in this source file and compilation failed with
external GCC on sparc64.  I'm not sure why only that case failed (perhaps
inconsistent -Werror config) but it is a legitimate issue to fix.

Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D23068
2020-01-07 17:57:08 +00:00
Randall Stewart
4ad2473790 This catches rack up with the recent changes to ECN and
also commonizes the functions that both the FreeBSD and
rack stacks use.

Sponsored by:	Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D23052
2020-01-06 15:29:14 +00:00
Randall Stewart
a9a08eced6 This change adds a small feature to the tcp logging code. Basically
a connection can now have a separate tag added to the id.

Obtained from:	Lawrence Stewart
Sponsored by:	Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D22866
2020-01-06 12:48:06 +00:00
Michael Tuexen
97a8ab398e Don't mark the sendall iterator as being up if it could not be started.
MFC after:		1 week
2020-01-05 14:08:01 +00:00
Michael Tuexen
4b66d476b3 Return -1 consistently if an error occurs.
MFC after:	1 week
2020-01-05 14:06:40 +00:00
Michael Tuexen
397b1c945f Ensure that we don't miss a trigger for kicking off the SCTP iterator.
Reported by:		nwhitehorn@
MFC after:		1 week
2020-01-05 13:56:32 +00:00