Commit Graph

6397 Commits

Author SHA1 Message Date
melifaro
0f2aacbaf5 Fix NOINET6 build after r357038.
Reported by:	AN <andy at neu.net>
2020-01-26 11:54:21 +00:00
tuexen
f779ce9ed3 Sending CWR after an RTO is according to RFC 3168 generally required
and not only for the DCTCP congestion control.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes, tuexen@, Cheng Cui
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23119
2020-01-25 13:45:10 +00:00
tuexen
e6f7ffd056 Don't set the ECT codepoint on retransmitted packets during SACK loss
recovery. This is required by RFC 3168.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, tuexen@, Cheng Cui
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23118
2020-01-25 13:34:29 +00:00
tuexen
bc51b4b2b5 As a TCP client only enable ECN when the corresponding sysctl variable
indicates that ECN should be negotiated for the client side.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, tuexen@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23228
2020-01-25 13:11:14 +00:00
tuexen
77b1d6dcde Don't delay the ACK for a TCP segment with the CWR flag set.
This allows the data sender to increase the CWND faster.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, tuexen@, Cheng Cui
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D22670
2020-01-24 22:50:23 +00:00
tuexen
68ae78ed36 The server side of TCP fast open relies on the delayed ACK timer to allow
including user data in the SYN-ACK. When DSACK support was added in
r347382, an immediate ACK was sent even for the received SYN with
user data. This patch fixes that and allows again to send user data with
the SYN-ACK.

Reported by:		Jeremy Harris
Reviewed by:		Richard Scheffenegger, rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D23212
2020-01-24 22:37:53 +00:00
glebius
9aa8e6fbcd Enter the network epoch when rack_output() is called in setsockopt(2). 2020-01-24 21:56:10 +00:00
melifaro
20aa310e22 Add support for RFC 6598/Carrier Grade NAT subnets. to libalias and ipfw.
In libalias, a new flag PKT_ALIAS_UNREGISTERED_RFC6598 is added.
 This is like PKT_ALIAS_UNREGISTERED_ONLY, but also is RFC 6598 aware.
Also, we add a new NAT option to ipfw called unreg_cgn, which is like
 unreg_only, but also is RFC 6598-aware.  The reason for the new
 flags/options is to avoid breaking existing networks, especially those
 which rely on RFC 6598 as an external address.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22877
2020-01-24 20:35:41 +00:00
melifaro
40d93a3852 Bring indentation back to normal after r357038.
No functional changes.

MFC after:	3 weeks
2020-01-23 09:46:45 +00:00
melifaro
d49a1663d6 Fix epoch-related panic in ipdivert, ensuring in_broadcast() is called
within epoch.

Simplify gigantic div_output() by splitting it into 3 functions,
 handling preliminary setup, remote "ip[6]_output" case and
 local "netisr" case. Leave original indenting in most parts to ease
 diff comparison.  Indentation will be fixed by a followup commit.

Reported by:	Nick Hibma <nick at van-laarhoven.org>
Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D23317
2020-01-23 09:14:28 +00:00
glebius
afbc3286fa Plug possible calls into ip6?_output() without network epoch from SCTP
bluntly adding epoch entrance into the macro that SCTP uses to call
ip6?_output().  This definitely will introduce several epoch recursions.

Reported by:	https://syzkaller.appspot.com/bug?id=79f03f574594a5be464997310896765c458ed80a
Reported by:	https://syzkaller.appspot.com/bug?id=07c6f52106cddbe356cc2b2f3664a1c51cc0dadf
2020-01-22 17:19:53 +00:00
bz
a177f76cd2 Fix NOINET kernels after r356983.
All gotos to the label are within the #ifdef INET section, which leaves
us with an unused label.  Cover the label under #ifdef INET as well to
avoid the warning and compile time error.
2020-01-22 15:06:59 +00:00
melifaro
b17accbc49 Bring back redirect route expiration.
Redirect (and temporal) route expiration was broken a while ago.
This change brings route expiration back, with unified IPv4/IPv6 handling code.

It introduces net.inet.icmp.redirtimeout sysctl, allowing to set
 an expiration time for redirected routes. It defaults to 10 minutes,
 analogues with net.inet6.icmp6.redirtimeout.

Implementation uses separate file, route_temporal.c, as route.c is already
 bloated with tons of different functions.
Internally, expiration is implemented as an per-rnh callout scheduled when
 route with non-zero rt_expire time is added or rt_expire is changed.
 It does not add any overhead when no temporal routes are present.

Callout traverses entire routing tree under wlock, scheduling expired routes
 for deletion and calculating the next time it needs to be run. The rationale
 for such implemention is the following: typically workloads requiring large
 amount of routes have redirects turned off already, while the systems with
 small amount of routes will not inhibit large overhead during tree traversal.

This changes also fixes netstat -rn display of route expiration time, which
 has been broken since the conversion from kread() to sysctl.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23075
2020-01-22 13:53:18 +00:00
glebius
313adad5b8 Make in_pcbladdr() require network epoch entered by its callers. Together
with this widen network epoch coverage up to tcp_connect() and udp_connect().

Revisions from r356974 and up to this revision cover D23187.

Differential Revision:	https://reviews.freebsd.org/D23187
2020-01-22 06:10:41 +00:00
glebius
57cf9e8fd6 Remove extraneous NET_EPOCH_ASSERT - the full function is covered. 2020-01-22 06:07:27 +00:00
glebius
9aca735515 Re-absorb tcp_detach() back into tcp_usr_detach() as the comment suggests.
Not a functional change.
2020-01-22 06:06:27 +00:00
glebius
57981ac130 Don't enter network epoch in tcp_usr_detach. A PCB removal doesn't
require that.
2020-01-22 06:04:56 +00:00
glebius
2a7af08431 The network epoch changes in the TCP stack combined with old r286227,
actually make removal of a PCB not needing ipi_lock in any form.  The
ipi_list_lock is sufficient.
2020-01-22 06:03:45 +00:00
glebius
2e901d0ed4 tcp_usr_attach() doesn't need network epoch. in_pcbfree() and
in_pcbdetach() perform all necessary synchronization themselves.
2020-01-22 06:01:26 +00:00
glebius
a7f275d187 Relax locking requirements for in_pcballoc(). All pcbinfo fields
modified by this function are protected by the PCB list lock that is
acquired inside the function.

This could have been done even before epoch changes, after r286227.
2020-01-22 05:58:29 +00:00
glebius
1019be5656 Inline tcp_attach() into tcp_usr_attach(). Not a functional change. 2020-01-22 05:54:58 +00:00
glebius
b38bde736a Make tcp_output() require network epoch.
Enter the epoch before calling into tcp_output() from those
functions, that didn't do that before.

This eliminates a bunch of epoch recursions in TCP.
2020-01-22 05:53:16 +00:00
glebius
9879d0d15d Make ip6_output() and ip_output() require network epoch.
All callers that before may called into these functions
without network epoch now must enter it.
2020-01-22 05:51:22 +00:00
glebius
5fe4c30cb8 Add documenting NET_EPOCH_ASSERT() to tcp_drop(). 2020-01-22 02:38:46 +00:00
glebius
f58ea78da0 Add some documenting NET_EPOCH_ASSERTs. 2020-01-22 02:37:47 +00:00
tuexen
e51fa01579 Remove debug code not needed anymore.
Submitted by:		Richard Scheffenegger
Reviewed by:		tuexen@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23208
2020-01-16 17:15:06 +00:00
glebius
3402b0a5ce A miss from r356754. 2020-01-15 06:12:39 +00:00
glebius
5057bc60d0 Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch.   The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC
2020-01-15 06:05:20 +00:00
glebius
f4fefae588 Use official macro to enter/exit the network epoch. NFC 2020-01-15 05:48:36 +00:00
glebius
044cd3b935 Mechanically substitute assertion of in_epoch(net_epoch_preempt) to
NET_EPOCH_ASSERT(). NFC
2020-01-15 05:45:27 +00:00
glebius
9aa583df20 Stop header pollution and don't include if_var.h via in_pcb.h. 2020-01-15 03:41:15 +00:00
glebius
a470835e82 Since this code dereferences struct ifnet, it must include if_var.h
explicitly, not via header pollution.  While here move TCPSTATES
declaration right above the include that is going to make use of it.
2020-01-15 03:40:32 +00:00
glebius
c2e0b506d8 The non-preemptible network epoch identified by net_epoch isn't used.
This code definitely meant net_epoch_preempt.
2020-01-15 03:30:33 +00:00
glebius
d48ad489ca Fix yet another regression from r354484. Error code from cr_cansee()
aliases with hard error from other operations.

Reported by:	flo
2020-01-13 21:12:10 +00:00
tuexen
58c0da1e84 Fix race when accepting TCP connections.
When expanding a SYN-cache entry to a socket/inp a two step approach was
taken:
1) The local address was filled in, then the inp was added to the hash
   table.
2) The remote address was filled in and the inp was relocated in the
   hash table.
Before the epoch changes, a write lock was held when this happens and
the code looking up entries was holding a corresponding read lock.
Since the read lock is gone away after the introduction of the
epochs, the half populated inp was found during lookup.
This resulted in processing TCP segments in the context of the wrong
TCP connection.
This patch changes the above procedure in a way that the inp is fully
populated before inserted into the hash table.

Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@
mailing list and for testing the patch!

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D22971
2020-01-12 17:52:32 +00:00
tuexen
9698d1b344 Fix division by zero issue.
Thanks to Stas Denisov for reporting the issue for the userland stack
and providing a fix.

MFC after:		3 days
2020-01-12 15:45:27 +00:00
mjg
165ba25434 Add KERNEL_PANICKED macro for use in place of direct panicstr tests 2020-01-12 06:07:54 +00:00
melifaro
af1352e631 Add fibnum, family and vnet pointer to each rib head.
Having metadata such as fibnum or vnet in the struct rib_head
 is handy as it eases building functionality in the routing space.
This change is required to properly bring back route redirect support.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23047
2020-01-09 17:21:00 +00:00
bz
26a90b3ee8 vnet: virtualise more network stack sysctls.
Virtualise tcp_always_keepalive, TCP and UDP log_in_vain.  All three are
set in the netoptions startup script, which we would love to run for VNETs
as well [1].

While virtualising the log_in_vain sysctls seems pointles at first for as
long as the kernel message buffer is not virtualised, it at least allows
an administrator to debug the base system or an individual jail if needed
without turning the logging on for all jails running on a system.

PR:		243193 [1]
MFC after:	2 weeks
2020-01-08 23:30:26 +00:00
emaste
201b368c14 Do not define TCPOUTFLAGS in rack_bbr_common
tcp_outflags isn't used in this source file and compilation failed with
external GCC on sparc64.  I'm not sure why only that case failed (perhaps
inconsistent -Werror config) but it is a legitimate issue to fix.

Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D23068
2020-01-07 17:57:08 +00:00
rrs
c2ceba81e5 This catches rack up in the recent changes to ECN and
also commonizes the functions that both the freebsd and
rack stack uses.

Sponsored by:Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D23052
2020-01-06 15:29:14 +00:00
rrs
583431074c This change adds a small feature to the tcp logging code. Basically
a connection can now have a separate tag added to the id.

Obtained from:	Lawrence Stewart
Sponsored by:	Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D22866
2020-01-06 12:48:06 +00:00
tuexen
740267e0fe Don't make the sendall iterator as being up if it could not be started.
MFC after:		1 week
2020-01-05 14:08:01 +00:00
tuexen
c2e0570386 Return -1 consistently if an error occurs.
MFC after:	1 week
2020-01-05 14:06:40 +00:00
tuexen
02e5dff4f0 Ensure that we don't miss a trigger for kicking off the SCTP iterator.
Reported by:		nwhitehorn@
MFC after:		1 week
2020-01-05 13:56:32 +00:00
tuexen
9dec5821df Make the message size limit used for SCTP_SENDALL configurable via
a sysctl variable instead of a compiled in constant.

This is based on a patch provided by nwhitehorn@.
2020-01-04 20:33:12 +00:00
markj
bc00abc550 Take the ifnet's address lock in igmp_v3_cancel_link_timers().
inm_rele_locked() may remove the multicast address associated with inm.

Reported by:	syzbot+871c5d1fd5fac6c28f52@syzkaller.appspotmail.com
Reviewed by:	hselasky
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23009
2020-01-03 17:03:10 +00:00
tuexen
bc4e8390c8 Remove empty line which was added in r356270 by accident.
MFC after:		1 week
2020-01-02 14:04:16 +00:00
tuexen
1728a3713c Improve input validation of the spp_pathmtu field in the
SCTP_PEER_ADDR_PARAMS socket option. The code in the stack assumes
sane values for the MTU.

This issue was found by running an instance of syzkaller.

MFC after:		1 week
2020-01-02 13:55:10 +00:00
glebius
ef9a657efe In r343631 error code for a packet blocked by a firewall was
changed from EACCES to EPERM.  This change was not intentional,
so fix that.  Return EACCESS if a firewall forbids sending.

Noticed by:	ae
2020-01-01 17:31:43 +00:00