Commit Graph

6854 Commits

Author SHA1 Message Date
Gleb Smirnoff
2cca4c0ee0 Remove tcp_hostcache.h. Everything is private.
Reviewed by:	rscheff
2021-04-08 10:58:44 -07:00
Richard Scheffenegger
90cca08e91 tcp: Prepare PRR to work with NewReno LossRecovery
Add proper PRR vnet declarations for consistency.
Also add pointer to tcpopt struct to tcp_do_prr_ack, in preparation
for it to deal with non-SACK window reduction (after loss).

No functional change.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29440
2021-04-08 19:16:31 +02:00
Richard Scheffenegger
9f2eeb0262 [tcp] Fix ECN on finalizing sessions.
A subtle oversight would subtly change new data packets
sent after a shutdown() or close() call, while the send
buffer is still draining.

MFC after: 3 days
Reviewed By: #transport, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29616
2021-04-08 15:26:09 +02:00
Mark Johnston
274579831b capsicum: Limit socket operations in capability mode
Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.

Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.

Reviewed by:	oshogbo
Discussed with:	emaste
Reported by:	manu
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29423
2021-04-07 14:32:56 -04:00
Richard Scheffenegger
a04906f027 fix typo in 38ea2bd069 2021-04-02 20:34:33 +02:00
Richard Scheffenegger
38ea2bd069 Use sbuf_drain unconditionally
After making sbuf_drain safe for external use,
there is no need to protect the call.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29545
2021-04-02 20:27:46 +02:00
Richard Scheffenegger
9aef4e7c2b tcp: Shouldn't drain empty sbuf
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29524
2021-04-01 17:18:38 +02:00
Richard Scheffenegger
02f26e98c7 tcp: Add hash histogram output and validate bucket length accounting
Provide a histogram output to check, if the hashsize or
bucketlimit could be optimized. Also add some basic sanity
checks around the accounting of the hash utilization.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29506
2021-04-01 14:44:14 +02:00
Richard Scheffenegger
529a2a0f27 tcp: For hostcache performance, use atomics instead of counters
As accessing the tcp hostcache happens frequently on some
classes of servers, it was recommended to use atomic_add/subtract
rather than (per-CPU distributed) counters, which have to be
summed up at high cost to cache efficiency.

PR: 254333
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Reviewed By: #transport, tuexen, jtl
Differential Revision: https://reviews.freebsd.org/D29522
2021-04-01 10:03:30 +02:00
Richard Scheffenegger
95e56d31e3 tcp: Make hostcache.cache_count MPSAFE by using a counter_u64_t
Addressing the underlying root cause for cache_count to
show unexpectedly high  values, by protecting all arithmetic on
that global variable by using counter(9).

PR:		254333
Reviewed By: tuexen, #transport
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29510
2021-03-31 20:24:13 +02:00
Richard Scheffenegger
869880463c tcp: drain tcp_hostcache_list in between per-bucket locks
Explicitly drain the sbuf after completing each hash bucket
to minimize the work performed while holding the hash
bucket lock.

PR:		254333
MFC after:	2 weeks
Reviewed By:	tuexen, jhb, #transport
Sponsored by: 	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29483
2021-03-31 19:24:21 +02:00
Andrey V. Elsukov
c80a4b76ce ipdivert: check that PCB is still valid after taking INPCB_RLOCK.
We are inspecting PCBs of divert sockets under NET_EPOCH section,
but PCB could be already detached and we should check INP_FREED flag
when we took INP_RLOCK.

PR:		254478
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29420
2021-03-30 12:31:09 +03:00
Richard Scheffenegger
cb0dd7e122 tcp: reduce memory footprint when listing tcp hostcache
In tcp_hostcache_list, the sbuf used would need a large (~2MB)
blocking allocation of memory (M_WAITOK), when listing a
full hostcache. This may stall the requestor for an indeterminate
time.

A further optimization is to return the expected userspace
buffersize right away, rather than preparing the output of
each current entry of the hostcase, provided by: @tuexen.

This makes use of the ready-made functions of sbuf to work
with sysctl, and repeatedly drain the much smaller buffer.

PR: 254333
MFC after: 2 weeks
Reviewed By: #transport, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29471
2021-03-28 23:50:23 +02:00
Richard Scheffenegger
b9f803b7d4 tcp: Use PRR for ECN congestion recovery
MFC after: 2 weeks
Reviewed By: #transport, rrs
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28972
2021-03-26 02:06:15 +01:00
Richard Scheffenegger
eb3a59a831 tcp: Refactor PRR code
No functional change intended.

MFC after: 2 weeks
Reviewed By: #transport, rrs
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29411
2021-03-26 00:01:34 +01:00
Richard Scheffenegger
0533fab89e tcp: Perform simple fast retransmit when SACK Blocks are missing on SACK session
MFC after: 2 weeks
Reviewed By: #transport, rrs
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28634
2021-03-25 23:23:48 +01:00
Michael Tuexen
d995cc7e54 sctp: fix handling of RTO.initial of 1 ms
MFC after:	3 days
Reported by:	syzbot+5eb0e009147050056ce9@syzkaller.appspotmail.com
2021-03-22 16:44:18 +01:00
Michael Tuexen
40f41ece76 tcp: improve handling of SYN segments in SYN-SENT state
Ensure that the stack does not generate a DSACK block for user
data received on a SYN segment in SYN-SENT state.

Reviewed by:		rscheff
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D29376
Sponsored by:		Netflix, Inc.
2021-03-22 15:58:49 +01:00
Richard Scheffenegger
e9f029831f fix panic when rescue retransmission and FIN overlap
PR:           254244
PR:           254309
Reviewed By:  #transport, hselasky, tuexen
MFC after:    3 days
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29315
2021-03-17 17:12:04 +01:00
Gordon Bergling
5666643a95 Fix some common typos in comments
- occured -> occurred
- normaly -> normally
- controling -> controlling
- fileds -> fields
- insterted -> inserted
- outputing -> outputting

MFC after:	1 week
2021-03-13 18:26:15 +01:00
Gordon Bergling
183502d162 Fix a few typos in comments
- trough -> through

MFC after:	1 week
2021-03-13 16:37:28 +01:00
John Baldwin
5a50eb6585 Don't pass RFPROC to kproc_create(), it is redundant.
Reviewed by:	tuexen, kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29206
2021-03-12 09:48:10 -08:00
Alexander V. Chernikov
b1d63265ac Flush remaining routes from the routing table during VNET shutdown.
Summary:
This fixes rtentry leak for the cloned interfaces created inside the
 VNET.

PR:	253998
Reported by:	rashey at superbox.pl
MFC after:	3 days

Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
Thus, any route table operations are too late to schedule.
As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.

Test Plan:
```
set_skip:set_skip_group_lo  ->  passed  [0.053s]
tail -n 200 /var/log/messages | grep rtentry
```

Reviewers: #network, kp, bz

Reviewed By: kp

Subscribers: imp, ae

Differential Revision: https://reviews.freebsd.org/D29116
2021-03-10 21:10:14 +00:00
Richard Scheffenegger
e53138694a tcp: Add prr_out in preparation for PRR/nonSACK and LRD
Reviewed By:           #transport, kbowling
MFC after:             3 days
Sponsored By:          Netapp, Inc.
Differential Revision: https://reviews.freebsd.org/D29058
2021-03-06 00:38:22 +01:00
Richard Scheffenegger
9a13d9dcee tcp: remove a superfluous local var in tcp_sack_partialack()
No functional change.

Reviewed By: #transport, tuexen
MFC after:   3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29088
2021-03-05 18:20:23 +01:00
Richard Scheffenegger
4a8f3aad37 tcp: remove incorrect reset of SACK variable in PRR
Reviewed By:   #transport, rrs, tuexen
PR:            253848
MFC after:     3 days
Sponsored By:  NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29083
2021-03-05 17:45:54 +01:00
Michael Tuexen
705d06b289 rack: unbreak TCP fast open for the client side
Allow sending user data on the SYN segment.

Reviewed by:		rrs
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D29082
Sponsored by:		Netflix, Inc.
2021-03-05 16:03:03 +01:00
Kristof Provost
bb4a7d94b9 net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros
Introduce convenience macros to retrieve the DSCP, ECN or traffic class
bits from an IPv6 header.

Use them where appropriate.

Reviewed by:	ae (previous version), rscheff, tuexen, rgrimes
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29056
2021-03-04 20:56:48 +01:00
Michael Tuexen
99adf23006 RACK: fix an issue triggered by using the CDG CC module
Obtained from:		rrs@
MFC after:		3 days
PR:			238741
Sponsored by:		Netlix, Inc.
2021-03-02 12:32:16 +01:00
Richard Scheffenegger
0b0f8b359d calculate prr_out correctly when pipe < ssthresh
Reviewed By:	#transport, tuexen
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28998
2021-03-01 16:26:05 +01:00
Richard Scheffenegger
e9071000c9 Improve PRR initial transmission timing
Reviewed By:	tuexen, #transport
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28953
2021-02-28 15:46:54 +01:00
Michael Tuexen
70e95f0b69 sctp: avoid integer overflow when starting the HB timer
MFC after:	3 days
Reported by:	syzbot+14b9d7c3c64208fae62f@syzkaller.appspotmail.com
2021-02-27 23:27:30 +01:00
Richard Scheffenegger
9e83a6a556 Include new data sent in PRR calculation
Reviewed By:	#transport, kbowling
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28941
2021-02-26 22:31:58 +01:00
Richard Scheffenegger
2593f858d7 A TCP server has to take into consideration, if TCP_NOOPT is preventing
the negotiation of TCP features. This affects most TCP options but
adherance to RFC7323 with the timestamp option will prevent a session
from getting established.

PR:	253576
Reviewed By:	tuexen, #transport
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28652
2021-02-25 19:12:20 +01:00
Richard Scheffenegger
31d7a27c6e PRR: Avoid accounting left-edge twice in partial ACK.
Reviewed By:	#transport, kbowling
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28819
2021-02-25 18:37:47 +01:00
Richard Scheffenegger
48396dc779 Address two incorrect calculations and enhance readability of PRR code
- address second instance of cwnd potentially becoming zero
- fix sublte bug due to implicit int to uint typecase in max()
- fix bug due to typo in hand-coded CEILING() function by using howmany() macro
- use int instead of long, and add a missing long typecast
- replace if conditionals with easier to read imax/imin (as in pseudocode)

Reviewed By: #transport, kbowling
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28813
2021-02-25 18:32:04 +01:00
Kristof Provost
f3245be349 net: remove legacy in_addmulti()
Despite the comment to the contrary neither pf nor carp use
in_addmulti(). Nothing does, so get rid of it.

Carp stopped using it in 08b68b0e4c
(2011). It's unclear when pf stopped using it, but before
d6d3f01e0a (2012).

Reviewed by:	bz@, melifaro@
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D28918
2021-02-25 10:13:52 +01:00
Kristof Provost
c139b3c19b arp/nd: Cope with late calls to iflladdr_event
When tearing down vnet jails we can move an if_bridge out (as
part of the normal vnet_if_return()). This can, when it's clearing out
its list of member interfaces, change its link layer address.
That sends an iflladdr_event, but at that point we've already freed the
AF_INET/AF_INET6 if_afdata pointers.

In other words: when the iflladdr_event callbacks fire we can't assume
that ifp->if_afdata[AF_INET] will be set.

Reviewed by:	donner@, melifaro@
MFC after:	1 week
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D28860
2021-02-23 13:54:07 +01:00
Hans Petter Selasky
9febbc4541 Fix for natd(8) sending wrong sequence number after TCP retransmission,
terminating a TCP connection.

If a TCP packet must be retransmitted and the data length has changed in the
retransmitted packet, due to the internal workings of TCP, typically when ACK
packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut()
will find the correct length, which is the last length received.

This can be explained as follows:

If a "227 Entering Passive Mode" packet must be retransmittet and the length
changes from 51 to 50 bytes, for example, then we have three cases for the
list scan in GetDeltaSeqOut(), depending on how many prior packets were
received modulus N_LINK_TCP_DATA=3:

  case 1:  index 0:   original packet        51
           index 1:   retransmitted packet   50
           index 2:   not relevant

  case 2:  index 0:   not relevant
           index 1:   original packet        51
           index 2:   retransmitted packet   50

  case 3:  index 0:   retransmitted packet   50
           index 1:   not relevant
           index 2:   original packet        51

This patch simply changes the searching order for TCP packets, always starting
at the last received packet instead of any received packet, in
GetDeltaAckIn() and GetDeltaSeqOut().

Else no functional changes.

Discussed with:	rscheff@
Submitted by:	Andreas Longwitz <longwitz@incore.de>
PR:		230755
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-02-22 17:13:58 +01:00
Michael Tuexen
b963ce4588 sctp: improve computation of an alternate net
Espeially handle the case where the net passed in is about to
be deleted and therefore not in the list of nets anymore.

MFC after:	3 days
Reported by:	syzbot+9756917a7c8381adf5e8@syzkaller.appspotmail.com
2021-02-21 17:13:06 +01:00
Michael Tuexen
5ac839029d sctp: clear a pointer to a net which will be removed
MFC after:	3 days
2021-02-21 13:06:05 +01:00
Richard Scheffenegger
a8e431e153 PRR: use accurate rfc6675_pipe when enabled
Reviewed By: #transport, tuexen
MFC after:   2 weeks
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28816
2021-02-20 20:11:48 +01:00
Richard Scheffenegger
853fd7a2e3 Ensure cwnd doesn't shrink to zero with PRR
Under some circumstances, PRR may end up with a fully
collapsed cwnd when finalizing the loss recovery.

Reviewed By:	#transport, kbowling
Reported by:	Liang Tian
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28780
2021-02-19 13:55:32 +01:00
Kyle Evans
4c0bef07be kern: net: remove TCP_LINGERTIME
TCP_LINGERTIME can be traced back to BSD 4.4 Lite and perhaps beyond, in
exactly the same form that it appears here modulo slightly different
context.  It used to be the case that there was a single pr_usrreq
method with requests dispatched to it; these exact two lines appeared in
tcp_usrreq's PRU_ATTACH handling.

The only purpose of this that I can find is to cause surprising behavior
on accepted connections. Newly-created sockets will never hit these
paths as one cannot set SO_LINGER prior to socket(2). If SO_LINGER is
set on a listening socket and inherited, one would expect the timeout to
be inherited rather than changed arbitrarily like this -- noting that
SO_LINGER is nonsense on a listening socket beyond inheritance, since
they cannot be 'connected' by definition.

Neither Illumos nor Linux reset the timer like this based on testing and
inspection of Illumos, and testing of Linux.

Reviewed by:	rscheff, tuexen
Differential Revision:	https://reviews.freebsd.org/D28265
2021-02-18 22:36:01 -06:00
Randall Stewart
e13e4fa6c4 fix Navdeeps LINT_NOINET error. 2021-02-18 07:29:12 -05:00
Randall Stewart
0a4f851074 Fix another pesky missing #ifdef TCPHPTS 2021-02-18 01:27:30 -05:00
Randall Stewart
ab4fad4be1 Add ifdef TCPHPTS around build_ack_entry and do_bpf_and_csum to avoid
warnings when HPTS is not included

Thanks to Gary Jennejohn for pointing this out.
2021-02-17 12:49:42 -05:00
Randall Stewart
69a34e8d02 Update the LRO processing code so that we can support
a further CPU enhancements for compressed acks. These
are acks that are compressed into an mbuf. The transport
has to be aware of how to process these, and an upcoming
update to rack will do so. You need the rack changes
to actually test and validate these since if the transport
does not support mbuf compression, then the old code paths
stay in place. We do in this commit take out the concept
of logging if you don't have a lock (which was quite
dangerous and was only for some early debugging but has
been left in the code).

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D28374
2021-02-17 10:41:01 -05:00
Alexander V. Chernikov
9fdbf7eef5 Make in_localip_more() fib-aware.
It fixes loopback route installation for the interfaces
 in the different fibs using the same prefix.

Reviewed By:	donner
PR:		189088
Differential Revision: https://reviews.freebsd.org/D28673
MFC after:	1 week
2021-02-16 20:00:46 +00:00
Richard Scheffenegger
3c40e1d52c update the SACK loss recovery to RFC6675, with the following new features:
- improved pipe calculation which does not degrade under heavy loss
- engaging in Loss Recovery earlier under adverse conditions
- Rescue Retransmission in case some of the trailing packets of a request got lost

All above changes are toggled with the sysctl "rfc6675_pipe" (disabled by default).

Reviewers:	#transport, tuexen, lstewart, slavash, jtl, hselasky, kib, rgrimes, chengc_netapp.com, thj, #manpages, kbowling, #netapp, rscheff
Reviewed By:	#transport
Subscribers:	imp, melifaro
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D18985
2021-02-16 13:08:37 +01:00