Commit Graph

80 Commits

Author SHA1 Message Date
Gordon Bergling
b4aa9cb217 tcp(4): Fix a typo in a sysctl description
- s/entires/entries/

MFC after:	3 days
2021-11-30 07:17:30 +01:00
Gleb Smirnoff
d554522f6e tcp_hostcache: use SMR for lookups, mutex(9) for updates.
In certain cases, e.g. a SYN-flood from a limited set of hosts,
the TCP hostcache becomes the main contention point. To solve
that, this change introduces lockless lookups on the hostcache.

The cache remains a hash, however buckets are now CK_SLIST. For
updates a bucket mutex is obtained, for read an SMR section is
entered.

Reviewed by:	markj, rscheff
Differential revision:	https://reviews.freebsd.org/D29729
2021-04-20 10:02:20 -07:00
Gleb Smirnoff
faa9ad8a90 Fix off-by-one error in KASSERT from 02f26e98c7. 2021-04-19 17:20:19 -07:00
Gleb Smirnoff
1a7fe55ab8 tcp_hostcache: make THC_LOCK/UNLOCK macros to work with hash head pointer.
Not a functional change.
2021-04-09 14:07:35 -07:00
Gleb Smirnoff
4f49e3382f tcp_hostcache: style(9)
Reviewed by:	rscheff
2021-04-09 14:07:27 -07:00
Gleb Smirnoff
7c71f3bd6a tcp_hostcache: remove extraneous check.
All paths leading here already checked this setting.

Reviewed by:	rscheff
2021-04-09 14:07:19 -07:00
Gleb Smirnoff
0c25bf7e7c tcp_hostcache: implement tcp_hc_updatemtu() via tcp_hc_update.
Locking changes are planned here, and without this change too
much copy-and-paste would be between these two functions.

Reviewed by:	rscheff
2021-04-09 14:06:44 -07:00
Richard Scheffenegger
b878ec024b tcp: Use jenkins_hash32() in hostcache
As other parts of the base tcp stack (eg.
tcp fastopen) already use jenkins_hash32,
and the properties appear reasonably good,
switching to use that.

Reviewed By: tuexen, #transport, ae
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29515
2021-04-08 20:29:19 +02:00
Gleb Smirnoff
373ffc62c1 tcp_hostcache.c: remove unneeded includes.
Reviewed by:	rscheff
2021-04-08 10:58:44 -07:00
Gleb Smirnoff
29acb54393 tcp_hostcache: add bool argument for tcp_hc_lookup() to tell are we
looking to only read from the result, or to update it as well.
For now doesn't affect locking, but allows to push stats and expire
update into single place.

Reviewed by:	rscheff
2021-04-08 10:58:44 -07:00
Gleb Smirnoff
489bde5753 tcp_hostcache: hide rmx_hits/rmx_updates under ifdef.
They have little value unless you do some profiling investigations,
but they are performance bottleneck.

Reviewed by:	rscheff
2021-04-08 10:58:44 -07:00
Gleb Smirnoff
2cca4c0ee0 Remove tcp_hostcache.h. Everything is private.
Reviewed by:	rscheff
2021-04-08 10:58:44 -07:00
Richard Scheffenegger
a04906f027 fix typo in 38ea2bd069 2021-04-02 20:34:33 +02:00
Richard Scheffenegger
38ea2bd069 Use sbuf_drain unconditionally
After making sbuf_drain safe for external use,
there is no need to protect the call.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29545
2021-04-02 20:27:46 +02:00
Richard Scheffenegger
9aef4e7c2b tcp: Shouldn't drain empty sbuf
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29524
2021-04-01 17:18:38 +02:00
Richard Scheffenegger
02f26e98c7 tcp: Add hash histogram output and validate bucket length accounting
Provide a histogram output to check, if the hashsize or
bucketlimit could be optimized. Also add some basic sanity
checks around the accounting of the hash utilization.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29506
2021-04-01 14:44:14 +02:00
Richard Scheffenegger
529a2a0f27 tcp: For hostcache performance, use atomics instead of counters
As accessing the tcp hostcache happens frequently on some
classes of servers, it was recommended to use atomic_add/subtract
rather than (per-CPU distributed) counters, which have to be
summed up at high cost to cache efficiency.

PR: 254333
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Reviewed By: #transport, tuexen, jtl
Differential Revision: https://reviews.freebsd.org/D29522
2021-04-01 10:03:30 +02:00
Richard Scheffenegger
95e56d31e3 tcp: Make hostcache.cache_count MPSAFE by using a counter_u64_t
Addressing the underlying root cause for cache_count to
show unexpectedly high  values, by protecting all arithmetic on
that global variable by using counter(9).

PR:		254333
Reviewed By: tuexen, #transport
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29510
2021-03-31 20:24:13 +02:00
Richard Scheffenegger
869880463c tcp: drain tcp_hostcache_list in between per-bucket locks
Explicitly drain the sbuf after completing each hash bucket
to minimize the work performed while holding the hash
bucket lock.

PR:		254333
MFC after:	2 weeks
Reviewed By:	tuexen, jhb, #transport
Sponsored by: 	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29483
2021-03-31 19:24:21 +02:00
Richard Scheffenegger
cb0dd7e122 tcp: reduce memory footprint when listing tcp hostcache
In tcp_hostcache_list, the sbuf used would need a large (~2MB)
blocking allocation of memory (M_WAITOK), when listing a
full hostcache. This may stall the requestor for an indeterminate
time.

A further optimization is to return the expected userspace
buffersize right away, rather than preparing the output of
each current entry of the hostcase, provided by: @tuexen.

This makes use of the ready-made functions of sbuf to work
with sysctl, and repeatedly drain the much smaller buffer.

PR: 254333
MFC after: 2 weeks
Reviewed By: #transport, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29471
2021-03-28 23:50:23 +02:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Michael Tuexen
6b8fba3c5c Don't use uninitialised stack memory if the sysctl variable
net.inet.tcp.hostcache.enable is set to 0.
The bug resulted in using possibly a too small MSS value or wrong
initial retransmission timer settings. Possibly the value used
for ssthresh was also wrong.

Submitted by:		Richard Scheffenegger
Reviewed by:		Cheng Cui, rgrimes@, tuexen@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23687
2020-02-17 14:54:21 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Eric van Gyzen
8144690af4 Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.

Suggested by:	glebius, emaste
Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9625
2017-02-16 20:47:41 +00:00
Hiren Panchasara
b8a2fb91f6 sysctl net.inet.tcp.hostcache.list in a jail can see connections from other
jails and the host. This commit fixes it.

PR:		200361
Submitted by:	bz (original version), hiren (minor corrections)
Reported by:	Marcus Reid <marcus at blazingdot dot com>
Reviewed by:	bz, gnn
Tested by:	Lohith Bellad <lohithbsd at gmail dot com>
MFC after:	1 week
Sponsored by:	Limelight Networks (minor corrections)
2017-01-05 17:22:09 +00:00
Jonathan T. Looney
3ac125068a Remove "long" variables from the TCP stack (not including the modular
congestion control framework).

Reviewed by:	gnn, lstewart (partial)
Sponsored by:	Juniper Networks, Netflix
Differential Revision:	(multiple)
Tested by:	Limelight, Netflix
2016-10-06 16:28:34 +00:00
Hiren Panchasara
8a56c64533 This adds a sysctl which allows you to disable the TCP hostcache. This is handy
during testing of network related changes where cached entries may pollute your
results, or during known congestion events where you don't want to unfairly
penalize hosts.

Prior to r232346 this would have meant you would break any connection with a sub
1500 MTU, as the hostcache was authoritative. All entries as they stand today
should simply be used to pre populate values for efficiency.

Submitted by:	Jason Wolfe (j at nitrology dot com)
Reviewed by:	rwatson, sbruno, rrs , bz (earlier version)
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D6198
2016-09-30 00:10:57 +00:00
Hiren Panchasara
4d16338223 Clean up unused bandwidth entry in the TCP hostcache.
Submitted by:		Jason Wolfe (j at nitrology dot com)
Reviewed by:		rrs, hiren
Sponsored by:		Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D4154
2015-12-11 06:22:58 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Hiren Panchasara
2645638064 Add a new sysctl net.inet.tcp.hostcache.purgenow=1 to expire and purge all
entries in hostcache immediately.

In collaboration with:	bz, rwatson
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Limelight Networks
2015-05-20 01:08:01 +00:00
Ian Lepore
dcdeb95f09 Go back to using sbuf_new() with a preallocated large buffer, to avoid
triggering an sbuf auto-drain copyout while holding a lock.

Pointed out by:	   jhb
Pointy hat: 	   ian
2015-03-14 23:57:33 +00:00
Ian Lepore
751ccc429d Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl
string returned to userland is nulterminated.

PR:		195668
2015-03-14 18:11:24 +00:00
John Baldwin
002d455873 Use an sbuf to generate the output of the net.inet.tcp.hostcache.list
sysctl to avoid a possible buffer overflow if the cache grows while the
text is being generated.

PR:		172675
MFC after:	2 weeks
2015-01-25 19:45:44 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Andrey V. Elsukov
028bdf289d Add scope zone id to the in_endpoints and hc_metrics structures.
A non-global IPv6 address can be used in more than one zone of the same
scope. This zone index is used to identify to which zone a non-global
address belongs.

Also we can have many foreign hosts with equal non-global addresses,
but from different zones. So, they can have different metrics in the
host cache.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-09-10 16:26:18 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Andrey Zonov
32fe38f123 - Update cachelimit after hashsize and bucketlimit were set.
Reported by:	az
Reviewed by:	melifaro
Approved by:	kib (mentor)
MFC after:	1 week
2012-10-19 14:00:03 +00:00
Mikolaj Golub
3a288e901f Fix RTTVAR scale in net.inet.tcp.hostcache.list sysctl.
Reviewed by:	andre
MFC after:	3 days
2012-07-03 18:59:13 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Matthew D Fleming
f88910cdf5 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the net* piece.
2011-01-12 19:53:50 +00:00
Dimitry Andric
3e288e6238 After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
Dimitry Andric
31c6a0037e Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
Bjoern A. Zeeb
82cea7e6f3 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
Bjoern A. Zeeb
fffb9f1d9c Properly free resources when destroying the TCP hostcache while
tearing down a network stack (in the VIMAGE jail+vnet case).

For that break out the logic from tcp_hc_purge() into an internal
function we can call from both, the sysctl handler and the
tcp_hc_destroy().

Sponsored by:	ISPsystem
Reviewed by:	silby, lstewart
MFC After:	8 days
2010-02-09 21:31:53 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Robert Watson
1e77c1056a Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Bjoern A. Zeeb
5736e6fb9d After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.
2009-06-23 17:03:45 +00:00
Marko Zec
bc29160df3 Introduce an infrastructure for dismantling vnet instances.
Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state.  The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers.  Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels.  Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by:	bz, julian
Approved by:	rwatson, kib (re), julian (mentor)
2009-06-08 17:15:40 +00:00