Commit Graph

134778 Commits

Author SHA1 Message Date
Mateusz Guzik
62dbc992ad thread: move nthread management out of tid_alloc
While this adds more work single-threaded, it also enables SMP-related
speed ups.
2020-11-12 00:29:23 +00:00
Kyle Evans
38033780a3 umtx: drop incorrect timespec32 definition
This works for amd64, but none others -- drop it, because we already have a
proper definition in sys/compat/freebsd32/freebsd32.h that correctly uses
time32_t.

MFC after:	1 week
2020-11-11 22:35:23 +00:00
Alexander Motin
8054320e07 Make CTL nicer to increased MAXPHYS.
Before this CTL always allocated MAXPHYS-sized buffers, even for 4KB I/O,
that is even more overkill for MAXPHYS of 1MB.  This change limits maximum
allocation to 512KB if MAXPHYS is bigger, plus if one is above 128KB, adds
new 128KB UMA zone for smaller I/Os.  The patch factors out alloc/free,
so later we could make it use more zones or malloc() if we'd like.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-11-11 21:59:39 +00:00
Mateusz Guzik
755341df4f thread: batch tid_free calls in thread_reap
This eliminates the highly pessimal pattern of relocking from multiple
CPUs in quick succession. Note this is still globally serialized.
2020-11-11 18:45:06 +00:00
Mateusz Guzik
c5315f5196 thread: lockless zombie list manipulation
This gets rid of the most contended spinlock seen when creating/destroying
threads in a loop. (modulo kstack)

Tested by:	alfredo (ppc64), bdragon (ppc64)
2020-11-11 18:43:51 +00:00
Mark Johnston
54bf96fb4f iflib: Free full mbuf chains when draining transmit queues
Submitted by:	Sai Rajesh Tallamraju <stallamr@netapp.com>
Reviewed by:	gallatin, hselasky
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D27179
2020-11-11 18:00:06 +00:00
Mark Johnston
20f02659d6 vm_map: Handle kernel map entry allocator recursion
On platforms without a direct map[*], vm_map_insert() may in rare
situations need to allocate a kernel map entry in order to allocate
kernel map entries.  This poses a problem similar to the one solved for
vmem boundary tags by vmem_bt_alloc().  In fact the kernel map case is a
bit more complicated since we must allocate entries with the kernel map
locked, whereas vmem can recurse into itself because boundary tags are
allocated up-front.

The solution is to add a custom slab allocator for kmapentzone which
allocates KVA directly from kernel_map, bypassing the kmem_* layer.
This avoids mutual recursion with the vmem btag allocator.  Then, when
vm_map_insert() allocates a new kernel map entry, it avoids triggering
allocation of a new slab with M_NOVM until after the insertion is
complete.  Instead, vm_map_insert() allocates from the reserve and sets
a flag in kernel_map to trigger re-population of the reserve just before
the map is unlocked.  This places an implicit upper bound on the number
of kernel map entries that may be allocated before the kernel map lock
is released, but in general a bound of 1 suffices.

[*] This also comes up on amd64 with UMA_MD_SMALL_ALLOC undefined, a
configuration required by some kernel sanitizers.

Discussed with:	kib, rlibby
Reported by:	andrew
Tested by:	pho (i386 and amd64 with !UMA_MD_SMALL_ALLOC)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26851
2020-11-11 17:16:39 +00:00
Andrey V. Elsukov
2f4ffa9f72 Fix possible NULL pointer dereference.
lagg(4) replaces if_output method of its child interfaces and expects
that this method can be called only by child interfaces. But it is
possible that lagg_port_output() could be called by children of child
interfaces. In this case ifnet's if_lagg field is NULL. Add check that
lp is not NULL.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2020-11-11 15:53:36 +00:00
Mark Johnston
6f5a960678 vmm: Make pmap_invalidate_ept() wait synchronously for guest exits
Currently EPT TLB invalidation is done by incrementing a generation
counter and issuing an IPI to all CPUs currently running vCPU threads.
The VMM inner loop caches the most recently observed generation on each
host CPU and invalidates TLB entries before executing the VM if the
cached generation number is not the most recent value.
pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing
guest instructions and reload the generation number.  However, it does
not actually wait for vCPUs to exit, potentially creating a window where
guests may continue to reference stale TLB entries.

Fix the problem by bracketing guest execution with an SMR read section
which is entered before loading the invalidation generation.  Then,
pmap_invalidate_ept() increments the current write sequence before
loading pm_active and sending IPIs, and polls readers to ensure that all
vCPUs potentially operating with stale TLB entries have exited before
pmap_invalidate_ept() returns.

Also ensure that unsynchronized loads of the generation counter are
wrapped with atomic(9), and stop (inconsistently) updating the
invalidation counter and pm_active bitmask with acquire semantics.

Reviewed by:	grehan, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26910
2020-11-11 15:01:17 +00:00
Mark Johnston
0da7ac7cbb Remove an extraneous parameter from SIGIO_ASSERT_LOCKED()
Reported by:	hselasky
MFC with:	r367588
2020-11-11 14:03:49 +00:00
Mark Johnston
f44994874b ffs: Clamp BIO_SPEEDUP length
On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by
softdep_request_cleanup() may be negative when assigned to bp->b_bcount,
which has type "long".

Clamp the size to LONG_MAX.  Also convert the unused g_io_speedup() to
use an off_t for the magnitude of the shortage for consistency with
softdep_send_speedup().

Reviewed by:	chs, kib
Reported by:	pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27081
2020-11-11 13:48:07 +00:00
Mark Johnston
f52979098d Fix a pair of races in SIGIO registration
First, funsetownlst() list looks at the first element of the list to see
whether it's processing a process or a process group list.  Then it
acquires the global sigio lock and processes the list.  However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.

Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty.  Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.

Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object.  However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object.  However, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.

Fix this by introducing funsetown_locked(), which avoids the racy check.

Reviewed by:	kib
Reported by:	pho
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27157
2020-11-11 13:44:27 +00:00
Mateusz Guzik
26007fe37c thread: add more fine-grained tidhash locking
Note this still does not scale but is enough to move it out of the way
for the foreseable future.

In particular a trivial benchmark spawning/killing threads stops contesting
on tidhash.
2020-11-11 08:51:04 +00:00
Mateusz Guzik
aae3547be3 thread: rework tidhash vs proc lock interaction
Apart from minor clean up this gets rid of proc unlock/lock cycle on thread
exit to work around LOR against tidhash lock.
2020-11-11 08:50:04 +00:00
Mateusz Guzik
cf31cadeb6 thread: fix thread0 tid allocation
Startup code hardcodes the value instead of allocating it.
The first spawned thread would then be a duplicate.

Pointy hat:	mjg
2020-11-11 08:48:43 +00:00
Warner Losh
26676c47dc Add INIT_ALL_ZERO and INIT_ALL_PATTERN to kern.opts.mk
These options need to be in the kern.opts.mk file to be alive for kernel
and module builds. This also reverts r367579 since that's not needed with
this fix: the host's bsd.opts.mk is irrelevant.

Reviewed by: brooks@
Differential Revision:  https://reviews.freebsd.org/D27170
2020-11-10 23:25:16 +00:00
Mateusz Guzik
40aad3e477 thread: tidy up r367543
"locked" variable is spurious in the committed version.
2020-11-10 21:29:10 +00:00
Brooks Davis
d8033dc3d3 Be more tolerant of share/mk and kern.mk mismatch
When building out-of-tree modules, it appears that the system share/mk
is used, but sys/conf/kern.mk is used.  That results in MK_INIT_ALL_ZERO
being undefined.  In the interest of maximum compatability, check
that MK_INIT_ALL_* and COMPILER_FEATURES are defined before comparing
their values.

Reported by:	mmacy
Sponsored by:	DARPA
2020-11-10 21:12:32 +00:00
John Baldwin
b3ceca0c80 Clear tp->tod in t4_pcb_detach().
Otherwise, a socket can have a non-NULL tp->tod while TF_TOE is clear.
In particular, if a newly accepted socket falls back to non-TOE due to
an active open failure, the non-TOE socket will still have tp->tod set
even though TF_TOE is clear.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27028
2020-11-10 19:54:39 +00:00
Brooks Davis
e268fd0a02 Support initializing stack variables on function entry
There are two options:
 - WITH_INIT_ALL_ZERO: Zero all variables on the stack.
 - WITH_INIT_ALL_PATTERN: Initialize variables with well-defined patterns.

The exact pattern are a compiler implementation detail and vary by type.
They are somewhat documented in the LLVM commit message:
https://reviews.llvm.org/rL349442
I've used WITH_INIT_ALL_* to match Microsoft's InitAll feature rather
than naming them after the LLVM specific compiler flags.

In a range of consumer products, options like these are used in
both debug and production builds with debugs builds using patterns
(intended to provoke crashes on use of uninitialized values) and
production using zeros (deemed more likely to lead to harmless
misbehavior or NULL-pointer dereferences).

Reviewed by:	emaste
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27131
2020-11-10 19:15:13 +00:00
Jonathan T. Looney
7b516613aa When destroying a UMA zone which has a reserve (set with
uma_zone_reserve()), messages like the following appear on the console:
"Freed UMA keg (Test zone) was not empty (0 items). Lost 528 pages of
memory."

When keg_drain_domain() is draining the zone, it tries to keep the number
of items specified in the reservation. However, when we are destroying the
UMA zone, we do not need to keep those items. Therefore, when destroying a
non-secondary and non-cache zone, we should reset the keg reservation to 0
prior to draining the zone.

Reviewed by:	markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27129
2020-11-10 18:12:09 +00:00
Mateusz Guzik
5c5ca843b7 Allow rtprio_thread to operate on threads of any process
This in particular unbreaks rtkit.

The limitation was a leftover of previous state, to quote a
comment:

/*
 * Though lwpid is unique, only current process is supported
 * since there is no efficient way to look up a LWP yet.
 */

Long since then a global tid hash was introduced to remedy
the problem.

Permission checks still apply.

Submitted by:	greg_unrelenting.technology (Greg V)
Differential Revision:	https://reviews.freebsd.org/D27158
2020-11-10 18:10:50 +00:00
Mateusz Guzik
4426311a3c zfs: combine zio caches if possible
This deduplicates 2 sets of caches using the same sizes.

Memory savings fluctuate a lot, one sample result is buildworld on zfs
saving ~180MB RAM in reduced page count associated with zio caches.
2020-11-10 14:23:46 +00:00
Mateusz Guzik
41ce62251c zfs: g/c unused data_alloc_arena 2020-11-10 14:21:23 +00:00
Hans Petter Selasky
6c43a5e9c7 Include GID type when deleting GIDs from HW table under RoCE in mlx4ib.
Refer to the Linux commit mentioned below for a more detailed description.

Linux commit:
a18177925c252da7801149abe217c05b80884798

Requested by:	Isilon
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-10 12:58:25 +00:00
Eugene Grosbein
3ff4b31749 ng_nat: unbreak ABI
The revision r342168 broke ABI of ng_nat needlessly and
the change was merged to stable branches breaking ABI there, too.
Unbreak it.

PR:		250722
MFC after:	1 week
2020-11-10 02:26:44 +00:00
Mateusz Guzik
5c100123a3 thread: retire thread_find
tdfind should be used instead.
2020-11-10 01:57:48 +00:00
Mateusz Guzik
f837888a3e thread: use tdfind in sysctl_kern_proc_kstack
This treads linear scans for locked lookup, but more importantly removes
the only consumer of thread_find.
2020-11-10 01:57:19 +00:00
Mateusz Guzik
94275e3e69 threads: remove the unused TID_BUFFER_SIZE macro 2020-11-10 01:31:06 +00:00
Mateusz Guzik
934e7e5ec9 thread: adds newer bits for r367537
The committed patch was an older version.
2020-11-10 01:13:58 +00:00
Bjoern A. Zeeb
4c7458fa7c usb_hub: fix whitespace
Fix a whitespace "error" introduced in r367435 noticed when
preparing the MFC.  No functional changes.
2020-11-09 23:36:51 +00:00
Bjoern A. Zeeb
47da3ae49d arm64: bs_sr_<N> take II
In r367327 generic_bs_sr_<n> were derived from mips.  Given we are calling
generic_bs_w_<n> and no write directly, we do not have to do the address
calculations ourselves as eneric_bs_w_<n> will do a str val [bsh, offset].
All we actually have to do is increment offset.

MFC after:			3 days
2020-11-09 23:34:32 +00:00
Mateusz Guzik
35bb59edc5 threads: reimplement tid allocation on top of a bitmap
There are workloads with very bursty tid allocation and since unr tries very
hard to have small-sized bitmaps it keeps reallocating memory. Just doing
buildkernel gives almost 150k calls to free coming from unr.

This also gets rid of the hack which tried to postpone TID reuse.

Reviewed by:	kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D27101
2020-11-09 23:05:28 +00:00
Mateusz Guzik
1bd3cf5de5 threads: introduce a limit for total number
The intent is to replace the current id allocation method and a known upper
bound will be useful.

Reviewed by:	kib (previous version), markj (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D27100
2020-11-09 23:04:30 +00:00
Mateusz Guzik
f6dd1aefb7 vfs: group mount per-cpu vars into one struct
While here move frequently read stuff into the same cacheline.

This shrinks struct mount by 64 bytes.

Tested by:	pho
2020-11-09 23:02:13 +00:00
Mateusz Guzik
f0c90a0931 malloc: provide 384 byte zone
Total page count after buildworld on ZFS for 384 (if present) and 512 zones:
before: 29713
after: 25946

per-zone page use:
vm.uma.malloc_384.keg.domain.1.pages: 11621
vm.uma.malloc_384.keg.domain.0.pages: 11597
vm.uma.malloc_512.keg.domain.1.pages: 1280
vm.uma.malloc_512.keg.domain.0.pages: 1448

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27145
2020-11-09 22:59:41 +00:00
Mateusz Guzik
8e6526e966 malloc: retire mt_stats_zone in favor of pcpu_zone_64
Reviewed by:	markj, imp
Differential Revision:	https://reviews.freebsd.org/D27142
2020-11-09 22:58:29 +00:00
Michael Tuexen
283c76c7c3 RFC 7323 specifies that:
* TCP segments without timestamps should be dropped when support for
  the timestamp option has been negotiated.
* TCP segments with timestamps should be processed normally if support
  for the timestamp option has not been negotiated.
This patch enforces the above.

PR:			250499
Reviewed by:		gnn, rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D27148
2020-11-09 21:49:40 +00:00
Emmanuel Vadot
db6a0c8f47 Bump __FreeBSD_version after linuxkpi changes 2020-11-09 13:20:44 +00:00
Emmanuel Vadot
dab39c11af LinuxKPI: Implement ACPI bits required by drm-kmod in base system
It includes:

ACPI_HANDLE() implementation.
AC and VIDEO ACPI events notification support.
Replacement of hand-rolled GPLed _DSM method evaluation helpers
with in-base ones.

Submitted by:	wulf
Differential Revision:	https://reviews.freebsd.org/D26603
2020-11-09 13:20:14 +00:00
Michael Tuexen
e597bae4ee Fix a potential use-after-free bug introduced in
https://svnweb.freebsd.org/changeset/base/363046

Thanks to Taylor Brandstetter for finding this issue using fuzz testing
and reporting it in https://github.com/sctplab/usrsctp/issues/547
2020-11-09 13:12:07 +00:00
Edward Tomasz Napierala
e3b1c847a4 Make it possible to mount a fuse filesystem, such as squashfuse,
from a Linux binary.  Should come handy for AppImages.

Reviewed by:	asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26959
2020-11-09 08:53:15 +00:00
Warner Losh
8b8af16875 Remove newline from bxe description, it's not done elsewhere. 2020-11-09 03:02:34 +00:00
Mateusz Guzik
3a440a421d Add more per-cpu zones.
This covers powers of 2 up to 64.

Example pending user is ZFS.
2020-11-09 00:34:23 +00:00
Navdeep Parhar
de0a3472d8 cxgbe(4): Allow the PF driver to set a VF's MAC address.
The MAC address can be set with the optional mac-addr property in the VF
section of the iovctl.conf(5) used to instantiate the VFs.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2020-11-09 00:08:35 +00:00
Mateusz Guzik
523d66730c procdesc: convert the zone to a malloc type
The object is 128 bytes in size.
2020-11-09 00:05:21 +00:00
Mateusz Guzik
62d77e4e0c bufcache: convert bo_numoutput from long to int
int is wide enough and it plugs a hole in struct vnode, taking it down
from 496 to 488 bytes.
2020-11-09 00:04:58 +00:00
Mateusz Guzik
e90afaa015 kqueue: save space by using only one func pointer for assertions 2020-11-09 00:04:35 +00:00
Navdeep Parhar
dc0800a9ad cxgbev(4): Use the MAC address set by the the PF if there is one.
Query the firmware for the MAC address set by the PF for the VF and use
it instead of the firmware generated MAC if it's available.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2020-11-09 00:01:13 +00:00
Brandon Bergren
8801df34f0 [PowerPC] Fix powerpc64le boot after HPT superpages addition
The HPT is always stored in big-endian, as it is accessed directly by the
hardware as well as the kernel. As such, it is necessary to convert values
to and from native endian when running on LE.

Some unconverted accesses snuck in accidentally with r367417.

Apply the appropriate conversions to fix boot hanging on powerpc64le.

Sponsored by:	Tag1 Consulting, Inc.
2020-11-08 23:34:06 +00:00