Commit Graph

254473 Commits

Author SHA1 Message Date
Konstantin Belousov
441eb16a95 Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level.
Restart syscalls and some sync operations when filesystem indicated
ERELOOKUP condition, mostly for VOPs operating on metadata.  In
particular, lookup results cached in the inode/v_data are no longer
valid and need recalculating.  Right now this should be a nop.

Assert that ERELOOKUP is caught everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on syscall return
path.

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-13 09:42:32 +00:00
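The restart behaviour described in 441eb16a95 can be illustrated with a minimal user-space sketch. It is not the kernel code: `sketch_run_restartable`, `sketch_flaky_op`, and the `SKETCH_ERELOOKUP` placeholder value are hypothetical names introduced only for this example. The idea shown is that a top-level caller re-runs the operation while the lower layer asks for a restart, and asserts that the restart code never escapes.

```c
#include <assert.h>

/* Placeholder value for illustration; the real constant is defined by the kernel. */
#define SKETCH_ERELOOKUP (-5)

/* Hypothetical operation type standing in for a VFS-level operation. */
typedef int (*sketch_op_t)(void *arg);

/* Re-run op from the top for as long as it asks for a restart. */
static int
sketch_run_restartable(sketch_op_t op, void *arg)
{
	int error;

	do {
		error = op(arg);
	} while (error == SKETCH_ERELOOKUP);

	/* Mirrors the commit's assertion: the restart code must not leak out. */
	assert(error != SKETCH_ERELOOKUP);
	return (error);
}

/* Toy operation: requests a restart until its countdown reaches zero. */
static int
sketch_flaky_op(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (SKETCH_ERELOOKUP);
	}
	return (0);
}
```

Calling `sketch_run_restartable(sketch_flaky_op, &n)` with `n` set to 3 runs the operation four times and returns 0.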
Konstantin Belousov
7cde2ec4fd Implement vn_lock_pair().
In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj (previous version)
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-13 09:31:57 +00:00
Alexander Motin
5dc463f9a5 Improve nvmecontrol error reporting.
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-11-13 02:05:45 +00:00
Adrian Chadd
cc4d36325b [malloc] quieten -Werror=missing-braces with malloc.h with gcc-6.4
This causes gcc-6.4 to spit out an 'error: missing braces around initializer'
error when compiling this.

Remove it as it isn't needed.

Reviewed by:	brooks
Differential Revision:	 https://reviews.freebsd.org/D27183
2020-11-13 01:53:59 +00:00
George V. Neville-Neil
8ad114c082 An earlier commit effectively turned off the fast forwarding path
due to its lack of support for ICMP redirects. The following commit
adds redirects to the fastforward path, again allowing for decent
forwarding performance in the kernel.

Reviewed by: ae, melifaro
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")
2020-11-12 21:58:47 +00:00
Mateusz Guzik
9aa6d792b5 malloc: retire malloc_last_fail
The routine does not serve any practical purpose.

Memory can be allocated in many other ways and most consumers pass the
M_WAITOK flag, making malloc not fail in the first place.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27143
2020-11-12 20:22:58 +00:00
Mateusz Guzik
d5127d1ae2 gbde: replace malloc_last_fail with a kludge
This facilitates removal of malloc_last_fail without really impacting
anything.
2020-11-12 20:20:57 +00:00
Alexander Motin
46fbd8004f Fix panic if NVMe is detached before the intrhook call.
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-11-12 20:20:43 +00:00
Navdeep Parhar
bdabd00d65 cxgbe/t4_tom: Handle VXLAN-encapsulated SYNs correctly.
TCP SYNs in inner traffic will hit hardware listeners when VXLAN/NVGRE
rx parsing is enabled in the chip.  t4_tom should pass on these SYNs to
the kernel and let it deal with them as if they arrived on the non-TOE
path.

Reported by:	Sony at Chelsio
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-11-12 20:02:48 +00:00
Dimitry Andric
2418469b60 Merge commit 8df4e6094 from llvm git (by Fangrui Song):
[ELF] Don't consider SHF_ALLOC ".debug*" sections debug sections

  Fixes PR48071

  * The Rust compiler produces SHF_ALLOC `.debug_gdb_scripts` (which
    normally does not have the flag)
  * `.debug_gdb_scripts` sections are removed from `inputSections` due
    to --strip-debug/--strip-all
  * When processing --gc-sections, pieces of a SHF_MERGE section can be
    marked live separately

  `=>` segfault when marking liveness of a `.debug_gdb_scripts` which
  is not split into pieces (because it is not in `inputSections`)

  This patch circumvents the problem by not treating SHF_ALLOC
  ".debug*" as debug sections (to prevent --strip-debug's stripping)
  (which is still useful on its own).

  Reviewed By: grimar

  Differential Revision: https://reviews.llvm.org/D91291

This should fix lld segfaulting when linking the rust-based parts of the
devel/py-maturin port.

Reported by:	Nick Venenga <nijave@gmail.com>
PR:		250783
MFC after:	3 days
2020-11-12 19:25:31 +00:00
Hans Petter Selasky
6abe97c014 Add more USB quirks.
PR:		230038
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-12 18:24:37 +00:00
Mateusz Piotrowski
1fb04df956 Remove macros from the width arguments passed to Bl macros
I've not removed the Er macro from one of the lists in example.9, however,
because it seems to be doing some special kind of magic. Let's leave it
there for now.
2020-11-12 17:28:29 +00:00
Mateusz Piotrowski
2bbc7e7436 Add a missing period and remove a macro from Bl's width argument
MFC after:	3 days
2020-11-12 16:44:56 +00:00
Mateusz Piotrowski
e2a03adb53 Fix a typo in a license comment
Approved by:	kaktus (src)
2020-11-12 15:50:18 +00:00
Mark Johnston
381219b64d qat: Fix nits reported by Coverity
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (Netgate)
2020-11-12 15:00:48 +00:00
Emmanuel Vadot
6cd88fe0e0 pkgbase: Move libprivatezstd from utilities to runtime
libarchive depends on it by default and tar uses libarchive.
So on an update:
1/ runtime contains tar
2/ runtime has libarchive in shlibs_required
3/ the libarchive package depends on utilities
4/ utilities depends on runtime
5/ kaboom

All users of libprivatezstd (libarchive related stuff and objcopy/ar)
are already in utilities.

Discussed with: bapt
2020-11-12 14:04:08 +00:00
Hans Petter Selasky
f14436adc6 Add a tunable sysctl, hw.usb.uaudio.handle_hid, to allow disabling
the HID volume keys support in the USB audio driver.

While at it re-organize the USB audio sysctls a bit.

Differential Revision:	https://reviews.freebsd.org/D27180
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-12 09:26:01 +00:00
Hans Petter Selasky
eb985e1802 When doing a USB alternate setting on a USB interface we need to
re-configure the XHCI endpoint context.

Differential Revision:	https://reviews.freebsd.org/D27174
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-12 09:15:07 +00:00
Konstantin Belousov
038f5c7bfe bhyve: remove a hack to map all 8G BARs 1:1
Suggested and reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27186
2020-11-12 02:52:01 +00:00
Konstantin Belousov
0b8e170d95 mlx5en: Set ifmr_current same as ifmr_active.
This both:
- makes ifconfig media line similar to that of other drivers.
- fixes ENXIO in case when paradoxical current media word is not registered.

Now e.g.
      ifconfig mce0 -mediaopt txpause,rxpause
works by disabling pauses if enabled.

Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:25:10 +00:00
Konstantin Belousov
bab0c4b1a0 mlx5en: stop ignoring pauses and flow in the media reqs.
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:23:27 +00:00
Konstantin Belousov
559dbeac47 mlx5en: Register all combinations of FDX/RXPAUSE/TXPAUSE as valid media types.
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:22:16 +00:00
Konstantin Belousov
4ead80241a mlx5en: Refactor repeated code to register media type to mlx5e_ifm_add().
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:21:14 +00:00
Navdeep Parhar
f14d7c9516 cxgbev(4): Make sure that the iq/eq map sizes are correct for VFs.
This should have been part of r366929.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-11-12 01:18:05 +00:00
Konstantin Belousov
670b364b76 bhyve: increase allowed size for 64bit BAR allocation below 4G from 32 to 128 MB.
Reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27095
2020-11-12 00:51:53 +00:00
Konstantin Belousov
9922872ba2 bhyve: avoid allocating BARs above the end of supported physical addresses.
Read CPUID leaf 0x80000008 to determine max supported phys address and
create BAR region right below it, reserving 1/4 of the supported guest
physical address space to the 64bit BARs mappings.

PR:    250802 (although the issue from PR is not fixed by the change)
Noted and reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27095
2020-11-12 00:46:53 +00:00
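The arithmetic behind 9922872ba2 can be sketched as follows. This is an illustration under assumptions, not the bhyve code: `sketch_bar64_window` and `struct sketch_bar_window` are hypothetical, the function takes the EAX value of CPUID leaf 0x80000008 as a parameter instead of executing the instruction, and "1/4 of the address space" is read here as one quarter of the addressable range, placed directly below the limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: start and size of the 64-bit BAR window. */
struct sketch_bar_window {
	uint64_t base;
	uint64_t size;
};

/*
 * cpuid_eax is the EAX value returned by CPUID leaf 0x80000008;
 * bits 7:0 hold the number of supported physical address bits.
 */
static struct sketch_bar_window
sketch_bar64_window(uint32_t cpuid_eax)
{
	struct sketch_bar_window w;
	unsigned int bits = cpuid_eax & 0xff;
	uint64_t limit;

	assert(bits >= 2 && bits < 64);
	limit = UINT64_C(1) << bits;	/* first address that is not addressable */
	w.size = limit / 4;		/* reserve one quarter for 64-bit BARs */
	w.base = limit - w.size;	/* place the window right below the limit */
	return (w);
}
```

With 39 physical address bits, for example, the limit is 512 GB, so this sketch yields a 128 GB window starting at 384 GB.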
Mateusz Guzik
62dbc992ad thread: move nthread management out of tid_alloc
While this adds more work single-threaded, it also enables SMP-related
speed ups.
2020-11-12 00:29:23 +00:00
Kyle Evans
38033780a3 umtx: drop incorrect timespec32 definition
This works for amd64, but no others -- drop it, because we already have a
proper definition in sys/compat/freebsd32/freebsd32.h that correctly uses
time32_t.

MFC after:	1 week
2020-11-11 22:35:23 +00:00
Alexander Motin
8054320e07 Make CTL nicer to increased MAXPHYS.
Before this CTL always allocated MAXPHYS-sized buffers, even for 4KB I/O,
that is even more overkill for MAXPHYS of 1MB.  This change limits maximum
allocation to 512KB if MAXPHYS is bigger, plus if one is above 128KB, adds
new 128KB UMA zone for smaller I/Os.  The patch factors out alloc/free,
so later we could make it use more zones or malloc() if we'd like.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-11-11 21:59:39 +00:00
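The sizing policy described in 8054320e07 can be sketched as a small selection function. This is a simplified model, not the CTL implementation: `sketch_big_buf` and `sketch_pick_buf` are hypothetical names, and the sketch assumes that an I/O larger than the capped buffer is handled by the caller in several passes.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_KB(x)	((size_t)(x) * 1024)

/* Largest buffer ever allocated: MAXPHYS, capped at 512KB. */
static size_t
sketch_big_buf(size_t maxphys)
{
	return (maxphys > SKETCH_KB(512) ? SKETCH_KB(512) : maxphys);
}

/* Pick the buffer size used for an I/O of io_len bytes. */
static size_t
sketch_pick_buf(size_t maxphys, size_t io_len)
{
	size_t big = sketch_big_buf(maxphys);

	assert(io_len > 0);
	/* A separate 128KB pool exists only when the big one is larger. */
	if (big > SKETCH_KB(128) && io_len <= SKETCH_KB(128))
		return (SKETCH_KB(128));
	return (big);
}
```

Under this model a 4KB I/O with a 1MB MAXPHYS gets a 128KB buffer instead of a 1MB one, which is the saving the commit message describes.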
Mateusz Guzik
755341df4f thread: batch tid_free calls in thread_reap
This eliminates the highly pessimal pattern of relocking from multiple
CPUs in quick succession. Note this is still globally serialized.
2020-11-11 18:45:06 +00:00
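The batching idea in 755341df4f can be modelled in user space. Everything in this sketch is hypothetical (`struct sketch_tidpool`, `sketch_reap`, and the batch size of 16); the lock is represented by a counter so that the number of acquisitions is visible. The point is that the lock is taken once per batch rather than once per thread ID.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BATCH	16	/* assumed batch size, chosen for illustration */

/* Toy stand-in for the global tid allocator state. */
struct sketch_tidpool {
	unsigned int lock_acquisitions;	/* how often the "lock" was taken */
	unsigned int freed;		/* how many tids were released */
	int last_tid;			/* most recently released tid */
};

/* Release n tids under a single lock acquisition. */
static void
sketch_tid_free_batch(struct sketch_tidpool *tp, const int *tids, size_t n)
{
	size_t i;

	assert(n <= SKETCH_BATCH);
	tp->lock_acquisitions++;	/* lock */
	for (i = 0; i < n; i++) {
		tp->last_tid = tids[i];
		tp->freed++;
	}
	/* unlock */
}

/* Walk the zombie list, flushing tids to the pool one batch at a time. */
static void
sketch_reap(struct sketch_tidpool *tp, const int *zombies, size_t n)
{
	int batch[SKETCH_BATCH];
	size_t fill = 0, i;

	for (i = 0; i < n; i++) {
		batch[fill++] = zombies[i];
		if (fill == SKETCH_BATCH) {
			sketch_tid_free_batch(tp, batch, fill);
			fill = 0;
		}
	}
	if (fill > 0)
		sketch_tid_free_batch(tp, batch, fill);
}
```

Reaping 40 zombie threads this way takes the lock 3 times instead of 40. As the commit notes, the work is still globally serialized; only the relocking is reduced.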
Mateusz Guzik
c5315f5196 thread: lockless zombie list manipulation
This gets rid of the most contended spinlock seen when creating/destroying
threads in a loop. (modulo kstack)

Tested by:	alfredo (ppc64), bdragon (ppc64)
2020-11-11 18:43:51 +00:00
Mark Johnston
54bf96fb4f iflib: Free full mbuf chains when draining transmit queues
Submitted by:	Sai Rajesh Tallamraju <stallamr@netapp.com>
Reviewed by:	gallatin, hselasky
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D27179
2020-11-11 18:00:06 +00:00
Mark Johnston
20f02659d6 vm_map: Handle kernel map entry allocator recursion
On platforms without a direct map[*], vm_map_insert() may in rare
situations need to allocate a kernel map entry in order to allocate
kernel map entries.  This poses a problem similar to the one solved for
vmem boundary tags by vmem_bt_alloc().  In fact the kernel map case is a
bit more complicated since we must allocate entries with the kernel map
locked, whereas vmem can recurse into itself because boundary tags are
allocated up-front.

The solution is to add a custom slab allocator for kmapentzone which
allocates KVA directly from kernel_map, bypassing the kmem_* layer.
This avoids mutual recursion with the vmem btag allocator.  Then, when
vm_map_insert() allocates a new kernel map entry, it avoids triggering
allocation of a new slab with M_NOVM until after the insertion is
complete.  Instead, vm_map_insert() allocates from the reserve and sets
a flag in kernel_map to trigger re-population of the reserve just before
the map is unlocked.  This places an implicit upper bound on the number
of kernel map entries that may be allocated before the kernel map lock
is released, but in general a bound of 1 suffices.

[*] This also comes up on amd64 with UMA_MD_SMALL_ALLOC undefined, a
configuration required by some kernel sanitizers.

Discussed with:	kib, rlibby
Reported by:	andrew
Tested by:	pho (i386 and amd64 with !UMA_MD_SMALL_ALLOC)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26851
2020-11-11 17:16:39 +00:00
Andrey V. Elsukov
2f4ffa9f72 Fix possible NULL pointer dereference.
lagg(4) replaces if_output method of its child interfaces and expects
that this method can be called only by child interfaces. But it is
possible that lagg_port_output() could be called by children of child
interfaces. In this case ifnet's if_lagg field is NULL. Add check that
lp is not NULL.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2020-11-11 15:53:36 +00:00
Mark Johnston
6f5a960678 vmm: Make pmap_invalidate_ept() wait synchronously for guest exits
Currently EPT TLB invalidation is done by incrementing a generation
counter and issuing an IPI to all CPUs currently running vCPU threads.
The VMM inner loop caches the most recently observed generation on each
host CPU and invalidates TLB entries before executing the VM if the
cached generation number is not the most recent value.
pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing
guest instructions and reload the generation number.  However, it does
not actually wait for vCPUs to exit, potentially creating a window where
guests may continue to reference stale TLB entries.

Fix the problem by bracketing guest execution with an SMR read section
which is entered before loading the invalidation generation.  Then,
pmap_invalidate_ept() increments the current write sequence before
loading pm_active and sending IPIs, and polls readers to ensure that all
vCPUs potentially operating with stale TLB entries have exited before
pmap_invalidate_ept() returns.

Also ensure that unsynchronized loads of the generation counter are
wrapped with atomic(9), and stop (inconsistently) updating the
invalidation counter and pm_active bitmask with acquire semantics.

Reviewed by:	grehan, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26910
2020-11-11 15:01:17 +00:00
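The generation-counter scheme that 6f5a960678 builds on can be shown with a single-threaded sketch. It deliberately leaves out the IPIs and the SMR read section that the fix adds, and all names (`struct sketch_pmap`, `struct sketch_hostcpu`, `sketch_invalidate`, `sketch_before_guest_entry`) are hypothetical. It only shows how a cached generation decides whether a flush is needed before entering the guest.

```c
#include <assert.h>
#include <stddef.h>

/* Toy pmap: only the invalidation generation is modelled. */
struct sketch_pmap {
	unsigned long eptgen;
};

/* Toy per-host-CPU state: last generation observed, and a flush counter. */
struct sketch_hostcpu {
	unsigned long seen_gen;
	unsigned int flushes;
};

/* Request invalidation by bumping the generation. */
static void
sketch_invalidate(struct sketch_pmap *pmap)
{
	assert(pmap != NULL);
	pmap->eptgen++;
}

/* Before running the guest, flush if the cached generation is stale. */
static void
sketch_before_guest_entry(const struct sketch_pmap *pmap,
    struct sketch_hostcpu *cpu)
{
	assert(pmap != NULL && cpu != NULL);
	if (cpu->seen_gen != pmap->eptgen) {
		cpu->flushes++;			/* stands in for the TLB flush */
		cpu->seen_gen = pmap->eptgen;
	}
}
```

The window the commit closes lies between `sketch_invalidate()` returning and a running vCPU next passing through the entry check; the sketch has no concurrency, so it cannot exhibit that window itself.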
Mateusz Piotrowski
a4848c103c Document in the synopsis that -0 cannot be used with the utility argument
2020-11-11 14:53:03 +00:00
Mark Johnston
0da7ac7cbb Remove an extraneous parameter from SIGIO_ASSERT_LOCKED()
Reported by:	hselasky
MFC with:	r367588
2020-11-11 14:03:49 +00:00
Mark Johnston
f44994874b ffs: Clamp BIO_SPEEDUP length
On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by
softdep_request_cleanup() may be negative when assigned to bp->b_bcount,
which has type "long".

Clamp the size to LONG_MAX.  Also convert the unused g_io_speedup() to
use an off_t for the magnitude of the shortage for consistency with
softdep_send_speedup().

Reviewed by:	chs, kib
Reported by:	pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27081
2020-11-11 13:48:07 +00:00
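The clamp described in f44994874b can be sketched in portable C. `sketch_clamp_speedup` is a hypothetical helper, and `int64_t` stands in for the kernel's `off_t`. It shows why a 64-bit shortage must be bounded before it is stored in a `long`, which is only 32 bits wide on 32-bit platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Bound a 64-bit shortage so that it fits in a long without wrapping. */
static long
sketch_clamp_speedup(int64_t shortage)
{
	assert(shortage >= 0);
	if (shortage > (int64_t)LONG_MAX)
		return (LONG_MAX);
	return ((long)shortage);
}
```

On platforms where `long` is 64 bits the comparison never triggers and the value passes through unchanged; on 32-bit platforms anything above `LONG_MAX` is pinned to `LONG_MAX` instead of turning negative.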
Mark Johnston
f52979098d Fix a pair of races in SIGIO registration
First, funsetownlst() looks at the first element of the list to see
whether it's processing a process or a process group list.  Then it
acquires the global sigio lock and processes the list.  However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.

Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty.  Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.

Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object.  However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object.  However, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.

Fix this by introducing funsetown_locked(), which avoids the racy check.

Reviewed by:	kib
Reported by:	pho
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27157
2020-11-11 13:44:27 +00:00
Mateusz Guzik
26007fe37c thread: add more fine-grained tidhash locking
Note this still does not scale but is enough to move it out of the way
for the foreseeable future.

In particular a trivial benchmark spawning/killing threads stops contesting
on tidhash.
2020-11-11 08:51:04 +00:00
Mateusz Guzik
aae3547be3 thread: rework tidhash vs proc lock interaction
Apart from minor clean up this gets rid of proc unlock/lock cycle on thread
exit to work around LOR against tidhash lock.
2020-11-11 08:50:04 +00:00
Mateusz Guzik
cf31cadeb6 thread: fix thread0 tid allocation
Startup code hardcodes the value instead of allocating it.
The first spawned thread would then be a duplicate.

Pointy hat:	mjg
2020-11-11 08:48:43 +00:00
Warner Losh
26676c47dc Add INIT_ALL_ZERO and INIT_ALL_PATTERN to kern.opts.mk
These options need to be in the kern.opts.mk file to be alive for kernel
and module builds. This also reverts r367579 since that's not needed with
this fix: the host's bsd.opts.mk is irrelevant.

Reviewed by: brooks@
Differential Revision:  https://reviews.freebsd.org/D27170
2020-11-10 23:25:16 +00:00
Mateusz Guzik
40aad3e477 thread: tidy up r367543
"locked" variable is spurious in the committed version.
2020-11-10 21:29:10 +00:00
Brooks Davis
d8033dc3d3 Be more tolerant of share/mk and kern.mk mismatch
When building out-of-tree modules, it appears that the system share/mk
is used, but sys/conf/kern.mk is used.  That results in MK_INIT_ALL_ZERO
being undefined.  In the interest of maximum compatibility, check
that MK_INIT_ALL_* and COMPILER_FEATURES are defined before comparing
their values.

Reported by:	mmacy
Sponsored by:	DARPA
2020-11-10 21:12:32 +00:00
John Baldwin
b3ceca0c80 Clear tp->tod in t4_pcb_detach().
Otherwise, a socket can have a non-NULL tp->tod while TF_TOE is clear.
In particular, if a newly accepted socket falls back to non-TOE due to
an active open failure, the non-TOE socket will still have tp->tod set
even though TF_TOE is clear.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27028
2020-11-10 19:54:39 +00:00
Brooks Davis
e268fd0a02 Support initializing stack variables on function entry
There are two options:
 - WITH_INIT_ALL_ZERO: Zero all variables on the stack.
 - WITH_INIT_ALL_PATTERN: Initialize variables with well-defined patterns.

The exact patterns are a compiler implementation detail and vary by type.
They are somewhat documented in the LLVM commit message:
https://reviews.llvm.org/rL349442
I've used WITH_INIT_ALL_* to match Microsoft's InitAll feature rather
than naming them after the LLVM specific compiler flags.

In a range of consumer products, options like these are used in
both debug and production builds with debug builds using patterns
(intended to provoke crashes on use of uninitialized values) and
production using zeros (deemed more likely to lead to harmless
misbehavior or NULL-pointer dereferences).

Reviewed by:	emaste
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27131
2020-11-10 19:15:13 +00:00
John Baldwin
9ebe945bd7 Add C startup code tests for PIE binaries.
- Force dynamic to be a non-PIE binary.

- Add a dynamicpie test which uses a PIE binary.

Reviewed by:	andrew
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27127
2020-11-10 19:09:35 +00:00
John Baldwin
f9fd7337f6 Fix dso_handle_check for PIE executables.
PIE executables use crtbeginS.o and have a non-NULL dso_handle as a
result.

Reviewed by:	andrew, emaste
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27126
2020-11-10 19:07:30 +00:00
John Baldwin
ecad1d050c Rename __JCR_LIST__ to __JCR_END__ in crtend.c.
This is more consistent with the names used for .ctor and .dtor
symbols and better reflects __JCR_END__'s role.

Reviewed by:	andrew
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27125
2020-11-10 19:04:54 +00:00