Commit Graph

134824 Commits

Author SHA1 Message Date
Hans Petter Selasky
ffdb195f31 Enhance the mlx5_core_create_cq() function in mlx5core.
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Linux commit:
38164b771947be9baf06e78ffdfb650f8f3e908e

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:06:10 +00:00
Hans Petter Selasky
4a64b690f1 Use mlx5core to create/destroy all Dynamically Connected Targets, DCTs.
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without calling drain DCT before, (e.g. under cleanup flow), need to
manage its creation and destruction via mlx5 core.

Linux commit:
c5ae1954c47d3fd8815bd5a592aba18702c93f33

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:03:18 +00:00
Hans Petter Selasky
8114aeea44 Fix error handling order in create_kernel_qp in mlx5ib.
Make sure order of cleanup is exactly the opposite of initialization.

Linux commit:
f4044dac63e952ac1137b6df02b233d37696e2f5

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:00:21 +00:00
Mateusz Guzik
19d3e47dca select: call seltdfini on process and thread exit
Since thread_zone is marked NOFREE the thread_fini callback is never
executed, meaning memory allocated by seltdinit is never released.

Adding the call to thread_dtor is not sufficient as exiting processes
cache the main thread.
2020-11-16 03:12:21 +00:00
Mateusz Guzik
31b2ac4b5a select: replace reference counting with memory barriers in selfd
Refcounting was added to combat a race between selfdfree and doselwakup,
but it adds avoidable overhead.

selfdfree detects it can free the object by ->sf_si == NULL, thus we can
ensure that the condition only holds after all accesses are completed.
2020-11-16 03:09:18 +00:00
Mateusz Guzik
b77594bbbf sched: fix an incorrect comparison in sched_lend_user_prio_cond
Compare with sched_lend_user_prio.
2020-11-15 01:54:44 +00:00
Mateusz Guzik
6b45dbe56c cred: annotate credbatch_process argument as unused
Fixes libprocstat compilation as zfs defines _KERNEL.
2020-11-14 19:56:11 +00:00
Mateusz Guzik
5596f836e7 zfs: disable periodic arc updates
They are only there to provide less innacurate statistics for debuggers.
However, this is quite heavy-weight and instead it would be better to
teach debuggers how to obtain the necessary information.
2020-11-14 19:23:07 +00:00
Mateusz Guzik
f34a2f56c3 thread: batch credential freeing 2020-11-14 19:22:02 +00:00
Mateusz Guzik
fb8ab68084 thread: batch resource limit free calls 2020-11-14 19:21:46 +00:00
Mateusz Guzik
5ef7b7a0f3 thread: rework tid batch to use helpers 2020-11-14 19:20:58 +00:00
Mateusz Guzik
25600ad516 cred: reorder cr_audit to be closer to the lock
This makes cr_uid avoid sharing.
2020-11-14 19:20:37 +00:00
Mateusz Guzik
d1ca25be49 thread: pad tid lock
On a kernel with other changes this bumps 104-way thread creation/destruction
from 0.96 mln ops/s to 1.1 mln ops/s.
2020-11-14 19:19:27 +00:00
Jonathan T. Looney
440598dd9e Fix implicit automatic local port selection for IPv6 during connect calls.
When a user creates a TCP socket and tries to connect to the socket without
explicitly binding the socket to a local address, the connect call
implicitly chooses an appropriate local port. When evaluating candidate
local ports, the algorithm checks for conflicts with existing ports by
doing a lookup in the connection hash table.

In this circumstance, both the IPv4 and IPv6 code look for exact matches
in the hash table. However, the IPv4 code goes a step further and checks
whether the proposed 4-tuple will match wildcard (e.g. TCP "listen")
entries. The IPv6 code has no such check.

The missing wildcard check can cause problems when connecting to a local
server. It is possible that the algorithm will choose the same value for
the local port as the foreign port uses. This results in a connection with
identical source and destination addresses and ports. Changing the IPv6
code to align with the IPv4 code's behavior fixes this problem.

Reviewed by:	tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27164
2020-11-14 14:50:34 +00:00
Vladimir Kondratyev
146e176df7 LinuxKPI: Exclude linux/acpi.h content on non-ACPI archs.
LinuxKPI ACPI support is based on FreeBSD import of ACPICA which can be
compiled only on aarch64, amd64 and i386. Ifdef-out broken parts on our
side to avoid patching of vendor code.

This fixes drm-devel-kmod build on powerpc64(le).

Reported by:	pkubaj
2020-11-14 10:34:18 +00:00
Konstantin Belousov
8a1509e442 Handle LoR in flush_pagedep_deps().
When operating in SU or SU+J mode, ffs_syncvnode() might need to
instantiate other vnode by inode number while owning syncing vnode
lock.  Typically this other vnode is the parent of our vnode, but due
to renames occuring right before fsync (or during fsync when we drop
the syncing vnode lock, see below) it might be no longer parent.

More, the called function flush_pagedep_deps() needs to lock other
vnode while owning the lock for vnode which owns the buffer, for which
the dependencies are flushed.  This creates another instance of the
same LoR as was fixed in softdep_sync().

Put the generic code for safe relocking into new SU helper
get_parent_vp() and use it in flush_pagedep_deps().  The case for safe
relocking of two vnodes with undefined lock order was extracted into
vn helper vn_lock_pair().

Due to call sequence
     ffs_syncvnode()->softdep_sync_buf()->flush_pagedep_deps(),
ffs_syncvnode() indicates with ERELOOKUP that passed vnode was
unlocked in process, and can return ENOENT if the passed vnode
reclaimed.  All callers of the function were inspected.

Because UFS namei lookups store auxiliary information about directory
entry in in-memory directory inode, and this information is then used
by UFS code that creates/removed directory entry in the actual
mutating VOPs, it is critical that directory vnode lock is not dropped
between lookup and VOP.  For softdep_prelink(), which ensures that
later link/unlink operation can proceed without overflowing the
journal, calls were moved to the place where it is safe to drop
processing VOP because mutations are not yet applied.  Then, ERELOOKUP
causes restart of the whole VFS operation (typically VFS syscall) at
top level, including the re-lookup of the involved pathes.  [Note that
we already do the same restart for failing calls to vn_start_write(),
so formally this patch does not introduce new behavior.]

Similarly, unsafe calls to fsync in snapshot creation code were
plugged.  A possible view on these failures is that it does not make
sense to continue creating snapshot if the snapshot vnode was
reclaimed due to forced unmount.

It is possible that relock/ERELOOKUP situation occurs in
ffs_truncate() called from ufs_inactive().  In this case, dropping the
vnode lock is not safe.  Detect the situation with VI_DOINGINACT and
reschedule inactivation by setting VI_OWEINACT.  ufs_inactive()
rechecks VI_OWEINACT and avoids reclaiming vnode is truncation failed
this way.

In ffs_truncate(), allocation of the EOF block for partial truncation
is re-done after vnode is synced, since we cannot leave the buffer
locked through ffs_syncvnode().

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-14 05:30:10 +00:00
Konstantin Belousov
738ea0010b Add ffs_inode_bwrite() helper.
In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-14 05:19:59 +00:00
Konstantin Belousov
7b795aa3c0 Revert r367669 to re-commit with proper message 2020-11-14 05:19:44 +00:00
Konstantin Belousov
c0d2077f41 Add a framework that tracks exclusive vnode lock generation count for UFS.
This count is memoized together with the lookup metadata in directory
inode, and we assert that accesses to lookup metadata are done under
the same lock generation as they were stored.  Enabled under DIAGNOSTICS.

UFS saves additional data for parent dirent when doing lookup
(i_offset, i_count, i_endoff), and this data is used later by VOPs
operating on dirents.  If parent vnode exclusive lock is dropped and
re-acquired between lookup and the VOP call, we corrupt directories.

Framework asserts that corruption cannot occur that way, by tracking
vnode lock generation counter.  Updates to inode dirent members also
save the counter, while users compare current and saved counters
values.

Also, fix a case in ufs_lookup_ino() where i_offset and i_count could
be updated under shared lock.  It is not a bug on its own since dvp
i_offset results from such lookup cannot be used, but it causes false
positive in the checker.

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-14 05:17:04 +00:00
Konstantin Belousov
61846fc4dc Add a framework that tracks exclusive vnode lock generation count for UFS.
This count is memoized together with the lookup metadata in directory
inode, and we assert that accesses to lookup metadata are done under
the same lock generation as they were stored.  Enabled under DIAGNOSTICS.

UFS saves additional data for parent dirent when doing lookup
(i_offset, i_count, i_endoff), and this data is used later by VOPs
operating on dirents.  If parent vnode exclusive lock is dropped and
re-acquired between lookup and the VOP call, we corrupt directories.

Framework asserts that corruption cannot occur that way, by tracking
vnode lock generation counter.  Updates to inode dirent members also
save the counter, while users compare current and saved counters
values.

Also, fix a case in ufs_lookup_ino() where i_offset and i_count could
be updated under shared lock.  It is not a bug on its own since dvp
i_offset results from such lookup cannot be used, but it causes false
positive in the checker.

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-14 05:10:39 +00:00
Alexander Motin
0bed3eabc5 Add PMRCAP printing and fix earlier CAP_HI.
MFC after:	3 days
2020-11-14 01:45:34 +00:00
Jung-uk Kim
fbde34778b MFV: r367652
Merge ACPICA 20201113.
2020-11-13 22:45:26 +00:00
Mateusz Guzik
9b9bb9ffa5 malloc: retire MALLOC_PROFILE
The global array has prohibitive performance impact on multicore systems.

The same data (and more) can be obtained with dtrace.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27199
2020-11-13 19:22:53 +00:00
Ed Maste
d801ff15dc Disable kernel INIT_ALL_ZERO on amd64
It is currently incompatible with kernel ifunc memset.

PR:		251083
MFC with:	r367577
Sponsored by:	The FreeBSD Foundation
2020-11-13 18:34:13 +00:00
Ed Maste
360d1232ab ip_fastfwd: style(9) tidy for r367628
Discussed with:	gnn
MFC with:	r367628
2020-11-13 18:25:07 +00:00
Brandon Bergren
0e0457251b [PowerPC64LE] Radix MMU fixes for LE.
There were many, many endianness fixes needed for Radix MMU. The Radix
pagetable is stored in BE (as it is read and written to by the MMU hw),
so we need to convert back and forth every time we interact with it when
running in LE.

With these changes, I can successfully boot with radix enabled on POWER9 hw.

Reviewed by:	luporl, jhibbits
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D27181
2020-11-13 16:56:03 +00:00
Brandon Bergren
26869ad14c [PowerPC] Allow traversal of oversize OF properties.
In standards such as LoPAPR, property names in excess of the usual 31
characters exist.

This breaks property traversal.

While in IEEE 1275-1994, nextprop is defined explicitly to work with a
32-byte region of memory, using a larger buffer should be fine. There is
actually no way to pass a buffer length to the nextprop call in the OF
client interface, so SLOF actually just blindly overflows the buffer.

So we have to defensively make the buffer larger, to avoid memory
corruption when reading out long properties on live OF systems.

Note also that on real-mode OF, things are pretty tight because we are
allocating against a static bounce buffer in low memory, so we can't just
use a huge buffer to work around this without it being wasteful of our
limited amount of 32-bit physical memory.

This allows a patched ofwdump to operate properly on SLOF (i.e. pseries)
systems, as well as any other PowerPC systems with overlength properties.

Reviewed by:	jhibbits
MFC after:	2 weeks
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D26669
2020-11-13 16:49:41 +00:00
George V. Neville-Neil
d65d6d5aa9 Followup pointed out by ae@ 2020-11-13 13:07:44 +00:00
Konstantin Belousov
441eb16a95 Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level.
Restart syscalls and some sync operations when filesystem indicated
ERELOOKUP condition, mostly for VOPs operating on metdata.  In
particular, lookup results cached in the inode/v_data is no longer
valid and needs recalculating.  Right now this should be nop.

Assert that ERELOOKUP is catched everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on syscall return
path.

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-13 09:42:32 +00:00
Konstantin Belousov
7cde2ec4fd Implement vn_lock_pair().
In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj (previous version)
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-13 09:31:57 +00:00
Adrian Chadd
cc4d36325b [malloc] quieten -Werror=missing-braces with malloc.h wth gcc-6.4
This sets off gcc-6.4 to spit out a 'error: missing braces around initializer'
error when compiling this.

Remove it as it isn't needed.

Reviewed by:	brooks
Differential Revision:	 https://reviews.freebsd.org/D27183
2020-11-13 01:53:59 +00:00
George V. Neville-Neil
8ad114c082 An earlier commit effectively turned out the fast forwading path
due to its lack of support for ICMP redirects. The following commit
adds redirects to the fastforward path, again allowing for decent
forwarding performance in the kernel.

Reviewed by: ae, melifaro
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")
2020-11-12 21:58:47 +00:00
Mateusz Guzik
9aa6d792b5 malloc: retire malloc_last_fail
The routine does not serve any practical purpose.

Memory can be allocated in many other ways and most consumers pass the
M_WAITOK flag, making malloc not fail in the first place.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27143
2020-11-12 20:22:58 +00:00
Mateusz Guzik
d5127d1ae2 gbde: replace malloc_last_fail with a kludge
This facilitates removal of malloc_last_fail without really impacting
anything.
2020-11-12 20:20:57 +00:00
Alexander Motin
46fbd8004f Fix panic if NVMe is detached before the intrhook call.
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-11-12 20:20:43 +00:00
Navdeep Parhar
bdabd00d65 cxgbe/t4_tom: Handle VXLAN-encapsulated SYNs correctly.
TCP SYNs in inner traffic will hit hardware listeners when VXLAN/NVGRE
rx parsing is enabled in the chip.  t4_tom should pass on these SYNs to
the kernel and let it deal with them as if they arrived on the non-TOE
path.

Reported by:	Sony at Chelsio
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-11-12 20:02:48 +00:00
Hans Petter Selasky
6abe97c014 Add more USB quirks.
PR:		230038
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-12 18:24:37 +00:00
Mateusz Piotrowski
e2a03adb53 Fix a typo in a license comment
Approved by:	kaktus (src)
2020-11-12 15:50:18 +00:00
Mark Johnston
381219b64d qat: Fix nits reported by Coverity
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (Netgate)
2020-11-12 15:00:48 +00:00
Hans Petter Selasky
f14436adc6 Add a tunable sysctl, hw.usb.uaudio.handle_hid, to allow disabling the
the HID volume keys support in the USB audio driver.

While at it re-organize the USB audio sysctls a bit.

Differential Revision:	https://reviews.freebsd.org/D27180
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-12 09:26:01 +00:00
Hans Petter Selasky
eb985e1802 When doing a USB alternate setting on an USB interface we need to
re-configure the XHCI endpoint context.

Differential Revision:	https://reviews.freebsd.org/D27174
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-12 09:15:07 +00:00
Konstantin Belousov
0b8e170d95 mlx5en: Set ifmr_current same as ifmr_active.
This both:
- makes ifconfig media line similar to that of other drivers.
- fixes ENXIO in case when paradoxical current media word is not registered.

Now e.g.
      ifconfig mce0 -mediaopt txpause,rxpause
works by disabling pauses if enabled.

Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:25:10 +00:00
Konstantin Belousov
bab0c4b1a0 mlx5en: stop ignoring pauses and flow in the media reqs.
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:23:27 +00:00
Konstantin Belousov
559dbeac47 mlx5en: Register all combinations of FDX/RXPAUSE/TXPAUSE as valid media types.
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:22:16 +00:00
Konstantin Belousov
4ead80241a mlx5en: Refactor repeated code to register media type to mlx5e_ifm_add().
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:21:14 +00:00
Navdeep Parhar
f14d7c9516 cxgbev(4): Make sure that the iq/eq map sizes are correct for VFs.
This should have been part of r366929.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-11-12 01:18:05 +00:00
Mateusz Guzik
62dbc992ad thread: move nthread management out of tid_alloc
While this adds more work single-threaded, it also enables SMP-related
speed ups.
2020-11-12 00:29:23 +00:00
Kyle Evans
38033780a3 umtx: drop incorrect timespec32 definition
This works for amd64, but none others -- drop it, because we already have a
proper definition in sys/compat/freebsd32/freebsd32.h that correctly uses
time32_t.

MFC after:	1 week
2020-11-11 22:35:23 +00:00
Alexander Motin
8054320e07 Make CTL nicer to increased MAXPHYS.
Before this CTL always allocated MAXPHYS-sized buffers, even for 4KB I/O,
that is even more overkill for MAXPHYS of 1MB.  This change limits maximum
allocation to 512KB if MAXPHYS is bigger, plus if one is above 128KB, adds
new 128KB UMA zone for smaller I/Os.  The patch factors out alloc/free,
so later we could make it use more zones or malloc() if we'd like.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-11-11 21:59:39 +00:00
Mateusz Guzik
755341df4f thread: batch tid_free calls in thread_reap
This eliminates the highly pessimal pattern of relocking from multiple
CPUs in quick succession. Note this is still globally serialized.
2020-11-11 18:45:06 +00:00