128699 Commits

Author SHA1 Message Date
Eric Joyner
749597dc1d ix, ixv: Read msix_bar from device configuration
Instead of predicting the MSI-X bar index based on the device's MAC
type, read it from the device's PCI configuration instead.

PR:		239704
Submitted by:	Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by:	erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21547
2019-09-24 17:06:32 +00:00
Eric Joyner
53b5b9b049 iflib: Remove redundant VLAN events deregistration
From Piotr:
r351152 introduced iflib_deregister() function calling
EVENTHANDLER_DEREGISTER() to unregister VLAN events. This patch removes
duplicate of EVENTHANDLER_DEREGISTER() calls placed in
iflib_device_deregister() as this function is now calling
iflib_deregister(). This is to avoid deregistering same event twice.

This patch also adds check in iflib_vlan_register() to prevent
registering VLAN while being in detach.

Patch co-authored by Krzysztof Galazka <krzysztof.galazka@intel.com>,
erj <erj@FreeBSD.org> and Jacob Keller <jacob.e.keller@intel.com>.

Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>

Submitted by:	Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by:	gallatin@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21711
2019-09-24 17:03:31 +00:00
Olivier Cochard
16f9d2f3b8 Fix a minor typo
Approved by:	lwhsu
MFC after:	1 month
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19970
2019-09-24 16:49:42 +00:00
Michael Tuexen
2b861c1538 Plumb a memory leak.
Thnanks to Felix Weinrank for finding this issue using fuzz testing
and reporting it for the userland stack:
https://github.com/sctplab/usrsctp/issues/378

MFC after:		3 days
2019-09-24 13:15:24 +00:00
Rick Macklem
5d85e12f44 Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros.
For a long time, some places in the NFS code have locked/unlocked the
NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas
others have simply used mtx_lock()/mtx_unlock().
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock
with the macros to simply making the change to an sx lock in future commit.
There is no semantic change as a result of this commit.

I am not sure if the change to an sx lock will be MFC'd soon, so I put
an MFC of 1 week on this commit so that it could be MFC'd with that commit.

Suggested by:	kib
MFC after:	1 week
2019-09-24 01:58:54 +00:00
Li-Wen Hsu
05cba150d3 Clean LINT* kernel configurations for arm*
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-09-24 01:56:27 +00:00
Mateusz Guzik
93a85508ad cache: tidy up handling of negative entries
- track the total count of hot entries
- pre-read the lock when shrinking since it is typically already taken
- place the lock in its own cacheline
- shorten the hold time of hot lock list when zapping

Sponsored by:	The FreeBSD Foundation
2019-09-23 20:50:04 +00:00
Alexander Motin
1eab19cbec Make nvme(4) driver some more NUMA aware.
- For each queue pair precalculate CPU and domain it is bound to.
If queue pairs are not per-CPU, then use the domain of the device.
 - Allocate most of queue pair memory from the domain it is bound to.
 - Bind callouts to the same CPUs as queue pair to avoid migrations.
 - Do not assign queue pairs to each SMT thread.  It just wasted
resources and increased lock congestions.
 - Remove fixed multiplier of CPUs per queue pair, spread them even.
This allows to use more queue pairs in some hardware configurations.
 - If queue pair serves multiple CPUs, bind different NVMe devices to
different CPUs.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2019-09-23 17:53:47 +00:00
Mark Johnston
9093dd9a66 Implement x86 dtrace_invop_(un)init() in C.
There is no reason for these routines to be written in assembly.  In
the ports of DTrace to other platforms, they are already written in C.
No functional change intended.

MFC after:	1 week
Sponsored by:	Netflix
2019-09-23 15:08:17 +00:00
Mark Johnston
07bf14bb72 Fix a harmless typo.
MFC after:	1 week
2019-09-23 14:34:23 +00:00
Mark Johnston
c7e224c66d Revert r316820.
Despite appearing correct, r316820 breaks packet rx/tx for jme(4)
interfaces.  With 12.1 approaching, let's just revert the commit for now.

PR:		233952
Tested by:	Armin Gruner <ag-freebsd@muc.de>
MFC after:	3 days
2019-09-23 14:29:05 +00:00
Mark Johnston
2ff730191d Set NX on some non-leaf direct map page table entries.
The direct map is never used for execution of code, so we might as well
set NX in the direct map's PML4Es.  Also clarify the intent of the code
in create_pagetables() that restricts access protections on the region
of the direct map mapping the kernel text.

Reviewed by:	alc, kib (previous version)
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21759
2019-09-23 14:19:41 +00:00
Mark Johnston
38dae42c26 Use elf_relocaddr() when handling R_X86_64_RELATIVE relocations.
This is required for DPCPU and VNET data variable definitions to work when
KLDs are linked as DSOs.  R_X86_64_RELATIVE relocations should not appear
in object files, so assert this in elf_relocaddr().

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21755
2019-09-23 14:14:43 +00:00
Mark Johnston
751727948a Set NX in mappings created by pmap_kenter() and pmap_kenter_attr().
There does not appear to be any existing need for such mappings to be
executable.

Reviewed by:	alc, kib
MFC after:	1 month
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21754
2019-09-23 14:11:59 +00:00
Kyle Evans
2e6a21bbd8 mips: fix XLPN32 after r352434
SYSINIT usage was added, but the <sys/kernel.h> dependency was not added.
This worked by coincidence, as most of the mips configs have DDB enabled and
pmap.c gets <sys/kernel.h> via ddb.h pollution.

Reported by:	dim
2019-09-23 12:43:08 +00:00
Tijl Coosemans
55258ab0ff Create a "drm" subdirectory for drm devices in linsysfs. Recent versions of
linux libdrm check for the existence of this directory:

https://cgit.freedesktop.org/mesa/drm/commit/?id=f8392583418aef5e27bfed9989aeb601e20cc96d

MFC after:	2 weeks
2019-09-23 12:27:55 +00:00
Mateusz Guzik
afe257e3ca cache: count evictions of negatve entries
Sponsored by:	The FreeBSD Foundation
2019-09-23 08:53:14 +00:00
Sean Eric Fagan
ba7a55d934 Add two options to allow mount to avoid covering up existing mount points.
The two options are

* nocover/cover:  Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir:  Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.

Neither of these options is intended to be a default, for historical and
compatibility reasons.

Reviewed by:	allanjude, kib
Differential Revision:	https://reviews.freebsd.org/D21458
2019-09-23 04:28:07 +00:00
Mateusz Guzik
7505cffa56 cache: try to avoid vhold if locks held
Sponsored by:	The FreeBSD Foundation
2019-09-22 20:50:24 +00:00
Mateusz Guzik
cd2112c305 cache: jump in negative success instead of positive
Sponsored by:	The FreeBSD Foundation
2019-09-22 20:49:17 +00:00
Mateusz Guzik
d2be3ef05c lockprof: move per-cpu data to dpcpu
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21747
2019-09-22 20:44:24 +00:00
Konstantin Belousov
66eb1d6347 i386: reduce differences in source between PAE and non-PAE pmaps ...
by defining pg_nx as zero for non-PAE and correspondingly simplifying
some expressions.

Suggested and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21757
2019-09-22 19:59:10 +00:00
Konstantin Belousov
b223a69238 i386: implement sysctl vm.pmap.kernel_maps.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21739
2019-09-22 19:23:00 +00:00
Konstantin Belousov
a5181a86a2 amd64: minor tweaks to pat decoding in sysctl vm.pmap.kernel_maps.
Decode PAT_UNCACHED.
When unknown pat mode is encountered, print the pte bits combination
instead of the index, which is always 8.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21738
2019-09-22 19:20:37 +00:00
Kyle Evans
da0a7834ac octeon-sdk: suppress another set of warnings under clang
Clang sees this construct and warns that adding an int to a string like this
does not concatenate the two. Fortunately, this is not what octeon-sdk
actually intended to do, so we take the path towards remediation that clang
offers: use array indexing instead.
2019-09-22 18:32:05 +00:00
Kyle Evans
2946ed83c0 octeon1: suppress a couple of warnings under clang
These appear in octeon-sdk -- there are new releases, but they don't seem to
address the running issues in octeon-sdk. GCC4.2 is more than happy, but
clang is much less-so and most of them are fairly innocuous and perhaps a
by-product of their style guide, which may make some of the changes harder
to upstream (if this is even possible anymore).
2019-09-22 18:30:19 +00:00
Kyle Evans
b16a3c9d19 Honor CWARNFLAGS.clang/gcc in the kernel build
Some kernel builds or users may want to disable warnings on a per-compiler
basis, so do this now.
2019-09-22 18:27:57 +00:00
Michael Tuexen
1325a0de13 Don't hold the info lock when calling sctp_select_a_tag().
This avoids a double lock bug in the NAT colliding state processing
of SCTP. Thanks to Felix Weinrank for finding and reporting this issue in
https://github.com/sctplab/usrsctp/issues/374
He found this bug using fuzz testing.

MFC after:		3 days
2019-09-22 11:11:01 +00:00
Michael Tuexen
44f2a3272e Cleanup the RTO calculation and perform some consistency checks
before computing the RTO.
This should fix an overflow issue reported by Felix Weinrank in
https://github.com/sctplab/usrsctp/issues/375
for the userland stack and found by running a fuzz tester.

MFC after:		3 days
2019-09-22 10:40:15 +00:00
Andriy Gapon
38a1def12f MFZoL: Retire send space estimation via ZFS_IOC_SEND
Add a small wrapper around libzfs_core's lzc_send_space() to libzfs so
that every legacy ZFS_IOC_SEND consumer, along with their userland
counterpart estimate_ioctl(), can leverage ZFS_IOC_SEND_SPACE to
request send space estimation.

The legacy functionality in zfs_ioc_send() is left untouched for
compatibility purposes.

Obtained from:	ZoL
Obtained from:	zfsonlinux/zfs@cf7684bc8d
Author:		loli10K <ezomori.nozomu@gmail.com>
MFC after:	2 weeks
2019-09-22 08:44:41 +00:00
Konstantin Belousov
f33533da8c kern.elf{32,64}.pie_base sysctl: enforce page alignment.
Requested by:	rstone
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-21 20:03:17 +00:00
Alan Cox
2b28ec59ac In case a translation fault on the kernel address space occurs from
within a critical section, we must perform a lock-free check on the
faulting address.

Reported by:	andrew
Reviewed by:	andrew, markj
X-MFC with:	r350579
Differential Revision:	https://reviews.freebsd.org/D21685
2019-09-21 19:51:57 +00:00
Mateusz Guzik
cbba2cb367 lockprof: use CPUFOREACH and drop always false lp_cpu NULL checks
Sponsored by:	The FreeBSD Foundation
2019-09-21 19:05:38 +00:00
Konstantin Belousov
95aafd6900 Make non-ASLR pie base tunable.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-21 18:00:23 +00:00
Konstantin Belousov
6320faed10 amd64 pmap: Fix formats for 64bit addresses in ddb and sysctl output.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21737
2019-09-21 17:59:15 +00:00
Alexander Motin
36d151a237 Allocate callout wheel from the respective memory domain.
MFC after:	1 week
2019-09-21 15:38:08 +00:00
Kyle Evans
5fdac75222 msdosfs: do not deget unlinked denodes
When a file is unlinked, the denode is not reclaimed until the last
reference is dropped, but the directory entry is immediately up for reuse.
This is a problem later when createde goes to grab a denode for the newly
created entry -- we search the hash and find a dead denode, then return that
without even bumping the reference count and the data later gets truncated
when the the last reference to the unlinked file is dropped.

This manifested itself as a broken in-place strip(1) on msdosfs. elfcopy
will do a sequence incredibly roughly like this:

open("/mnt/foo", ...) => fd 3
mmap()
unlink("/mnt/foo")
open("/mnt/foo", ...) => fd 4
write(4, ...)
close(4)
close(3)

and the resulting file would be truncated, but the write succeeded, as long
as a reference to the unlinked file had not been closed.

Some archaeology indicates that this bug has likely existed since msdosfs
was converted to use vfs_hash instead of a home rolled hash implementation
in r143570. Prior to that point, the hashget implementation would do a
refcnt check while searching and explicitly only return a denode with
de_refcnt != 0. vfs_hash did not yet have the callback that it does today,
so this slipped away and did not come back when it later grew that
functionality.

The comment indicating that we want to skip these denodes has been updated
to reflect where this is actually done. My repo-diving session seems to
indicate that the refcnt check was likely never actually below the comment,
to be pedantic, but instead a detail wrapped up in the hashget
implementation since the beginning of its inclusion into FreeBSD.

This bug was the cause behind the issue addressed in r352557.

Reported by:	jhibbits
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21731
2019-09-20 20:47:10 +00:00
Hans Petter Selasky
7fca0e69f6 Add quirk for XHCI(4) controllers to support USB control transfers
above 1Kbyte.  It might look like some XHCI(4) controllers do not
support when the USB control transfer is split using a link TRB. The
next NORMAL TRB after the link TRB is simply failing with XHCI error
code 4. The quirk ensures we allocate a 64Kbyte buffer so that the
data stage TRB is not broken with a link TRB.

Found at:	EuroBSDcon 2019
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-20 11:28:45 +00:00
Hans Petter Selasky
4631d7f717 Increase the maximum user-space buffer size from 256kBytes to 32MBytes for
libusb. This is useful for speeding up large data transfers while reducing
the interrupt rate.

Found at:	EuroBSDcon 2019
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-20 11:00:02 +00:00
Hans Petter Selasky
4e792e431a The maximum TD size is 31 and not 15.
Found at:	EuroBSDcon 2019
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-20 10:56:13 +00:00
Andrew Gallatin
61b8a4af71 remove redundant "ktls" in KTLS thr name
This reducesthe string width of the ktls thread name
and improves "ps" output.

Glanced at by: jhb
Event: EuroBSDCon hackathon
Sponsored by:	Netflix
2019-09-20 09:36:07 +00:00
Ed Maste
8b43027381 elf_common: add ELF note names
r348628 added a definition of NT_GNU_BUILD_ID.  Some software (Valgrind)
also expects a #define for the note name (ELF_NOTE_GNU) in the case that
NT_GNU_BUILD_ID is defined.

PR:		239669
Reported by:	Yuichiro NAITO
Sponsored by:	The FreeBSD Foundation
Event:		EuroBSDCon FreeBSD DevSummit 2019
2019-09-20 09:04:52 +00:00
Michael Tuexen
e6b3bd22d8 Fix the handling of invalid parameters in ASCONF chunks.
Thanks to Mark Wodrich from Google for reproting the issue in
https://github.com/sctplab/usrsctp/issues/376
for the userland stack.

MFC after:		3 days
2019-09-20 08:20:20 +00:00
Alexander Motin
657dc81d90 Improve ioat(4) NUMA-awareness.
Allocate ioat->ring memory from the device domain.
Schedule ioat->poll_timer to the first CPU of the device domain.

According to pcm-numa tool from intel-pcm port, this reduces number of
remote DRAM accesses while copying data by 75%.  And unless it is a noise,
I've noticed some speed improvement when copying data to other domain.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-09-19 22:15:57 +00:00
Mateusz Guzik
b488246b45 vfs: group fields used for per-cpu ops in one cacheline
Sponsored by:	The FreeBSD Foundation
2019-09-19 21:23:14 +00:00
Michael Tuexen
dd3121a895 When the RACK stack computes the space for user data in a TCP segment,
it wasn't taking the IP level options into account. This patch fixes this.
In addition, it also corrects a KASSERT and adds protection code to assure
that the IP header chain and the TCP head fit in the first fragment as
required by RFC 7112.

Reviewed by:		rrs@
MFC after:		3 days
Sponsored by:		Nertflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D21666
2019-09-19 10:27:47 +00:00
Michael Tuexen
e7a541b0b9 When processing an incoming IPv6 packet over the loopback interface which
contains Hop-by-Hop options, the mbuf chain is potentially changed in
ip6_hopopts_input(), called by ip6_input_hbh().
This can happen, because of the the use of IP6_EXTHDR_CHECK, which might
call m_pullup().
So provide the updated pointer back to the called of ip6_input_hbh() to
avoid using a freed mbuf chain in`ip6_input()`.

Reviewed by:		markj@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D21664
2019-09-19 10:22:29 +00:00
Andriy Gapon
6caa629e73 fix dsl_scan_ds_clone_swapped logic
It was incorrect with respect to swapping dataset IDs both in the
on-disk ZAP object and the in-memory queue.

In both cases, if only ds1 was already present, then it would be first
replaced with ds2 and then ds2 would be replaced back with ds1.  Also,
both cases did not properly handle a situation where both ds1 and ds2
are already queued.  A duplicate insertion would be attempted and its
failure would result in a panic.

This change has also been submitted to ZoL as zfsonlinux/zfs@dd262c9

PR:		239566
Reported by:	pascal.guitierrez@gmail.com
MFC after:	4 days
Sponsored by:	CyberSecure
2019-09-19 09:43:56 +00:00
Andriy Gapon
9a2ed10014 vt: fix problems with trying to switch to a closed VT
If there is an attempt to switch from a process-owned VT to a closed VT,
then vt(4) first requests the process to release its VT and only then
realizes that the target VT is closed and, so, the switch is not
possible.  So, the driver does not actually do any switch, but at the
same time the owning process is not notified about that and it does not
re-acquire the VT.

This change adds an early check for the target VT state, so that the
switch can be refused before the process coordination dance.
On top of that, the code now checks for a failure of vt_window_switch()
and calls vt_window_postswitch() for the current VT if it is in the
process mode.

Test Plan:
- configure VT1 - VT8 (ttyv0 - ttyv7) to be text consoles (run getty)
- configure VT9 (ttyv8) to rn X server
- make sure that the X server configuration allows VT switching
- leave VT10 - VT12 unconfigured
- while in the X server press Ctrl+Alt+F10
- without the patch, observe strange screen content and problems with
  keyboard input
- with the patch, observe that nothing happens

The problem has been observed and the fix has been tested with an nVidia
graphics card and the proprietary nvidia driver.
Not sure if that matters.

Reviewed by:	ray
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D21704
2019-09-19 09:22:45 +00:00
Allan Jude
4a9c211af5 sys/vm/vm_glue.c: Incorrect function name in panic string
Use __func__ to avoid this issue in the future.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by:	markj, emaste
Obtained from:	https://github.com/freebsd/freebsd/pull/410
2019-09-19 07:28:24 +00:00