Commit Graph

127470 Commits

Author SHA1 Message Date
Mitchell Horne
c04c594daa RISC-V: Clean up some GENERIC options
Some of the config options that are disabled by default seem to be only
for historical reasons. Enable those that appear to no longer be
problematic. This includes WITH_CTF, STACK, GEOM_RAID, and re-enabling
blacklisted kernel modules.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20495
2019-06-09 15:50:35 +00:00
Mitchell Horne
bffa317ff2 RISC-V: Announce real and available memory at boot
Most architectures print their total (real) and available memory during
boot. Properly initialize the realmem global and print these messages.
Also print the physical memory chunks (behind a bootverbose flag).

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D20496
2019-06-09 15:48:36 +00:00
Mitchell Horne
93ca8057c5 Add TSLOG events to initriscv()
Add the enter and exit events, similar to what's found in
hammer_time() on amd64. We must use TSRAW as the pcpu isn't yet
initialized.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D20497
2019-06-09 15:45:48 +00:00
Mitchell Horne
6ae48dd870 Fix global pointer relaxations in the RISC-V kernel
The gp register is intended to be used by the linker as another means of
performing relaxations, and should point to the small data section (.sdata).

Currently gp is being used as the pcpu pointer within the kernel, but the more
appropriate choice for this is the tp register, which is unused.

Swap existing usage of gp with tp within the kernel, and set up gp properly
at boot with the value of __global_pointer$ for all harts.

Additionally, remove some cases of accessing tp from the PCB, as it is not
part of the per-thread state. The user's tp and gp should be tracked only
through the trapframe.

Reviewed by:	markj, jhb
Approved by:	markj (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19893
2019-06-09 15:43:38 +00:00
Mitchell Horne
fc261c16bd Remove block of dead code
Approved by:	markj (mentor)
2019-06-09 15:36:51 +00:00
Alan Cox
1fe65d054b Correct a new KASSERT() in r348828.
X-MFC with:	r348828
2019-06-09 05:55:58 +00:00
Alan Cox
fd2dae0a30 Implement an alternative solution to the amd64 and i386 pmap problem that we
previously addressed in r348246.

This pmap problem also exists on arm64 and riscv.  However, the original
solution developed for amd64 and i386 cannot be used on arm64 and riscv.  In
particular, arm64 and riscv do not define a PG_PROMOTED flag in their level
2 PTEs.  (A PG_PROMOTED flag makes no sense on arm64, where unlike x86 or
riscv we are required to break the old 4KB mappings before making the 2MB
mapping; and on riscv there are no unused bits in the PTE to define a
PG_PROMOTED flag.)

This commit implements an alternative solution that can be used on all four
architectures.  Moreover, this solution has two other advantages.  First, on
older AMD processors that required the Erratum 383 workaround, it is less
costly.  Specifically, it avoids unnecessary calls to pmap_fill_ptp() on a
superpage demotion.  Second, it enables the elimination of some calls to
pagezero() in pmap_kernel_remove_{l2,pde}().

In addition, remove a related stale comment from pmap_enter_{l2,pde}().

Reviewed by:	kib, markj (an earlier version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20538
2019-06-09 03:36:10 +00:00
Vladimir Kondratyev
6c53fea7d6 psm(4): Add extra sanity checks to Elantech trackpoint packet parser.
Add strict checks for unused bit states in Elantech trackpoint packet
parser to filter out spurious events produced by some hardware which
are detected as trackpoint packets. See comment on r328191 for example.

Tested by:	Andrey Kosachenko <andrey.kosachenko@gmail.com>
2019-06-08 21:36:22 +00:00
Vladimir Kondratyev
8fa4620039 psm(4): Fix Elantech trackpoint support.
Sign bits for X and Y motion data were taken from wrong places.

PR:		238291
Reported by:	Andrey Kosachenko <andrey.kosachenko@gmail.com>
Tested by:	Andrey Kosachenko <andrey.kosachenko@gmail.com>
MFC after:	2 weeks
2019-06-08 21:33:34 +00:00
Konstantin Belousov
452a2db863 Style MAP_ENTRY_ and MAP_ definitions.
Spell all bits in the hex constants.
Since all lines are modified, consistently use <tab> after #define.

Reviewed by:	alc (previous version), dougm
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D20560
2019-06-08 20:28:04 +00:00
Konstantin Belousov
2f73a6e9c4 Correct definition for PGEX_SGX.
At the moment it is only used for page fault error code textual
representation.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-06-08 20:26:04 +00:00
Konstantin Belousov
8b49f4dd80 Make trap_msg array constant as well.
Suggested by:	tijl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-08 19:50:57 +00:00
Jonathan T. Looney
ca8929d2a3 Currently, MCA entries remain on an ever-growing linked list. This means
that it becomes increasingly expensive to process a steady stream of
correctable errors. Additionally, the memory used by the MCA entries can
grow without bound.

Change the code to maintain two separate lists: a list of entries which
still need to be logged, and a list of entries which have already been
logged. Additionally, allow a user-configurable limit on the number of
entries which will be saved after they are logged. (The limit defaults
to -1 [unlimited], which is the current behavior.)

Reviewed by:	imp, jhb
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20482
2019-06-08 18:26:48 +00:00
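
The two-list scheme described in ca8929d2a3 can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the actual MCA code: the names `rec`, `rec_lists`, `lists_init`, `lists_add`, and `lists_process` are hypothetical, and the record carries only an integer id. Pending records are moved to the logged list as they are processed, and the oldest logged records are freed once the configurable cap is exceeded (a cap of -1 keeps everything, matching the default described above).

```c
#include <stdlib.h>

/* Hypothetical record type; a real entry would carry the error details. */
struct rec {
	int id;
	struct rec *next;
};

struct rec_lists {
	struct rec *pending_head, **pending_tail;	/* not yet logged */
	struct rec *logged_head, **logged_tail;		/* already logged */
	int logged_count;
	int max_logged;					/* -1 means unlimited */
};

static void
lists_init(struct rec_lists *l, int max_logged)
{
	l->pending_head = NULL;
	l->pending_tail = &l->pending_head;
	l->logged_head = NULL;
	l->logged_tail = &l->logged_head;
	l->logged_count = 0;
	l->max_logged = max_logged;
}

/* Append a new record to the pending list; returns 0 on success. */
static int
lists_add(struct rec_lists *l, int id)
{
	struct rec *r = malloc(sizeof(*r));

	if (r == NULL)
		return (-1);
	r->id = id;
	r->next = NULL;
	*l->pending_tail = r;
	l->pending_tail = &r->next;
	return (0);
}

/* Process only the pending records; returns how many were handled. */
static int
lists_process(struct rec_lists *l)
{
	struct rec *r, *old;
	int n = 0;

	while ((r = l->pending_head) != NULL) {
		l->pending_head = r->next;
		if (l->pending_head == NULL)
			l->pending_tail = &l->pending_head;
		/* A real handler would log the record here. */
		r->next = NULL;
		*l->logged_tail = r;
		l->logged_tail = &r->next;
		l->logged_count++;
		n++;
		/* Trim the oldest logged records once over the cap. */
		while (l->max_logged >= 0 && l->logged_count > l->max_logged) {
			old = l->logged_head;
			l->logged_head = old->next;
			if (l->logged_head == NULL)
				l->logged_tail = &l->logged_head;
			free(old);
			l->logged_count--;
		}
	}
	return (n);
}
```

Because each pass touches only the pending list, the cost of processing depends on the number of new records rather than on the total history.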
Doug Moore
7c022327ab Simple code refactoring originally in D13484.
Extract swp_pager_force_dirty() and swp_pager_force_launder() out of
swp_pager_force_pagein().

Extract swap_pager_swapoff_object() out of swap_pager_swapoff().

Submitted by: ota_j.email.ne.jp
Reviewed by: alc, dougm
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20545
2019-06-08 17:49:17 +00:00
Bjoern A. Zeeb
4c62bffef5 Fix dpcpu and vnet panics with complex types at the end of the section.
Apply a linker script when linking i386 kernel modules to apply padding
to a set_pcpu or set_vnet section.  The padding value is kind-of random
and is used to catch modules not compiled with the linker-script, so
possibly still having problems leading to kernel panics.

This is needed as the code generated on certain architectures for
non-simple-types, e.g., an array can generate an absolute relocation
on the edge (just outside) the section and thus will not be properly
relocated. Adding the padding to the end of the section will ensure
that even absolute relocations of complex types will be inside the
section, if they are the last object in there and hence relocation will
work properly and avoid panics such as observed with carp.ko or ipsec.ko.

There is a rather lengthy discussion of various options to apply in
the mentioned PRs and their depends/blocks, and the review.
There seems no best solution working across multiple toolchains and
multiple versions of them, so I took the liberty of taking one,
as currently our users (and our CI system) are hitting this on
just i386 and we need some solution.  I wish we would have a proper
fix rather than another "hack".

Also backout r340009 which manually, temporarily fixed CARP before 12.0-R
"by chance" after a lead-up of various other link-elf.c and related fixes.

PR:			230857,238012
With suggestions from:	arichardson (originally last year)
Tested by:		lwhsu
Event:			Waterloo Hackathon 2019
Reported by:		lwhsu, olivier
MFC after:		6 weeks
Differential Revision:	https://reviews.freebsd.org/D17512
2019-06-08 17:44:42 +00:00
Bjoern A. Zeeb
6e33e7e0f9 Remove extra stray + from a diff from the beginning of the lines after
r348805 to fix the build.  Please do not ask how 3 more local builds
succeeded without barfing.

Pointyhat to:		bz
MFC after:		6 weeks
X-MFC with:		r348805
2019-06-08 17:38:27 +00:00
Bjoern A. Zeeb
67ca7330cf Add SDIO support.
Add a CAM-Newbus SDIO support module.  This work provides a newbus
infrastructure for device drivers wanting to use SDIO.  On the lower end
while it is connected by newbus to SDHCI, it talks CAM using the MMCCAM
framework to get to it.

This also duplicates the usbdevs framework to equally create sdiodev
header files with #defines for "vendors" and "products".

Submitted by:	kibab (initial work, see https://reviews.freebsd.org/D12467)
Reviewed by:	kibab, imp (comments on earlier version)
MFC after:	6 weeks
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19749
2019-06-08 16:26:56 +00:00
Bjoern A. Zeeb
9c907eb913 bcm2835_sdhci.c: exit DMA if not enough data left to avoid timeout errors
In the DMA case, given we disable the data interrupts, we never seem
to get DATA_END.  Given we are relying on DMA interrupts we are not
using the SDHCI state machine and hence only call into
sdhci_platform_will_handle() for the first check of data.
We do not call "will handle" for any following round trips of the same
transaction if block size * count > BCM_DMA_BLOCK_SIZE.
Manually check "left" in the DMA interrupt handler to see if we have at
least another full BCM_DMA_BLOCK_SIZE to handle.
Without this change we would DMA that and then even start a DMA with
left == 0 which would lead to a timeout and error.
Now we re-enable data interrupts and return and let the SDHCI generic
interrupt handler and state machine pick the SPACE_AVAIL up and then
find that it should punt to the pio_handler for the remaining bytes
or finish the data transaction.

With this change block mode seems to work beyond 7 * 64byte blocks,
which worked as it was below BCM_DMA_BLOCK_SIZE.

MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D20199
2019-06-08 16:15:00 +00:00
Bjoern A. Zeeb
901491d025 bcm2835_sdhci.c: save block registers to avoid controller bug
Extending what the initial revision, r273264, r276985, r277346 have
started for the transfer mode and command registers, another pair of
16bit registers written in sequence are block size and block count,
which fall together onto the same 32bit line and hence the same
register(s) would be written twice in sequence for those as well.

Use a similar approach to transfer mode and command and save the writes
to either of the block registers and then only execute a write once.
We can do this as with transfer mode their values are meaningless until
a command is issued so we can use that write to command as a trigger
to also write out the block registers.
Compared to transfer mode and command the value of block count can
change, so we need to keep state and actually read the block registers
back the first time after a write.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20197
2019-06-08 16:05:43 +00:00
Konstantin Belousov
99b81dcb9e Remove lazy FPU switch support from amd64.
It is incompatible with some future features.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-08 16:03:34 +00:00
Bjoern A. Zeeb
27d72fe14a Improve sdhci slot_printf() debug printing.
Currently slot_printf() uses two printf() calls to print the
device-slot name, and actual message. When other printf()s are
ongoing in parallel this can lead to interleaved messages on the console,
which is especially unhelpful for debugging or error messages.

Take a hit on the stack and vsnprintf() the message to the buffer.
This way it can be printed along with the device-slot name in one go
avoiding console gibberish.

Reviewed by:	marius
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19747
2019-06-08 15:24:03 +00:00
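
The approach described in 27d72fe14a (format the message into a stack buffer first, then emit prefix and message in one call) can be sketched in userland C. This is an illustration only: `slot_format`, its parameters, and the 160-byte buffer size are assumptions, not the driver's code, and the result is written to a caller-supplied buffer rather than the console so it can be inspected.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<name>: <message>" as a single string so
 * that it can be handed to one output call instead of two.
 */
static int
slot_format(char *out, size_t outsz, const char *name, const char *fmt, ...)
{
	char msg[160];	/* the "hit on the stack"; size is an assumption */
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);

	/* One formatting call produces the complete line. */
	return (snprintf(out, outsz, "%s: %s", name, msg));
}
```

With the whole line assembled up front, a concurrent writer can no longer slip its own output between the prefix and the message.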
Bjoern A. Zeeb
6e40542a4e Introduce sim_dev and cam_sim_alloc_dev().
Add cam_sim_alloc_dev() as a wrapper to cam_sim_alloc() which takes
a device_t instead of the unit_number (which we can derive from the
dev again).

Add device_t sim_dev to struct cam_sim. It will be used to pass through
the bus for cases when both sides of CAM speak newbus already and we want
to link them (yet make the calls through CAM for now).

SDIO will be the first consumer of this. For that make use of
cam_sim_alloc_dev() in sdhci under MMCCAM.

This will also allow people to start iterating more on the idea
to newbus-ify CAM without changing 50+ device drivers from the start.
Also to be clear there are callers to cam_sim_alloc() which do not
have a device_t (e.g., XPT) or provide their own unit number so we cannot
simply switch the KPI entirely.

Submitted by:	kibab (original idea, see https://reviews.freebsd.org/D12467)
Reviewed by:	imp, chuck
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19746
2019-06-08 15:19:50 +00:00
Konstantin Belousov
c46d985629 i386 trap.c: Remove unused MAX_TRAP_MSG define.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-08 13:41:39 +00:00
Konstantin Belousov
c7228026a8 amd64 trap.c: Modernize syntax around trap_msg[].
Convert the array to use C99 initializers.
Make it constant.
Replace MAX_TRAP_MSG with nitems().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-08 13:40:57 +00:00
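
The three changes listed in c7228026a8 follow a common C pattern, sketched below under stated assumptions. The trap numbers, strings, and the names `ex_msg` and `ex_name` are illustrative, not taken from trap.c, and `NITEMS` is a local stand-in for the `nitems()` macro that FreeBSD provides in <sys/param.h>.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for FreeBSD's nitems() macro. */
#define NITEMS(x)	(sizeof(x) / sizeof((x)[0]))

/* Illustrative trap numbers; not the real amd64 values. */
enum { EX_PRIV = 1, EX_BPT = 3, EX_PAGE = 12 };

/* C99 designated initializers tie each string to its index. */
static const char *const ex_msg[] = {
	[EX_PRIV] = "privileged instruction fault",
	[EX_BPT] = "breakpoint instruction fault",
	[EX_PAGE] = "page fault",
};

static const char *
ex_name(unsigned int type)
{
	/* The bound comes from the array itself, not a separate constant. */
	if (type < NITEMS(ex_msg) && ex_msg[type] != NULL)
		return (ex_msg[type]);
	return ("unknown");
}
```

Deriving the bound from the array removes the need to keep a separate maximum such as MAX_TRAP_MSG in sync by hand, and gaps between designated indices are left as NULL, which the lookup checks for.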
Justin Hibbits
988d63af1c powerpc/pmap: Move the SLB spill handlers to a better place
The SLB spill handlers are AIM-specific, and belong better with the rest of
the SLB code anyway.  No functional change.
2019-06-08 03:07:08 +00:00
Justin Hibbits
b7918b86b3 powerpc/aim: Use nitems() for calculating size of phys_avail in AIM pmaps
Same thing was already done in r347164 for Book-E pmap.
2019-06-08 02:36:07 +00:00
John Baldwin
5f37b74d5d Fix debug trace after removal of pdu_overhead.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-06-07 21:30:11 +00:00
Alexander Motin
35251e9c28 Fix comparison signedness in arc_is_overflowing().
When ARC size is very small, aggsum_lower_bound(&arc_size) may return
negative values, that due to unsigned comparison caused delays, waiting
for arc_adjust() to "fix" it by calling aggsum_value(&arc_size).  Use
of signed comparison there fixes the problem.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-07 20:59:24 +00:00
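
The signedness problem described in 35251e9c28 is easy to reproduce in isolation. The sketch below is not the ZFS code: `over_limit_unsigned` and `over_limit_signed` are hypothetical stand-ins that only show how a negative lower bound behaves under each kind of comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Unsigned comparison: a negative lower bound is converted to a huge
 * unsigned value, so the check reports "over the limit".
 */
static bool
over_limit_unsigned(int64_t lower_bound, uint64_t limit)
{
	return ((uint64_t)lower_bound >= limit);
}

/*
 * Signed comparison: a negative lower bound stays negative and is
 * correctly treated as below the limit.  Assumes limit <= INT64_MAX.
 */
static bool
over_limit_signed(int64_t lower_bound, uint64_t limit)
{
	return (lower_bound >= (int64_t)limit);
}
```

For a lower bound of -1 and a limit of 1024, the unsigned form returns true and the signed form returns false, which is the difference the commit message relies on.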
Alexander Motin
61586dd647 Explicitly start ARC adjustment on limits change.
While it is not formally necessary, the sooner it starts, the sooner it
finishes, and supposedly the less disturbing for the workload it will be.

MFC after:	2 weeks
2019-06-07 19:03:17 +00:00
Chuck Tuffli
b1f1471064 Fix nda(4) PCIe link status output
Differentiate between PCI Express Endpoint devices and Root Complex
Integrated Endpoints in the nda driver. The Link Status and Capability
registers are not valid for Integrated Endpoints and should not be
displayed. The bhyve emulated NVMe device will advertise as being an
Integrated Endpoint.

Reviewed by:	imp
Approved by:	imp (mentor)
Differential Revision: https://reviews.freebsd.org/D20282
2019-06-07 18:34:48 +00:00
Mark Johnston
88ea538a98 Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone.  But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place.  No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20470
2019-06-07 18:23:29 +00:00
Leandro Lupori
b934fc7468 [PPC64] Support QEMU/KVM pseries without hugepages
This set of changes makes it possible to run FreeBSD for PowerPC64/pseries,
under QEMU/KVM, without requiring the host to make hugepages available to the
guest.

While there was already this possibility, by means of setting hw_direct_map to
0, on PowerPC64 there were a couple of issues/wrong assumptions that prevented
this from working, before this changelist.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D20522
2019-06-07 17:58:59 +00:00
Christian S.J. Peron
ca3075599a Teach readelf about some OpenBSD ELF program headers
- Add constants for OpenBSD wxneeded, bootdata and randomize to the
  FreeBSD elf_common.h file. This is the file that gets used by the
  elftoolchain library.
- Update readelf and elfdump utilities to decode these program headers
  if they are encountered.

Note: FreeBSD has its own version of elfdump(1), which will be updated
in a subsequent commit. I am adding it here anyway because this diff is
going to be submitted upstream.

Discussed with:	emaste
Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20548

M    contrib/elftoolchain/elfdump/elfdump.c
M    contrib/elftoolchain/readelf/readelf.c
M    sys/sys/elf_common.h
2019-06-07 14:51:55 +00:00
Andrey V. Elsukov
cb1ee1bf9b Use underscores for internal variable name to avoid conflicts.
MFC after:	1 week
2019-06-07 08:30:35 +00:00
Andriy Gapon
d8b12a2162 Restore ARC MFU/MRU pressure
Before r305323 (MFV r302991: 6950 ARC should cache compressed data)
arc_read() code did this for access to a ghost buffer:
 arc_adapt() (from arc_get_data_buf())
 arc_access(hdr, hash_lock)
I.e., we first checked access to the MFU ghost/MRU ghost buffer and
adapted MFU/MRU sizes (in arc_adapt()) and next moved the buffer from the
ghost state to regular.

After r305323 the sequence is different:
 arc_access(hdr, hash_lock);
 arc_hdr_alloc_pabd(hdr);
I.e., we first move the buffer from the ghost state in arc_access() and
then we check access to buffer in ghost state (in arc_hdr_alloc_pabd()
-> arc_get_data_abd() -> arc_get_data_impl() -> arc_adapt()).  This is
incorrect: arc_adapt() never sees access to the ghost buffer because
arc_access() already migrated the buffer from the ghost state to
regular.

So, the fix is to restore a call to arc_adapt() before arc_access() and
to suppress the call to arc_adapt() after arc_access().

Submitted by:	Slawa Olhovchenkov <slw@zxy.spb.ru>
MFC after:	2 weeks
Sponsored by:	Integros [integros.com]
Differential Revision: https://reviews.freebsd.org/D19094
2019-06-07 06:35:42 +00:00
Navdeep Parhar
27c3a85d07 cxgbe(4): Rename the DDP sysctl to rx_zcopy to match the tx_zcopy sysctl
and update its description.  The old name continues to work for now.

Sponsored by:	Chelsio Communications
2019-06-07 05:03:03 +00:00
Ryan Libby
3cf556f05e Allow fail points to have separate declarations, definitions, and evals
Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20546
2019-06-07 04:09:12 +00:00
Alexander Motin
3b2f2cb8e9 Allow UMA hash tables to expand faster than 2x in 20 seconds.
ZFS ABD allocates tons of 4KB chunks via UMA, requiring huge hash tables.
With initial hash table size of only 32 elements it takes ~20 expansions
or ~400 seconds to adapt to handling 220GB ZFS ARC.  During that time not
only the hash table is highly inefficient, but also each of those
expansions takes significant time with the lock held, blocking operation.

On my test system with 256GB of RAM and ZFS pool of 28 HDDs this change
reduces time needed to first time read 240GB from ~300-400s, during which
system is quite busy and unresponsive, to only ~150s with light CPU load
and just 5 sub-second CPU spikes to expand the hash table.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-06 23:57:28 +00:00
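
The "~20 expansions or ~400 seconds" figure in 3b2f2cb8e9 can be checked with a little arithmetic. The sketch below assumes one hash entry per 4KB chunk and one doubling every 20 seconds; `doublings_needed` and `next_pow2` are hypothetical helpers, and jumping straight to the next power of two is only one possible way to expand faster, not necessarily what the commit does.

```c
#include <stdint.h>

/* Number of 2x expansions needed to grow from start to at least target. */
static unsigned int
doublings_needed(uint64_t start, uint64_t target)
{
	unsigned int n = 0;

	while (start < target) {
		start *= 2;
		n++;
	}
	return (n);
}

/* Smallest power of two that is >= target (target must be <= 2^63). */
static uint64_t
next_pow2(uint64_t target)
{
	uint64_t size = 1;

	while (size < target)
		size *= 2;
	return (size);
}
```

A 220GB ARC split into 4KB chunks is 57,671,680 entries. Starting from 32 elements, `doublings_needed(32, 57671680)` gives 21, or about 420 seconds at one doubling per 20 seconds, which is in line with the estimate in the commit message. Sizing the table in a single step would instead go directly to `next_pow2(57671680)`, which is 67,108,864.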
Luiz Otavio O Souza
5429f5f309 Do not overwrite the RGMII bits in the CPU port register of Switch.
Fixes the network on Espressobin.

The GENERIC kernel now boots over NFS.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2019-06-06 21:25:46 +00:00
Luiz Otavio O Souza
e5b6bcc7d2 Zero the GPIO regulator pins memory.
This fixes a panic in Espressobin when gpioregulator fails to allocate the
GPIO pin (the GPIO controller is not there).

Sponsored by:	Rubicon Communications, LLC (Netgate)
2019-06-06 20:54:09 +00:00
D Scott Phillips
806adc6c00 nvdimm: Provide nvdimm location information
Provide the acpi handle path as the location string for the nvdimm
children of the nvdimm_root device.

Reviewed by:	kib
Approved by:	jhb (mentor)
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D20528
2019-06-06 20:12:04 +00:00
Mark Johnston
ce7fb386d8 Restore the comment removed in r348745.
LAGG_RLOCK() enters an epoch section, so the comment wasn't stale.

Reported by:	jhb
MFC with:	r348745
2019-06-06 17:20:35 +00:00
Doug Moore
f96e8a0bab The means of finding ranges of free pages was changed for
vm_reserv_break in r348484, and that was found to improve performance
minutely and reduce code size. This change applies a similar change to
vm_reserv_reclaim_config, expecting similar benefits. This change also
allows quick rejection of page ranges that are unsuitable on account
of alignment or boundary issues, where those issues are processed a
page at a time in the current implementation.  For contrived test
cases, this can make finding a reservation satisfying a major
alignment requirement around 30 times faster.

Tested by: pho
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20274
2019-06-06 16:28:34 +00:00
Mark Johnston
fbd9585915 Add sysctls for uma_kmem_{limit,total}.
Reviewed by:	alc, dougm, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20514
2019-06-06 16:26:58 +00:00
Mark Johnston
058f0f7464 Remove the volatile qualifier from uma_kmem_total.
No functional change intended.

Reviewed by:	alc, dougm, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20514
2019-06-06 16:23:44 +00:00
Mark Johnston
9995dfd364 Conditionalize an in_epoch() call on INVARIANTS.
Its result is only used to determine whether to perform further
INVARIANTS-only checks.  Remove a stale comment while here.

Submitted by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	1 week
2019-06-06 16:22:29 +00:00
Mark Johnston
1ef5e651fd Make the linuxkpi's alloc_pages() consistently return wired pages.
Previously it did this only on platforms without a direct map.  This
also more closely matches Linux's semantics.

Since some DRM v5.0 code assumes the old behaviour, use a
LINUXKPI_VERSION guard to preserve that until the out-of-tree module
is updated.

Reviewed by:	hselasky, kib (earlier versions), johalun
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20502
2019-06-06 16:09:19 +00:00
Mark Johnston
c080655467 Fix a race between fasttrap and the user breakpoint handler.
When disabling the last enabled userspace probe, fasttrap clears the
function pointers which hook in to the breakpoint handler.  If a traced
thread hit a fasttrap breakpoint before it was removed, we must ensure
that it is able to call the hook; otherwise fasttrap will not consume
the trap and SIGTRAP will be delivered to the thread.  Synchronize
with such threads by ensuring that they load the hook pointer with
interrupts disabled, and by completing an SMP rendezvous after removing
breakpoints and before clearing the pointers.

Reported by:	Alexander Alexeev <Alexander.Alexeev@dell.com>
Tested by:	Alexander Alexeev (earlier version)
Reviewed by:	cem, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20526
2019-06-06 16:03:25 +00:00
Ian Lepore
3aad8ca854 For armv6 and armv7, build hwpmc_armv7.c as well as the base hwpmc_arm.c.
Submitted by:	Arnaud YSMAL <arnaud.ysmal@stormshield.eu>
2019-06-06 15:21:36 +00:00
Ian Lepore
fbc27301ba Don't refer to the cpu variable in a KASSERT before initializing it.
2019-06-06 15:18:23 +00:00
Alan Somers
46f8169aea Add a testing facility to manually reclaim a vnode
Add the debug.try_reclaim_vnode sysctl. When a pathname is written to it, the
corresponding vnode will be reclaimed, as long as it isn't already reclaimed
or doomed. The purpose is to
gain test coverage for vnode reclamation, which is otherwise hard to
achieve.

Add the debug.ftry_reclaim_vnode sysctl.  It does the same thing, except
that its argument is a file descriptor instead of a pathname.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20519
2019-06-06 15:04:50 +00:00
Michael Tuexen
d1156b0505 r347382 added receiver side DSACK support for the TCP base stack.
The corresponding changes for the RACK stack were missed and are added
by this commit.

Reviewed by:		Richard Scheffenegger, rrs@
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D20372
2019-06-06 07:49:03 +00:00
Cy Schubert
121d6c186b Whitespace adjustment.
MFC after:	3 days
2019-06-06 03:02:25 +00:00
Mariusz Zaborski
1808673cc4 geli: build warning fixes
Submitted by:	Aaron Prieger <aprieger@llnw.com>
Reviewed by:	sbruno
Differential Revision:	https://reviews.freebsd.org/D11068
2019-06-05 22:46:18 +00:00
Mariusz Zaborski
8da024d941 dtrace: 64-bits registers support
The registers in illumos and FreeBSD are numbered differently.
In illumos, the last 32-bit register defined is SS, and in FreeBSD it is GS.
This off-by-one caused the uregs array to return the wrong 64-bit register
on amd64.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20363
2019-06-05 22:29:05 +00:00
Konstantin Belousov
32d2014dde In vm_map_entry_set_vnode_text(), tolerate tmpfs mappings for which
vnode is no longer resident.

Mapping of tmpfs file does not bump use count on the vnode, because
backing object has swap type.  As a result, even during normal
operations, and of course on forced unmount, we might end up with text
mapping from tmpfs node which has no vnode in memory.  In this case,
there is no v_writecount to clear (this was done during reclaim), and
no reason to assert that the vnode is present.

Restructure the code to silently ignore OBJ_SWAP objects with
OBJ_TMPFS_NODE flag set, but OBJ_TMPFS flag clear.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-05 20:21:17 +00:00
Konstantin Belousov
3c93d22758 Manually clear text references on reclaim for nullfs and tmpfs.
Both filesystems do not use vnode_pager_dealloc() which would handle
this case otherwise.  Nullfs because vnode vm_object handle never
points to nullfs vnode.  Tmpfs because its vm_object is never vnode
object at all.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-05 20:16:25 +00:00
John Baldwin
0d1fd6e541 Support MSI-X for passthrough devices with a separate PBA BAR.
pci_alloc_msix() requires both the table and PBA BARs to be allocated
by the driver.  ppt was only allocating the table BAR so would fail
for devices with the PBA in a separate BAR.  Fix this by allocating
the PBA BAR before pci_alloc_msix() if it is stored in a separate BAR.

While here, release BARs after calling pci_release_msi() instead of
before.  Also, don't call bus_teardown_intr() in error handling code
if bus_setup_intr() has just failed.

Reported by:	gallatin
Tested by:	gallatin
Reviewed by:	rgrimes, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20525
2019-06-05 19:30:32 +00:00
Andriy Gapon
2b1d0ab2f6 first step towards enforcing must-succeed semantics for bus accessors
Unlike BUS_READ_IVAR / BUS_WRITE_IVAR, bus accessors do not have a
return code.  It is assumed that there is a tight coupling between a bus
driver and a driver for a device on the bus with respect to instance
variables that the bus defines for its children.  So, the driver is
supposed to have only valid accesses to the variables and, thus, the
accessors must always succeed.

Of course, programming errors sometimes happen.  At present, such errors
go completely unnoticed.  The idea of this change is to start catching
them.  As a first step, there will be a warning about a failed accessor
call.  This is to give developers a heads-up.  I plan to replace the
printf with a KASSERT a week later, so that the warning is harder to
ignore.

Reviewed by:	cem, imp, ian
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D20458
2019-06-05 13:18:00 +00:00
Tycho Nightingale
56db4ebd34 another occurrence where a very large dma mapping can cause integer overflow
Submitted by:	rlibby
Sponsored by:	Dell EMC Isilon
2019-06-05 13:08:21 +00:00
Andrey V. Elsukov
efdadaa2d8 Initialize V_nat64out methods explicitly.
It looks like initialization of static variable doesn't work for
VIMAGE and this leads to panic.

Reported by:	olivier
MFC after:	1 week
2019-06-05 09:25:40 +00:00
Colin Percival
e0235fd34a Only respond to the PCIe Attention Button if a device is already plugged in.
Prior to this commit, if PCIEM_SLOT_STA_ABP and PCIEM_SLOT_STA_PDC are
asserted simultaneously, FreeBSD sets a 5 second "hardware going away" timer
and then processes the "presence detect" change. In the (physically
challenging) case that someone presses the "attention button" and inserts
a new PCIe device at exactly the same moment, this results in FreeBSD
recognizing that the device is present, attaching it, and then detaching it
5 seconds later.

On EC2 "bare metal" hardware this is the precise sequence of events which
takes place when a new EBS volume is attached; virtual machines have no
difficulty effecting physically implausible simultaneity.

This patch changes the handling of PCIEM_SLOT_STA_ABP to only detach a
device if the presence of a device was detected *before* the interrupt
which reports the Attention Button push.

Reported by:	Matt Wilson
Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D20499
2019-06-05 04:58:42 +00:00
Cy Schubert
37dbd136c3 While working on a PR, more are discovered.
Remove more #ifdefs missed in r343701.

MFC after:	1 week
2019-06-04 19:37:51 +00:00
Cy Schubert
e5492b8bc4 Clean up #ifdefs from old unsupported releases of FreeBSD.
MFC after:	1 week
2019-06-04 19:25:32 +00:00
Mark Johnston
2d2748710a Remove an outdated header comment for vm_page.c.
The listed rules were incomplete and outdated.  There is a much more
comprehensive comment in vm_page.h.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20503
2019-06-04 18:38:27 +00:00
Hans Petter Selasky
9ccaf2215a In usb(4) fix a lost completion event issue towards libusb(3). It may happen
if a USB transfer is cancelled that we need to fake a completion event.
Implement missing support in ugen_fs_copy_out() to handle this.

This fixes issues with webcamd(8) and firefox.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-06-04 16:40:18 +00:00
Alan Cox
6f1412f800 The changes to pmap_demote_pde_locked()'s control flow in r348476 resulted
in the loss of a KASSERT that guarded against the invalidation of a wired
mapping.  Restore this KASSERT.

Remove an unnecessary KASSERT from pmap_demote_pde_locked().  It guards
against a state that was already handled at the start of the function.

Reviewed by:	kib
X-MFC with:	r348476
2019-06-04 16:21:14 +00:00
Ed Maste
b734222edd elf_common: add GNU note types and NT_GNU_PROPERTY_TYPE_0 bits
To support Intel CET IBT/Shadow Stack.

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-04 15:44:31 +00:00
Ed Maste
004caac2d8 style(9) / tidying for r348611
MFC with:	r348611
Event:		Waterloo Hackathon 2019
2019-06-04 13:45:30 +00:00
Ed Maste
74cd06b42e Expose the kernel's build-ID through sysctl
After our migration (of certain architectures) to lld the kernel is built
with a unique build-ID.  Make it available via a sysctl and uname(1) to
allow the user to identify their running kernel.

Submitted by:	Ali Mashtizadeh <ali_mashtizadeh.com>
MFC after:	2 weeks
Relnotes:	Yes
Event:		Waterloo Hackathon 2019
Differential Revision:	https://reviews.freebsd.org/D20326
2019-06-04 13:07:10 +00:00
Hans Petter Selasky
253c93f26b In xhci(4) there is no stream ID in the completion TRB.
Instead iterate all the stream IDs in stream mode to find
the matching USB transfer.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-06-04 09:01:02 +00:00
Hans Petter Selasky
76a3555808 Make sure the DMA tags get freed in mlx5en(4).
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-06-04 08:06:51 +00:00
Slava Shwartsman
bb43866c38 Fix prio vs. nonprio tagged traffic in RDMACM
In the current RDMACM implementation, the RDMACM server will not find a
GID index when the request is prio-tagged and the server is not
prio-tagged, and vice versa.
According to 802.1Q-2014, VLAN-tagged packets with VLAN ID 0 should
be considered untagged. Treat RDMACM requests the same way.
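
A minimal sketch of the comparison rule described above, in C. The helper
names (demo_tci_to_vid, demo_vlan_match) are hypothetical and are not taken
from the RDMACM sources; the sketch only assumes the 802.1Q layout in which
the low 12 bits of the TCI carry the VLAN ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low 12 bits of the 802.1Q TCI hold the VLAN ID; the rest is PCP/DEI. */
#define DEMO_VID_MASK 0x0fffu

/* Extract the VLAN ID from a TCI, dropping the priority bits. */
static uint16_t
demo_tci_to_vid(uint16_t tci)
{
    uint16_t vid = (uint16_t)(tci & DEMO_VID_MASK);

    assert(vid <= 4095);    /* a VID is a 12-bit value */
    return (vid);
}

/*
 * Compare two endpoints by VLAN ID only.  An untagged endpoint and a
 * priority-tagged endpoint (tag present, VID 0) both compare as VID 0.
 */
static bool
demo_vlan_match(bool a_tagged, uint16_t a_tci, bool b_tagged, uint16_t b_tci)
{
    uint16_t a_vid = a_tagged ? demo_tci_to_vid(a_tci) : 0;
    uint16_t b_vid = b_tagged ? demo_tci_to_vid(b_tci) : 0;

    return (a_vid == b_vid);
}
```

With this rule, a request tagged with priority 5 and VID 0 (TCI 0xa000)
matches an untagged server, while a request with VID 5 does not.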

Reviewed by:    hselasky, kib
MFC after:      3 Days
Sponsored by:   Mellanox Technologies
2019-06-04 06:21:31 +00:00
Conrad Meyer
0f6040f03e virtio(4): Add PNP match metadata for virtio devices
Register MODULE_PNP_INFO for virtio devices using the newbus PNP information
provided by the previous commit.  Matching can be quite simple; existing
probe routines only matched on bus (implicit) and device_type.  The same
matching criteria are retained exactly, but are now also available to
devmatch(8).

Reviewed by:	bryanv, markj; imp (earlier version)
Differential Revision:	https://reviews.freebsd.org/D20407
2019-06-04 02:37:11 +00:00
Conrad Meyer
dfca0a8b3d virtio(4): Expose PNP metadata through newbus
Expose the same fields and widths from both vtio buses, even though they
don't quite line up; several virtio drivers can attach to both buses,
and sharing a PNP info table for both seems more convenient.

In practice, I doubt any virtio driver really needs to match on anything
other than bus and device_type (eliminating the unused entries for
vtmmio), and also in practice device_type is << 2^16 (so far, values
range from 1 to 20).  So it might be fine to only expose a 16-bit
device_type for PNP purposes.  On the other hand, I don't see much harm
in overkill here.

Reviewed by:	bryanv, markj (earlier version)
Differential Revision:	https://reviews.freebsd.org/D20406
2019-06-04 02:34:59 +00:00
Conrad Meyer
ad5979f7da virtio_random(4): Fix random(4) integration
random(4) masks unregistered entropy sources.  Prior to this revision,
virtio_random(4) did not correctly register a random_source and did not
function as a source of entropy.

Random source registration for loadable pure sources requires registering a
poll callback, which is invoked periodically by random(4)'s harvestq
kthread.  The periodic poll makes virtio_random(4)'s periodic entropy
collection redundant, so this revision removes the callout.

The current random source API is somewhat limiting, so simply fail to attach
any virtio_random devices if one is already registered as a source.  This
scenario is expected to be uncommon.

While here, handle the possibility of short reads from the hypervisor random
device gracefully / correctly.  It is not clear why a hypervisor would
return a short read or if it is allowed by spec, but we may as well handle
it.

Reviewed by:	bryanv (earlier version), markm
Security:	yes (note: many other "pure" random sources remain broken)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20419
2019-06-04 00:01:37 +00:00
Alexander Motin
0b5319dda0 MFV r348585: 9683 Allow bypassing devid in vdev_disk_open()
illumos/illumos-gate@6fe4f3002c

Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     Pavel Zakharov <pavel.zakharov@delphix.com>

This is irrelevant to FreeBSD, just to reduce divergence.
2019-06-03 20:55:52 +00:00
Alexander Motin
9b048dd219 MFV r348583: 9847 leaking dd_clones (DMU_OT_DSL_CLONES) objects
illumos/illumos-gate@17fb938fd6

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author:     Matthew Ahrens <mahrens@delphix.com>
2019-06-03 20:49:20 +00:00
Alexander Motin
07a5c938c9 MFV r348578: 9962 zil_commit should omit cache thrash
illumos/illumos-gate@cab3a55e15

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author:     Prakash Surya <prakash.surya@delphix.com>
2019-06-03 20:24:40 +00:00
Alexander Motin
a66a7143d4 MFV r348576: 9963 Seperate tunable for disabling ZIL vdev flush
illumos/illumos-gate@f8fdf68125

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     Prakash Surya <prakash.surya@delphix.com>
2019-06-03 20:05:43 +00:00
Cy Schubert
de982ef60d Properly define the fourth argument to ipf_check, the main entry point
into ipfilter. A proper definition simplifies dtrace scripts a little.

MFC after:	1 week
2019-06-03 19:37:14 +00:00
Alexander Motin
c9719c9a6d MFV r348573: 9993 zil writes can get delayed in zio pipeline
illumos/illumos-gate@2258ad0b75

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     George Wilson <george.wilson@delphix.com>
2019-06-03 19:25:53 +00:00
Tycho Nightingale
88e9fbe568 very large dma mappings can cause integer overflow
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20505
2019-06-03 19:19:35 +00:00
Alexander Motin
4d6afba5e0 MFV r348555: 9690 metaslab of vdev with no space maps was flushed during removal
illumos/illumos-gate@4e75ba6826

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Serapheim Dimitropoulos <serapheim@delphix.com>
2019-06-03 19:03:24 +00:00
Alexander Motin
1b61262505 MFC r348554: 9688 aggsum_fini leaks memory
illumos/illumos-gate@29bf2d68be

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Paul Dagnelie <pcd@delphix.com>
2019-06-03 19:00:24 +00:00
Alexander Motin
677ef2563d MFV r348553: 9681 ztest failure in spa_history_log_internal due to spa_rename()
illumos/illumos-gate@6aee0ad769

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Matthew Ahrens <mahrens@delphix.com>
2019-06-03 18:32:56 +00:00
Alexander Motin
74f7070445 MFV r348552: 9682 page fault in dsl_async_clone_destroy() while opening pool
illumos/illumos-gate@ade2c82828

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Serapheim Dimitropoulos <serapheim@delphix.com>
2019-06-03 17:56:44 +00:00
Alexander Motin
2eff60e998 MFV r348551: 9862 fix typo in comment in vdev_impl.h
illumos/illumos-gate@84927f52bd

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Allan Jude <allanjude@freebsd.org>
2019-06-03 17:44:47 +00:00
Alexander Motin
bd2ae688a4 MFV r348550: 1700 Add SCSI UNMAP support
illumos/illumos-gate@047c81d31d

Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     Saso Kiselkov <saso.kiselkov@nexenta.com>

This is irrelevant to FreeBSD, just a diff reduction.
2019-06-03 17:43:32 +00:00
Alexander Motin
d40f6a585a MFV r348548: 9617 too-frequent TXG sync causes excessive write inflation
illumos/illumos-gate@7928f4baf4

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Matthew Ahrens <mahrens@delphix.com>
2019-06-03 17:40:11 +00:00
Alexander Motin
b3ed2d08e4 MFV r348537: 8601 memory leak in get_special_prop()
illumos/illumos-gate@e19b450bec

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     John Gallagher <john.gallagher@delphix.com>
2019-06-03 17:29:57 +00:00
Alexander Motin
c066dcc074 MFV r348535: 9677 panic from zio_write_gang_block() when creating dump device on fragmented rpool
illumos/illumos-gate@7341a7de4f

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Brad Lewis <brad.lewis@delphix.com>
2019-06-03 17:27:25 +00:00
Konstantin Belousov
d852f79b23 hwpmc_intel: List all Silvermont ids.
PR:	238310
Based on submission by:	Masse Nicolas <nicolas.masse@stormshield.eu>
MFC after:	1 week
2019-06-03 16:21:09 +00:00
John Baldwin
e8d8cc0139 Warn about deprecated features on all major OS versions.
Reviewed by:	imp
MFC after:	3 days
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20490
2019-06-03 15:43:40 +00:00
Konstantin Belousov
b5c45a3e12 efirt efi_enter(): Release acquired locks and restore FPU ownership if
efi_arch_enter() returned an error.

Submitted:	Jan Martin Mikkelsen <janm@transactionware.com>
MFC after:	1 week
2019-06-03 15:41:45 +00:00
Konstantin Belousov
185f7e0a9d amd64 efi_rt_arch_call: Preserve %rflags around call into EFI RT service.
If service code faulted, we might end up unwinding with interrupts
disabled.  Top-level kernel code should have interrupts enabled, which
is enforced by checks.

Save %rflags before entering EFI, and restore to the known good value
on return.  This handles the situation with disabled interrupts on fault
and perhaps other potential bugs, e.g. an invalid value for PSL_D.

Reported and tested by:	Jan Martin Mikkelsen <janm@transactionware.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-03 15:32:42 +00:00
Konstantin Belousov
f6e5ddff6b Remove dead check.
We already handled the case when symstrindex < 0 at line 680.

Reported by:	danfe using PVS-studio
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-03 15:23:37 +00:00
Konstantin Belousov
21d7728498 Remove dead store.
sw_flags is set to the function argument several lines later.

Reported by:	danfe using PVS-studio
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-03 15:19:11 +00:00
Vladimir Kondratyev
3b11e3b6e1 psm(4): Add natural scrolling support to sysmouse protocol
This change enables natural scrolling when two-finger scrolling is enabled
and the user is using a trackpad (mice and trackpoints are not affected).
Depending on the trackpad model, it can be activated by setting the
hw.psm.synaptics.natural_scroll or hw.psm.elantech.natural_scroll sysctl
to 1.

The evdev protocol is not affected by this change either. Tune the
userland client, e.g. libinput, to enable natural scrolling in that case.

Submitted by:	nyan_myuji.xyz
Reviewed by:	wulf
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20447
2019-06-03 10:04:34 +00:00
Alan Cox
2d5039db18 Retire vm_reserv_extend_{contig,page}(). These functions were introduced
as part of a false start toward fine-grained reservation locking.  In the
end, they were not needed, so eliminate them.

Order the parameters to vm_reserv_alloc_{contig,page}() consistently with
the vm_page functions that call them.

Update the comments about the locking requirements for
vm_reserv_alloc_{contig,page}().  They no longer require a free page
queues lock.

Wrap several lines that became too long after the "req" and "domain"
parameters were added to vm_reserv_alloc_{contig,page}().

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20492
2019-06-03 05:15:36 +00:00
Maxim Sobolev
5ec57af4b2 Fix several places where tool name has been hardcoded:
    install -> ${INSTALL}
    mtree -> ${MTREE_CMD}
    services_mkdb -> ${SERVICES_MKDB_CMD}
    cap_mkdb -> ${CAP_MKDB_CMD}
    pwd_mkdb -> ${PWD_MKDB_CMD}
    kldxref -> ${KLDXREF_CMD}

If you do custom FreeBSD builds you may want to override those
in some cases.

Sponsored by:	Sippy Software, Inc.
2019-06-02 23:38:19 +00:00
Vladimir Kondratyev
9a554d090c psm(4): Add Elantech touchpad IC type 15 found on Thinkpad L480 laptops
PR:		238291
Submitted by:	Andrey Kosachenko <andrey.kosachenko@gmail.com>
MFC after:	2 weeks
2019-06-02 22:27:26 +00:00
Mark Johnston
d842aa5114 Add a vm_page_wired() predicate.
Use it instead of accessing the wire_count field directly.  No
functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20485
2019-06-02 01:00:17 +00:00
Ed Maste
0ca3c38188 octusb: fix detach loop over USB ports
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-06-01 18:19:16 +00:00
Warner Losh
d0aaeffdb4 Since a fatal trap can happen at arbitrary times, don't panic when the
completions are not in a consistent state. Cope with the different
places the normal I/O completion polling thread can be interrupted and
then re-entered during a kernel panic + dump.

Reviewed by: jhb and markj (both prior versions)
Differential Revision:  https://reviews.freebsd.org/D20478
2019-06-01 15:37:44 +00:00
Bjoern A. Zeeb
eafaa1bc35 After parts of the locking fixes in r346595, syzkaller found
another one in udp_output(). This one is a race condition.
We do check on the laddr and lport without holding a lock in
order to determine whether we want a read or a write lock
(this is in the "sendto/sendmsg" cases where addr (sin) is given).

Instrumenting the kernel showed that after taking the lock, we
had bound to a local port from a parallel thread on the same socket.

If we find that case, unlock and retry. Taking the write lock would
not be a problem in the first place (apart from killing some
parallelism). However, the retry is needed because later on, based on
similar condition checks, we acquire the pcbinfo lock; if the
conditions have changed, we might find ourselves with a lock
inconsistency, and hence hit the KASSERT when trying to unlock at the
end of the function.

Reported by:	syzbot+bdf4caa36f3ceeac198f@syzkaller.appspotmail.com
Reviewed by:	markj
MFC after:	6 weeks
Event:		Waterloo Hackathon 2019
2019-06-01 14:57:42 +00:00
Bjoern A. Zeeb
8adf420203 Improve error/debug messages in sdhci.c
When starting a command also print the opcode and flags.
More consistently print flags as hex.
Use slot_printf rather than printf in one case.

MFC after:		6 weeks
Reviewed by:		marius, kibab, imp
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19748
2019-06-01 14:39:12 +00:00
Navdeep Parhar
ebb8639822 cxgbe/t4_tom: adjust the hardware receive window to match changes to the
receive sockbuf's high water mark.

Calculate rx credits on the spot instead of tracking sbused/sb_cc and
rx_credits in the toepcb.  The previous method worked when the high
water mark changed due to SB_AUTOSIZE but not when it was adjusted
directly (for example, by the soreserve in nfsrvd_addsock).

This fixes a connection hang while running iozone over an NFS mounted
share where nfsd's TCP sockets are being handled by t4_tom.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-06-01 03:03:48 +00:00
Justin Hibbits
4420fc895f powerpc/moea: Fix moea64 native VA invalidation
Summary:
moea64_insert_pteg_native()'s invalidation only works by happenstance.
The purpose of the shifts and XORs is to extract the VSID in order to
reverse-engineer the lower bits of the VPN.  Currently a segment size is 256MB
(2**28), and ADDR_API_SHFT64 is 16, so ADDR_PIDX_SHIFT is equivalent.  However,
it's semantically incorrect, in that we don't want to shift by the page shift
size, we want to shift to get to the VSID.

Tested by:	bdragon
Differential Revision: https://reviews.freebsd.org/D20467
2019-06-01 01:40:14 +00:00
Conrad Meyer
5ca5dfe938 random(4): Fix RANDOM_LOADABLE build
I introduced an obvious compiler error in r346282, so this change fixes
that.

Unfortunately, RANDOM_LOADABLE isn't covered by our existing tinderbox, and
it seems like there were existing latent linking problems.  I believe these
were introduced by accident in r338324 during reduction of the boolean
expression(s) adjacent to randomdev.c and hash.c.  It seems the
RANDOM_LOADABLE build breakage has gone unnoticed for nine months.

This change correctly annotates randomdev.c and hash.c with !random_loadable
to match the pre-r338324 logic; and additionally updates the HWRNG drivers
in MD 'files.*', which depend on random_device symbols, with
!random_loadable (it is invalid for the kernel to depend on symbols from a
module).

(The expression for both randomdev.c and hash.c was the same, prior to
r338324: "optional random random_yarrow | random !random_yarrow
!random_loadable".  I.e., "random && (yarrow || !loadable)."  When Yarrow
was removed ("yarrow := False"), the expression was incorrectly reduced to
"optional random" when it should have retained "random && !loadable".)

Additionally, I discovered that virtio_random was missing a MODULE_DEPEND on
random_device, which breaks kld load/link of the driver on RANDOM_LOADABLE
kernels.  Address that issue as well.
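
The reduction of the files expression described above can be checked by
enumerating the inputs. This is an illustrative sketch in C, not code from
the tree; the function names are hypothetical and the three booleans stand
in for the random, random_yarrow and random_loadable options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pre-r338324 condition as quoted: random && (yarrow || !loadable). */
static bool
cond_original(bool random, bool yarrow, bool loadable)
{
    return (random && (yarrow || !loadable));
}

/* Reduction that should have been kept once yarrow is always false. */
static bool
cond_correct(bool random, bool loadable)
{
    return (random && !loadable);
}

/* Reduction that was applied by mistake: loadable is ignored. */
static bool
cond_incorrect(bool random, bool loadable)
{
    (void)loadable;
    return (random);
}

/* Count inputs where a candidate disagrees with the original at yarrow == false. */
static int
count_mismatches(bool (*candidate)(bool, bool))
{
    int mismatches = 0;

    assert(candidate != NULL);
    for (int r = 0; r <= 1; r++)
        for (int l = 0; l <= 1; l++)
            if (candidate(r != 0, l != 0) !=
                cond_original(r != 0, false, l != 0))
                mismatches++;
    return (mismatches);
}
```

The only disagreement for the incorrect form is random set together with
loadable set, which is exactly the RANDOM_LOADABLE configuration.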

PR:		238223
Reported by:	Eir Nym <eirnym AT gmail.com>
Reviewed by:	delphij, markm
Approved by:	secteam(delphij)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20466
2019-06-01 01:22:21 +00:00
Warner Losh
89afd39c2c Defer evaluation of modified until after it's set
With the reorg r348175, we now look at modified before it is
set. Rearrange things so that we can set include_metadata to either
yes, no or if-modified. This should fix the -R flag that was broken in
r348175, which broke WITH_REPRODUCIBLE_BUILD for kernels.

Feedback From: emaste@
Differential Revision: https://reviews.freebsd.org/D20480
2019-05-31 22:57:20 +00:00
Doug Moore
b8590dae50 The function vm_phys_free_contig invokes vm_phys_free_pages for every
power-of-two page block it frees, launching an unsuccessful search for
a buddy to pair up with each time.  The only possible buddy-up mergers
are across the boundaries of the freed region, so change
vm_phys_free_contig simply to enqueue the freed interior blocks, via a
new function vm_phys_enqueue_contig, and then call vm_phys_free_pages
on the bounding blocks to create as big a cross-boundary block as
possible after buddy-merging.

The only callers of vm_phys_free_contig at the moment call it in
situations where merging blocks across the boundary is clearly
impossible, so just call vm_phys_enqueue_contig in those places and
avoid trying to buddy-up at all.

One beneficiary of this change is in breaking reservations.  For the
case where memory is freed in breaking a reservation with only the
first and last pages allocated, the number of cycles consumed by the
operation drops about 11% with this change.
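
As a rough illustration of why interior blocks cannot merge with each
other, the sketch below carves a run of pages into naturally aligned
power-of-two blocks and computes each block's buddy. It is not the
vm_phys code; all names (demo_block, demo_split_range, demo_buddy_of) are
hypothetical, and page numbers are used instead of physical addresses.

```c
#include <assert.h>

struct demo_block {
    unsigned long start;    /* first page number of the block */
    int order;              /* block covers 1 << order pages */
};

/* Trailing zero bits of x, capped at max_order (x == 0 yields max_order). */
static int
demo_align_order(unsigned long x, int max_order)
{
    int n = 0;

    while (n < max_order && (x & 1ul) == 0) {
        x >>= 1;
        n++;
    }
    return (n);
}

/* Largest order with (1 << order) <= npages; npages must be nonzero. */
static int
demo_fit_order(unsigned long npages)
{
    int n = 0;

    assert(npages > 0);
    while ((npages >>= 1) != 0)
        n++;
    return (n);
}

/* Start page of the buddy of an aligned block of the given order. */
static unsigned long
demo_buddy_of(unsigned long start, int order)
{
    return (start ^ (1ul << order));
}

/*
 * Split [start, start + npages) into aligned power-of-two blocks.
 * Writes at most cap entries to out and returns the number written.
 */
static int
demo_split_range(unsigned long start, unsigned long npages, int max_order,
    struct demo_block *out, int cap)
{
    int n = 0;

    while (npages > 0 && n < cap) {
        int order = demo_align_order(start, max_order);
        int fit = demo_fit_order(npages);

        if (fit < order)
            order = fit;
        out[n].start = start;
        out[n].order = order;
        n++;
        start += 1ul << order;
        npages -= 1ul << order;
    }
    return (n);
}
```

For pages 3 through 12 the split is 3 (order 0), 4 (order 2), 8 (order 2)
and 12 (order 0). The buddies of the two order-2 blocks start at pages 0
and 12, and both of those buddy blocks extend past the ends of the freed
run, so any merge has to cross the boundary of the region.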

Suggested by: alc
Reviewed by: alc
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D16901
2019-05-31 21:02:42 +00:00
Konstantin Belousov
85d76c38ba Simplify flow of pmap_demote_pde_locked() and add more comprehensive
debugging checks.

In particular,
- Move the code to handle failure to allocate page table page into
  a helper.
- After the previous item is done, it is possible to distinguish !PG_A
  case and case of missed page, in the control flow.
- Make the variable to indicate that in-kernel mapping is demoted.
- Assert that missed page table page can only happen for in-kernel
  mapping when demoting direct map.
- If DIAGNOSTIC is enabled, and the page table page should be already
  filled, check all ptes instead of only first one.

Reviewed by:	alc, markj
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20266
2019-05-31 18:53:04 +00:00
Mark Johnston
8726929d67 netdump: Buffer pages to avoid calling netdump_send() on each 4KB write.
netdump waits for acknowledgement from the server for each write.  When
dumping page table pages, we perform many small writes, limiting
throughput.  Use the netdump client's buffer to buffer small contiguous
writes before calling netdump_send() to flush the MAXDUMPPGS-sized
buffer.  This results in a significant reduction in the time taken to
complete a netdump.
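
The buffering idea can be sketched in userland C as follows. This is an
assumption-laden illustration, not the netdump client code: the struct,
the function names and DEMO_BUF_SIZE are hypothetical, and the "send" is
only counted rather than transmitted and acknowledged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size; stands in for the MAXDUMPPGS-sized client buffer. */
#define DEMO_BUF_SIZE 16384

struct demo_coalescer {
    char buf[DEMO_BUF_SIZE];
    size_t len;         /* bytes currently buffered */
    size_t start_off;   /* dump offset of buf[0] */
    int sends;          /* number of simulated netdump_send() calls */
};

/* Send whatever is buffered as one request (simulated by a counter). */
static void
demo_flush(struct demo_coalescer *c)
{
    if (c->len == 0)
        return;
    c->sends++;
    c->len = 0;
}

/* Buffer a write, flushing first if it is not contiguous or does not fit. */
static void
demo_write(struct demo_coalescer *c, const void *data, size_t off, size_t n)
{
    assert(n <= DEMO_BUF_SIZE);
    if (c->len != 0 &&
        (off != c->start_off + c->len || c->len + n > DEMO_BUF_SIZE))
        demo_flush(c);
    if (c->len == 0)
        c->start_off = off;
    memcpy(c->buf + c->len, data, n);
    c->len += n;
    if (c->len == DEMO_BUF_SIZE)
        demo_flush(c);
}
```

Four contiguous 4KB writes then cost one send instead of four; a write at
an unrelated offset forces the pending data out first.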

Submitted by:	Sam Gwydir <sam@samgwydir.com>
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20317
2019-05-31 18:29:12 +00:00
Mark Johnston
8e9105dbae acpi_dock(4): Notify devd(8) on dock status change.
PR:		238138
Submitted by:	Muhammad Kaisar Arkhan <hi@yukiisbo.red>
MFC after:	2 weeks
2019-05-31 15:44:33 +00:00
Mark Johnston
42447bb506 Remove a redundant vm_page_remove() call.
vm_page_free_prep() removes the page from its object.  No functional
change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20469
2019-05-31 14:59:40 +00:00
Ed Maste
1eb8dfc67a newvers.sh correct typo from r348175
2019-05-31 13:54:01 +00:00
Rick Macklem
6aab442af9 Get rid of extraneous initialization.
Get rid of an extraneous initialization, mainly to keep a static analyser
happy. No semantic change.

PR:		238167
Submitted by:	Alexey Dokuchaev
2019-05-31 03:13:09 +00:00
Rick Macklem
26fd36b29d Clean up silly code case.
This silly code segment has existed in the sources since it was brought
into FreeBSD 10 years ago. I honestly have no idea why this was done.
It was possible that I thought that it might have been better to not
set B_ASYNC for the "else" case, but I can't remember.
Anyhow, this patch gets rid of the if/else that does the same thing
either way, since it looks silly and upsets a static analyser.
This will have no semantic effect on the NFS client.

PR:		238167
2019-05-31 00:56:31 +00:00
Brooks Davis
4af6033324 makesyscalls.sh: always use absolute path for syscalls.conf
syscalls.conf is included using "." which per the Open Group:

 If file does not contain a <slash>, the shell shall use the search
 path specified by PATH to find the directory containing file.

POSIX shells don't fall back to the current working directory.

Submitted by:	Nathaniel Wesley Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	bdrewery
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20476
2019-05-30 20:56:23 +00:00
Li-Wen Hsu
f1b0e65941 Add the missing braces to fix code that was not guarded by the if clause and had
misleading indentation.  This was found by gcc -Wmisleading-indentation.

Approved by:	erj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20428
2019-05-30 20:42:36 +00:00
Navdeep Parhar
35c0026f42 cxgbe/t4_tom: Do not attempt to look up entries in the TCB history if
it hasn't been initialized.

This fixes a bug in r346570 that could cause a panic when servicing
TCP_INFO for offloaded connections.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-05-30 17:27:40 +00:00
Dmitry Chagin
c8124e20e5 Remove wrong inline keyword.
Reported by:	markj
MFC after:	1 week
2019-05-30 16:11:20 +00:00
Konstantin Belousov
5c066cd2e2 Remove TODO comment after posixshmcontrol(1) added.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-05-30 16:04:00 +00:00
Konstantin Belousov
5d993207da Silence witness warning about duplicated mutex type.
The order is correct, it is nullfs vnode interlock -> lower vnode
interlock.  vop_stdadd_writecount() is called from nullfs
VOP_ADD_WRITECOUNT() and both take interlocks.

Requested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2019-05-30 15:04:09 +00:00
Dmitry Chagin
c5afec6e89 Complete LOCAL_PEERCRED support. Cache pid of the remote process in the
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.

PR:		215202
Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20415
2019-05-30 14:24:26 +00:00
Dmitry Chagin
1410bfe142 Linux does not support MSG_OOB for unix(4) or non-stream oriented sockets;
return EOPNOTSUPP as Linux does.

Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20409
2019-05-30 14:21:51 +00:00
Alexander Motin
1a15d60d0e Fix busy status leak in case of incorrect passthrough args.
MFC after:	1 week
2019-05-30 14:13:09 +00:00
Marcin Wojtas
9d0073e413 Update ENA version to v2.0.0
ENAv2 introduces many new features, bug fixes and improvements.

Main new features are LLQ (Low Latency Queues) and independent queues
reconfiguration using sysctl commands.

The year in copyright notice was updated to 2019.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:52:32 +00:00
Marcin Wojtas
858659f752 Improve ENA reset handling
For easier debugging, the reset is triggered and the reset reason is set
only the first time a reset is requested. This ensures that the first
reset reason is not overwritten by later ones.
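
A sketch of the "first reason wins" rule, assuming a test-and-set on a
pending flag. The names (demo_dev, demo_trigger_reset) are hypothetical
and C11 atomics are swapped in for whatever the driver actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_dev {
    atomic_flag reset_pending;  /* set once a reset has been requested */
    int reset_reason;           /* reason recorded by the first request */
};

/* Put the device into the "no reset pending" state. */
static void
demo_dev_init(struct demo_dev *d)
{
    atomic_flag_clear(&d->reset_pending);
    d->reset_reason = 0;
}

/*
 * Request a reset.  Only the first caller records its reason and gets
 * true; later callers leave the stored reason untouched.
 */
static bool
demo_trigger_reset(struct demo_dev *d, int reason)
{
    assert(reason != 0);    /* 0 is reserved for "no reason recorded" */
    if (atomic_flag_test_and_set(&d->reset_pending))
        return (false);
    d->reset_reason = reason;
    return (true);
}
```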

Also, add a reset trigger upon invalid Tx requested ID.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:45:41 +00:00
Marcin Wojtas
77958fcdab Fix NULL pointer dereference in ena_up()
If the call to ena_up() in ena_restore_device() fails, next usage of
`ifconfig up` will cause NULL pointer dereference.

This patch adds additional checks to prevent that.

Submitted by:  Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:42:52 +00:00
Marcin Wojtas
30425f9333 Unify new line characters in the ENA driver
Some messages were missing a newline character, so traces did not behave
consistently. To fix that, each trace and printout now ends with a
newline character, which should improve readability.

Submitted by:  Rafal Kozik <rk@semihalf.com>
               Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:41:39 +00:00
Marcin Wojtas
a870eab232 Fix Tx offloads for fragmented pkt headers in ENA
If the packet headers are split across multiple segments of the mbuf
chain, the previous version of ena_tx_csum, which assumed that all
headers lie in the first mbuf, would fail to map the header properties
to the meta descriptor.

That caused Tx checksum offload not to work and led to memory
corruption. It could even crash the system.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:40:51 +00:00
Marcin Wojtas
32f63fa7f9 Split ENA reset routine into restore and destroy stages
For alignment with the Linux driver and better handling of ena_detach(),
the reset now calls ena_device_restore() and ena_device_destroy().

ena_device_destroy() is also called from ena_detach(), which makes the
code more readable.

The watchdog is now activated after reset only if it was active before.

Additional checks were added to ensure that there is no race with the
link state change AENQ handler.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:39:25 +00:00
Marcin Wojtas
fd43fd2af0 Use bitfield for storing global ENA device states
As the ENA can have multiple states turned on/off, it is more convenient
to store them in single bitfield instead of multiple boolean variables.

The bitset FreeBSD API was used for the bitfield implementation, as it
provides flexible structure together with API which also supports atomic
bitfield operations.

For better readability basic macros from API were wrapped into custom
ENA_FLAG_* macros, which are filling up common parameters for all calls.
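
The wrapping approach can be sketched in portable C. bitset(9) is a
kernel API, so this sketch swaps in C11 atomics on a single word; the
DEMO_FLAG_* macros, the flag names and struct demo_adapter are
hypothetical and only mirror the shape of the ENA_FLAG_* wrappers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device states; each one occupies a single bit. */
enum demo_flag {
    DEMO_FLAG_DEV_UP,
    DEMO_FLAG_LINK_UP,
    DEMO_FLAG_TRIGGER_RESET,
    DEMO_FLAGS_NUMBER
};

/* unsigned int is guaranteed to have at least 16 value bits. */
static_assert(DEMO_FLAGS_NUMBER <= 16, "flags must fit in one word");

struct demo_adapter {
    atomic_uint flags;  /* replaces several separate boolean fields */
};

/* Wrappers fill in the common arguments so call sites stay short. */
#define DEMO_FLAG_SET(a, f) \
    ((void)atomic_fetch_or(&(a)->flags, 1u << (f)))
#define DEMO_FLAG_CLEAR(a, f) \
    ((void)atomic_fetch_and(&(a)->flags, ~(1u << (f))))
#define DEMO_FLAG_ISSET(a, f) \
    ((atomic_load(&(a)->flags) & (1u << (f))) != 0)
```

A call site then reads as DEMO_FLAG_SET(adapter, DEMO_FLAG_LINK_UP)
instead of assigning to one of many boolean members.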

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:37:15 +00:00
Marcin Wojtas
804402a54e Fix error handling when ENA reset fails
Before the patch, error handling was not releasing all resources and
was not issuing device reset if the reset task failed.

That could cause memory leak and fault of the device.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:35:43 +00:00
Marcin Wojtas
460212715f Fill bdf field of the host_info structure in ENA
The host info bdf field is an abbreviation for the PCI bus, device, and
function at which the device is attached.

The driver now fills in this information using the FreeBSD RID
resource.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:35:02 +00:00
Marcin Wojtas
af66d7d029 Add additional doorbells on ENA Tx path
The new ENA HAL introduces an API that can determine on the Tx path
whether a doorbell is needed.

That way, it can tell the driver that it should issue a doorbell.
The old threshold value was not removed, as not all HW supports this
feature, so it was reworked to also work with the new API.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:33:31 +00:00
Marcin Wojtas
82f5a7921c Limit maximum size of Rx refill threshold in ENA
The Rx ring size can be as high as 8k. Because of that, we want to cap
the cleanup threshold at a maximum value of 256.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:31:35 +00:00
Marcin Wojtas
4fa9e02d9b Add support for the LLQv2 and WC in ENA
LLQ (Low Latency Queue) is a feature that allows pushing the header
directly to the device through PCI before DMA is even triggered.

It reduces latency, because the device can start preparing the packet
before the payload is sent through DMA.

To speed up sending data through PCI, Write Combining is enabled, which
allows hardware to buffer data before sending it over PCI, reducing the
number of PCI IO operations.

ENAv2 is using special descriptor for the negotiation of the LLQ.
Currently, only the default configuration is supported.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:30:52 +00:00
Marcin Wojtas
5cb9db0706 Lock optimization in ENA
Handle IO interrupts using filter routine. That way, the main cleanup
task could be moved to the separate thread using taskqueue.

The deferred Rx cleanup task was removed, and now the cleanup task is
being called instead. That way, the Rx lock could be removed.

In addition, Queue management (wake up and stop TX ring) was added, so
the TX cleanup task can be performed mostly lockless.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:29:24 +00:00
Marcin Wojtas
6064f2899f Add tuneable drbr ring size and hw queues depth for ENA
The driver now supports per adapter tuning of buffer ring size and HW Rx
ring size.

It can be achieved using sysctl node dev.ena.X.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:28:03 +00:00
Marcin Wojtas
4e30699966 Fix error in validate_tx_req_id() in ENA
If the requested ID was out of range, the tx_info structure was NULL and
the function was trying to access the field of the NULL object.
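
A minimal sketch of the kind of check involved, assuming simplified stand-in structures (the `sketch_` types and fields are hypothetical, not the driver's own): the ID is range-checked before any per-ID state is read.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct sketch_tx_info {
	void *mbuf;
};

struct sketch_tx_ring {
	struct sketch_tx_info *info;
	uint16_t ring_size;
};

static int
sketch_validate_req_id(const struct sketch_tx_ring *ring, uint16_t req_id)
{
	const struct sketch_tx_info *tx_info;

	/* Reject out-of-range IDs before touching any per-ID state. */
	if (req_id >= ring->ring_size)
		return (EFAULT);

	/* Only now is it safe to look at the entry itself. */
	tx_info = &ring->info[req_id];
	if (tx_info->mbuf == NULL)
		return (EFAULT);

	return (0);
}
```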

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:26:18 +00:00
Marcin Wojtas
c115a1e258 Change attach order to prevent crash upon failure in ENA
if_detach() was causing a crash if the MSI-x configuration in attach
failed. To prevent this issue, the ifnet is now configured at the end
of the attach function.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:24:47 +00:00
Marcin Wojtas
9151c55d02 Change order of ifp release on ENA detach
In the rare case when ifconfig is called just before kldunload, it is
possible that the ena_up routine will be called after the queue locks
are released.

To prevent that, ifp is detached before the last ena_down is called,
and the ifp is freed at the end of the function.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:22:53 +00:00
Marcin Wojtas
2b5b60fe0d Check for number of MSI-x upon partial allocation in ENA
The ENA driver needs at least 2 MSI-x - one for admin queue, and one for
IO queues pair. If there were not enough resources to allocate more than
one MSI-x, the device should not be attached.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:22:12 +00:00
Marcin Wojtas
469a84079c Set error value when allocation of IO irq fails in ENA
bus_alloc_resource_any() does not return an error value in case of an
error.
If the function call failed, no error value was passed to the
ena_up() function.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:20:42 +00:00
Marcin Wojtas
5b14f92e6c Set vaddr and paddr as NULL when DMA alloc fails in ENA
To prevent errors from assigning values from the DMA structure in case
of an error, zero the vaddr and paddr values upon failure.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:19:32 +00:00
Marcin Wojtas
e80737381f Fix DMA synchronization in the ENA driver Tx and Rx paths
The DMA in FreeBSD requires explicit synchronization. ENA driver was
only doing PREREAD and PREWRITE synchronizations. Missing
bus_dmamap_sync() calls were added.

It is also required to synchronize DMA engine before unloading DMA map.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:18:23 +00:00
Marcin Wojtas
d12f7bfc17 Check for missing MSI-x and Tx completions in ENA
If the first MSI-x is not executed, the timer service will detect
that and trigger a device reset.

The checking for missing Tx completion was reworked, so it will also
check for missing interrupts. Checking number of missing Tx completions
can be performed after loop, instead of checking it every iteration.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:16:56 +00:00
Marcin Wojtas
8ece6b25de Fill number of CPUs field on ENA host_info structure
The new ena_com allows the number of CPUs to be passed to the device in
the host info structure as a hint.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:15:38 +00:00
Marcin Wojtas
e3cecf70c3 Print ENA Tx error conditionally
Information about a Tx error should only be displayed if packet
preparation failed due to an error other than out of memory.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:14:58 +00:00
Marcin Wojtas
c9b099ec94 Trigger reset in ENA if there are too many Rx descriptors
Whenever the driver receives too many descriptors from the device,
it should trigger a device reset, as this indicates that the device
is in an invalid state.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:13:15 +00:00
Marcin Wojtas
277f11c401 Remove RSS support in ENA
Receive Side Scaling is an optional feature that can be enabled in the
kernel configuration by defining the RSS flag.

The kernel uses a hash to store and find protocol control blocks, which
are kept in hash tables.
The kernel and NIC hash functions must be consistent. Otherwise, lookup
fails.

To achieve this, the kernel provides an API to set the proper hash key
on the NIC. As it is not possible to change the key for the virtual ENA
NIC, this driver cannot support the RSS function.

ENA is designed to work in virtual environments, so supporting a
hardware version of this card is unnecessary.

Submitted by:  Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:12:14 +00:00
Marcin Wojtas
40621d71fd Add notification AENQ handler for ENA
Notification AENQ handler is responsible for handling requests from ENA
device. Missing Tx threshold, Tx timeout and keep alive timeout can be
set using hints from the aenq descriptor which can be delivered in the
ENA admin notification.

The queue suspending and resuming tasks are not supported by the
driver.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:09:53 +00:00
Marcin Wojtas
e6de9a8384 Print information when ENA admin error occurs
ENA_ADMIN_FATAL_ERROR and ENA_ADMIN_WARNING aenq groups are now indicated
as supported, so unimplemented_aenq_handler() will print an error
message whenever an error occurs within the ENA admin context.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:08:00 +00:00
Marcin Wojtas
b8ca5dbe9e Do not specify active media type in ENA
As the ENA is working only in virtualized environment, the active media
is not specified. Instead, the active link type is set as unknown.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:06:07 +00:00
Marcin Wojtas
67ec48bb3a Adjust ENA driver to the new ena-com
Recent HAL change preparing to support ENAv2 required minor driver
modifications.

The ena_com_sq_empty_space() is not available in this ena-com, so it had
to be replaced with ena_com_free_desc().

Moreover, the ena_com_admin_init() is no longer using 3rd argument
indicating if the spin lock should be initialized, so it was removed.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:01:46 +00:00
Jayachandran C.
33e4f8169d arm64 gicv3_its: Fix a typo
Fix 'Cavium' spelling in errata description.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D20418
2019-05-30 01:39:07 +00:00
Jayachandran C.
55d9048856 gicv3_its: do LPI init only once per CPU
The initialization required for LPIs (setting up pending tables etc.)
has to be done just once per CPU, even in the case where there are
multiple ITS blocks associated with the CPU.

Add a flag lpi_enabled in the per-cpu distributor info for this and
use it to ensure that we call its_init_cpu_lpi() just once.

This enables us to support platforms where multiple GIC ITS blocks
can generate LPIs to a CPU.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D19844
2019-05-30 01:32:00 +00:00
Jayachandran C.
db359ad3d6 gicv3_its: refactor LPI init into a new function
Move the per-cpu LPI initialization to a separate function. This is
in preparation for a commit that does LPI init only once for a CPU,
even when there are multiple ITS blocks associated with the CPU.

No functional changes in this commit.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D19843
2019-05-30 01:24:47 +00:00
Jayachandran C.
a9b702ddde gic_v3: consolidate per-cpu redistributor information
Update 'struct gic_redists' to consolidate all per-cpu redistributor
information into a new 'struct redist_pcpu'. Provide a new interface
(GICV3_IVAR_REDIST) for the GIC driver, which can be used to retrieve
the per-cpu data.

This per-cpu redistributor struct will be later used to improve the
GIC ITS setup.

While there, remove some unused fields in gic_v3_var.h interface.
No functional changes.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D19842
2019-05-30 01:21:08 +00:00
Ravi Pokala
ffac5d814c Add bits related to SANITIZE, SED, and form-factor to (struct ata_params)
Based on ATA-ACS-4, recognize several bit-fields related to the ATA SANITIZE
feature-set, Self-Encrypting Drives, and form-factor identification.

As part of this change, the name of word 48 of (struct ata_params) is being
changed. The previous name, "usedmovsd" does not appear to be related to the
previous definition of the word ("double-word IO supported"). The word was
defined that way in ATA-1 (1994), but it was marked "Reserved" (meaning
"unused, but might be used in the future") in ATA-2 (1996). It stayed that
way until ATA-8 (2008), which re-defined it as implemented in this change.
The field is not used in-tree.

Reviewed by:	mav
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D20455
2019-05-29 23:50:31 +00:00
Gleb Smirnoff
4a9f6ba75b In r343857 the referred comment moved to uma_vm_zone_stats(). 2019-05-29 22:33:37 +00:00
Eric Joyner
668d6dbb4c iflib: provide probe wrapper for vendor drivers
From Jake:
Vendor drivers that exist out-of-tree generally should return
BUS_PROBE_VENDOR from their device probe functions. This helps ensure
that a vendor replacement driver will supersede the in-kernel driver for
a given device.

Currently, if a vendor wants to implement a driver based on iflib, it
will always report BUS_PROBE_DEFAULT.

Add a wrapper function, iflib_device_probe_vendor() which can be used in
place of iflib_device_probe(). This function will just return
BUS_PROBE_VENDOR whenever iflib_device_probe() would return
BUS_PROBE_DEFAULT.

While vendor drivers can already implement such a wrapper themselves,
providing it in the iflib.h header makes it easier for the vendor driver
to do the right thing.
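
The mapping described above can be sketched as a standalone function. The constants and the function name below are stand-ins chosen for illustration (the real priorities are defined in sys/bus.h, and the real wrapper operates on a device_t rather than on a plain return value).

```c
/* Stand-in probe priorities for illustration; see sys/bus.h for the real ones. */
#define SKETCH_PROBE_VENDOR	(-10)
#define SKETCH_PROBE_DEFAULT	(-20)

/*
 * Translate the result of a base probe routine: a "default" match is
 * upgraded to a "vendor" match, and anything else (for example an
 * error code) is passed through unchanged.
 */
static int
sketch_probe_vendor(int base_result)
{
	if (base_result == SKETCH_PROBE_DEFAULT)
		return (SKETCH_PROBE_VENDOR);
	return (base_result);
}
```

Because a probe value closer to zero wins, the upgraded result would let an out-of-tree driver take precedence over an in-kernel driver that reports the default priority.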

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, marius@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D20221
2019-05-29 22:24:10 +00:00
Allan Jude
ad579d984f Fix assertion in ZFS TRIM code
Due to an attempt to check two conditions at once in a macro not designed
as such, the assertion would always evaluate to true.

#define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) do { \
        const TYPE __left = (TYPE)(LEFT); \
        const TYPE __right = (TYPE)(RIGHT); \
        if (!(__left OP __right)) \
                assfail3(#LEFT " " #OP " " #RIGHT, \
                        (uintmax_t)__left, #OP, (uintmax_t)__right, \
                        __FILE__, __LINE__); \
_NOTE(CONSTCOND) } while (0)
#define ASSERT3U(x, y, z)       VERIFY3_IMPL(x, y, z, uint64_t)

Means that we compared:
left = (type == ZIO_TYPE_FREE || psize)
OP = "<="
right = (SPA_MAXBLOCKSIZE)

If the type is not FREE, the left side is 0 or 1 (whether psize is
non-zero), which is less than SPA_MAXBLOCKSIZE (16MB).
If the type is ZIO_TYPE_FREE, 1 is less than SPA_MAXBLOCKSIZE.
The constraint on psize (physical size of the FREE operation) is never
checked against SPA_MAXBLOCKSIZE.
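
The effect can be reproduced outside the kernel with a small sketch. The names and the `intended_check()` form are assumptions for illustration: the first function mirrors how the macro evaluates its left operand, and the second shows one plausible reading of what the assertion was meant to express, not necessarily the fix that was committed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for SPA_MAXBLOCKSIZE (16MB, as stated above). */
#define SKETCH_MAXBLOCKSIZE	(16ULL << 20)

/*
 * Shape of the broken check: the whole "a || b" expression becomes the
 * left operand, so it is reduced to 0 or 1 before the comparison.
 */
static bool
broken_check(bool is_free, uint64_t psize)
{
	const uint64_t left = (uint64_t)(is_free || psize);
	const uint64_t right = (uint64_t)SKETCH_MAXBLOCKSIZE;

	return (left <= right);		/* always true */
}

/* One plausible intent: psize is bounded unless the operation is a FREE. */
static bool
intended_check(bool is_free, uint64_t psize)
{
	return (is_free || psize <= SKETCH_MAXBLOCKSIZE);
}
```

For a non-FREE operation with an oversized psize, `broken_check()` still passes while `intended_check()` fails.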

Reported by:	Ka Ho Ng <khng300@gmail.com>
Reviewed by:	kevans
MFC after:	2 weeks
Sponsored by:	Klara Systems
2019-05-29 20:34:35 +00:00
Li-Wen Hsu
6c9e56b231 Add the likely missing braces in ips(4). This is found by gcc warning that
the code is not guarded by the if clause and has misleading indentation.

Approved by:	scottl
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20427
2019-05-29 18:11:17 +00:00
Ruslan Bukin
33da49cd2e Don't copy the data from bounce buffer back to the mbuf if channel does
not use bounce buffering.

Sponsored by:	DARPA, AFRL
2019-05-29 16:01:34 +00:00
Ruslan Bukin
7b4ec8d2fc Pass pci_base address instead of physical address to rman_manage_region().
This should have been part of r347930 ("pci: ecam: Correctly parse memory
and IO region").

Sponsored by:	DARPA, AFRL
2019-05-29 15:53:33 +00:00
Konstantin Belousov
ab74c84333 Do not go into sleep in sleepq_catch_signals() when SIGSTOP from
PT_ATTACH was consumed.

In particular, do not clear TDP_FSTP in ptracestop() if td_wchan is
non-NULL. Leave it to sleepq_catch_signals() to clear and convert zero
return code to EINTR.

Otherwise, per submitter report, if the PT_ATTACH SIGSTOP was
delivered right after the thread was added to the sleepqueue but was
not yet really sleeping, and cursig() caused debugger attach, the thread
sleeps instead of returning to the userspace boundary with EINTR.

PR: 231445
Reported by:	Efi Weiss <valmarelox@gmail.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20381
2019-05-29 14:05:27 +00:00
Andriy Gapon
fec2f12ebd revert r273728 and parts of r306589, iicbus no-stop by default feature
Since drm2 removal, there has not been any consumer of the feature in the
tree.  I am also unaware of any out-of-tree consumer.
More importantly, the feature has been broken from the very start, both
before and after r306589, because the ivar was set on a device that does
not support it and it was read from another device that also does not
support it.

A bus-wide no-stop flag cannot be implemented as an ivar as iicbus
attaches as a child of various drivers.  Implementing the ivar in each
and every I2C driver is just impractical.

If we ever want to implement this feature properly, then probably the
easiest way to do it would be via a flag in the softc of iicbus.
In fact, we might have to do that in the stable branches if we want to
fix the code for them.

Reported by:	ian (long time ago)
MFC after:	1 month (maybe)
X-MFC-note:	cannot just merge the change, must keep drm2 happy
2019-05-29 09:08:20 +00:00
Justin Hibbits
78473c580b Update __FreeBSD_version and Makefile check for r348347
libdwarf needs to be forcibly rebuilt after r348347.
2019-05-29 02:26:15 +00:00
Pedro F. Giffuni
ec845b07c6 typo: suppported. 2019-05-29 02:08:23 +00:00
Kyle Evans
d8b985430c if_bridge(4): Complete bpf auditing of local traffic over the bridge
There were two remaining "gaps" in auditing local bridge traffic with
bpf(4):

Locally originated outbound traffic from a member interface is invisible to
the bridge's bpf(4) interface. Inbound traffic locally destined to a member
interface is invisible to the member's bpf(4) interface -- this traffic has
no chance after bridge_input to otherwise pass it over, and it wasn't
originally received on this interface.

I call these "gaps" because they don't affect conventional bridge setups.
Alas, being able to establish an audit trail of all locally destined traffic
for setups that can function like this is useful in some scenarios.

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19757
2019-05-29 01:08:30 +00:00
Johannes Lundberg
1e363d64a5 pseudofs: Ignore unsupported commands in vop_setattr.
Users of pseudofs (e.g. lindebugfs), should be able to receive
input from command line via commands like "echo 1 > /path/to/file".
Currently this fails because sh tries to truncate the file first and
vop_setattr returns not supported error for this. This patch simply
ignores the error and returns 0 instead.

Reviewed by:	imp (mentor), asomers
Approved by:	imp (mentor), asomers
MFC after:	1 week
Differential Revision: D20451
2019-05-28 20:54:59 +00:00
Alexander Motin
3582828053 Fix array out of bound panic introduced in r306219.
As I see, different NICs in different configurations may have different
numbers of TX and RX queues.  The code was assuming 1:1 mapping between
event queues (interrupts) and TX/RX queues.  Since number of interrupts
is set to maximum of TX and RX queues, when those two are different, the
system is doomed.

I have no documentation or deep knowledge about this hardware, so this
change is based on general observations and code reading.  If some of my
guesses are wrong, please do better.  I just confirmed HP NC550SFP NICs
are working now.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-05-28 18:32:04 +00:00
Adrian Chadd
1bae1560ee [ath_hal] Fix queue bits a bit
Found by PVS Studio: duplicate assignment; add assignment of tqi_compBuf.

Submitted by:	<mizhka@gmail.com>
Differential Revision:	https://reviews.freebsd.org/D20431
2019-05-28 18:05:10 +00:00
Kirk McKusick
e94828443c Add a missing brelse() in seldom-used error return. 2019-05-28 17:31:35 +00:00
Kirk McKusick
af6aeacb3e Convert use of UFS-specific #ifdef DEBUG to DIAGNOSTIC or INVARIANTS
as appropriate. No functional change intended.

Suggested-by: markj
2019-05-28 16:32:04 +00:00
Doug Moore
1c76d3a9fb Implement the ffs and fls functions, and their longer counterparts, in
cpufunc, in terms of __builtin_ffs and the like, for arm32 v6 and v7
architectures, and use those, rather than the simple libkern
implementations, in building arm32 kernels.
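
As a rough illustration of the approach, the two basic operations can be expressed with compiler builtins as below. The `sketch_` names are hypothetical and this is not the code added to cpufunc; it only assumes a GCC- or clang-compatible compiler.

```c
/* Find first set bit: 1-based index of the lowest set bit, 0 if none. */
static inline int
sketch_ffs(int mask)
{
	return (__builtin_ffs(mask));
}

/*
 * Find last set bit: 1-based index of the highest set bit, 0 if none.
 * __builtin_clz() is undefined for 0, so that case is handled first.
 */
static inline int
sketch_fls(int mask)
{
	if (mask == 0)
		return (0);
	return (8 * (int)sizeof(mask) - __builtin_clz((unsigned int)mask));
}
```

On targets with a count-leading-zeros instruction, the compiler can typically lower these builtins to a few instructions instead of a bit-by-bit loop.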

Reviewed by: manu
Approved by: kib, markj (mentors)
Tested by: iz-rpi03_hs-karlsruhe.de, mikael.urankar_gmail.com, ian
Differential Revision: https://reviews.freebsd.org/D20412
2019-05-28 15:47:00 +00:00
Andrey V. Elsukov
de25327313 Rework r348303 to reduce the time of holding global BPF lock.
It appeared that using NET_EPOCH_WAIT() while holding global BPF lock
can lead to another panic:

spin lock 0xfffff800183c9840 (turnstile lock) held by 0xfffff80018e2c5a0 (tid 100325) too long
panic: spin lock held too long
...
#0  sched_switch (td=0xfffff80018e2c5a0, newtd=0xfffff8000389e000, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2133
#1  0xffffffff80bf9912 in mi_switch (flags=256, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:439
#2  0xffffffff80c21db7 in sched_bind (td=<optimized out>, cpu=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2704
#3  0xffffffff80c34c33 in epoch_block_handler_preempt (global=<optimized out>, cr=0xfffffe00005a1a00, arg=<optimized out>)
    at /usr/src/sys/kern/subr_epoch.c:394
#4  0xffffffff803c741b in epoch_block (global=<optimized out>, cr=<optimized out>, cb=<optimized out>, ct=<optimized out>)
    at /usr/src/sys/contrib/ck/src/ck_epoch.c:416
#5  ck_epoch_synchronize_wait (global=0xfffff8000380cd80, cb=<optimized out>, ct=<optimized out>) at /usr/src/sys/contrib/ck/src/ck_epoch.c:465
#6  0xffffffff80c3475e in epoch_wait_preempt (epoch=0xfffff8000380cd80) at /usr/src/sys/kern/subr_epoch.c:513
#7  0xffffffff80ce970b in bpf_detachd_locked (d=0xfffff801d309cc00, detached_ifp=<optimized out>) at /usr/src/sys/net/bpf.c:856
#8  0xffffffff80ced166 in bpf_detachd (d=<optimized out>) at /usr/src/sys/net/bpf.c:836
#9  bpf_dtor (data=0xfffff801d309cc00) at /usr/src/sys/net/bpf.c:914

To fix this add the check to the catchpacket() that BPF descriptor was
not detached just before we acquired BPFD_LOCK().

Reported by:	slavash
Tested by:	slavash
MFC after:	1 week
2019-05-28 11:45:00 +00:00
Andrew Turner
51db930589 The alignment is passed into contigmalloc_domainset in the 7th argument.
KUBSAN was complaining the pointer contigmalloc_domainset returned was
misaligned. Fix this by using the correct argument to find the alignment
in the function signature.

Reported by:	KUBSAN
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-05-28 10:55:59 +00:00
Andrew Turner
5393cbce3d Teach the kernel KUBSAN runtime about alignment_assumption
This checks that the alignment of a given pointer is sufficient for the
requested alignment. This fixes the build with a recent
llvm/clang.

Sponsored by:	DARPA, AFRL
2019-05-28 09:12:15 +00:00
Cy Schubert
0d5de29a10 Continuation of r343701, removal of irrelevant #ifdefs.
MFC after:	1 week
2019-05-28 01:41:08 +00:00
Doug Moore
e67a5068ec Reduce the code size and number of ffsl calls in vm_reserv_break. Use
xor to find where free ranges begin and end.
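
The xor idea can be sketched on a single 64-bit word. This is an illustration of the general technique under assumed names, not the code in vm_reserv_break: xoring a bitmap with itself shifted by one marks every position where the bit value changes, so the begin and end of each run can be found with a bit scan instead of testing bits one at a time.

```c
#include <stdint.h>

/*
 * Return a mask with a bit set wherever `map` differs from the next
 * lower bit. `prev` supplies the bit just below bit 0 (for example the
 * top bit of the preceding word).
 */
static uint64_t
sketch_range_edges(uint64_t map, unsigned int prev)
{
	return (map ^ ((map << 1) | (prev & 1)));
}

/*
 * Find the first run of set bits in `map`. On success return 1 and
 * store the half-open bit range [*lo, *hi); return 0 if map is empty.
 */
static int
sketch_first_run(uint64_t map, int *lo, int *hi)
{
	uint64_t edges = sketch_range_edges(map, 0);

	if (edges == 0)
		return (0);
	*lo = __builtin_ctzll(edges);	/* first 0 -> 1 transition */
	edges &= edges - 1;		/* clear that edge */
	*hi = (edges != 0) ? __builtin_ctzll(edges) : 64;
	return (1);
}
```

Each run costs two bit scans regardless of its length, which is where the reduction in scan calls would come from.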

Tested by: pho
Reviewed by: alc
Approved by: markj, kib (mentors)
Differential Revision:	https://reviews.freebsd.org/D20256
2019-05-28 00:51:23 +00:00
Cy Schubert
8a5969801d style(9)
MFC after:	1 week
2019-05-27 20:22:54 +00:00
Cy Schubert
ef7860a1e1 Fix indentation and while at it simplify the code.
Reported by:	lwhsu@
MFC after:	1 week
2019-05-27 20:22:51 +00:00
Cy Schubert
8cd20ebdcb Remove compile-time tests for unsupported versions of FreeBSD.
MFC after:	1 week
2019-05-27 20:22:48 +00:00
Konstantin Belousov
fcd0c06eee Correct some inconsistencies in the earliest created kernel page
tables which affect demotion.

The last last-level page table under 2M mappings below KERNend was
only partially initialized.  When that page was used as the hardware
page table for demotion of the 2M mapping, the result was not
consistent.  Since pmap_demote_pde() is switched to use PG_PROMOTED as
the test for the validity of the saved last level page table page, we
can keep page table pages zero-initialized instead.  Demotion would
fill them as needed.

Only map the created page tables beyond KERNend, there is no need to
pre-promote PTmap after KERNend, because the extra mapping is not used.

Only round up *firstaddr to 2M boundary when it is below rounded
KERNend.  Sometimes the allocpages() calls advance *firstaddr past the
end of the last 2MB page mapping. In that case, this conditional
avoids wasting an average of 1MB of physical memory.

Update comments to explain actions in cleaner and more direct language.

Reported and tested by:	pho
In collaboration with:	alc
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20380
2019-05-27 15:21:26 +00:00
Andrey V. Elsukov
44a514745c Fix possible NULL pointer dereference.
bpf_mtap() can invoke catchpacket() for already detached descriptor.
And this can lead to NULL pointer dereference, since bd_bif pointer
was reset to NULL in bpf_detachd_locked(). To avoid this, use
NET_EPOCH_WAIT() when descriptor is removed from interface's descriptors
list. After the wait it is safe to modify descriptor's content.

Submitted by:	kib
Reported by:	slavash
MFC after:	1 week
2019-05-27 12:41:41 +00:00
Kirk McKusick
298184acb8 Add function name and line number debugging information to softupdates
worklist structures to help track their movement between work lists.
No functional change to the operation of soft updates intended.
2019-05-27 06:22:43 +00:00
Justin Hibbits
a5868885fa kern/CTF: link_elf_ctf_get() on big endian platforms
Check the CTF magic number in big endian platforms.  This lets DTrace FBT
handle types correctly on these platforms.

Submitted by:	Brandon Bergren
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20413
2019-05-27 04:20:31 +00:00
Justin Hibbits
b2aea1ad8f powerpc/dtrace: Fix fbt function probing for ELFv2
'.' function names exist only in ELFv1.  ELFv2 does away with function
descriptors, and functions look more like they do on powerpc(32) and
most other platforms, as direct function pointers.  Stop blacklisting
regular function names in ELFv2.

Submitted by:	Brandon Bergren
Differential Revision:	https://reviews.freebsd.org/D20346
2019-05-27 03:18:56 +00:00
Conrad Meyer
af8f74ad14 virtio_random(4): Remove unneeded reference to device
The device_t always references the softc, so we can pass the device and
obtain the softc instead of the other way around.
2019-05-27 00:55:46 +00:00
Conrad Meyer
6fe286ed83 aesni(4): Fix trivial type typo
This fixes the kernel build with xtoolchain-gcc (6.4.0).

X-MFC-With:	r348268
2019-05-27 00:47:51 +00:00
Conrad Meyer
9ecc02ea86 sys/bufobj.h: Avoid using C++ reserved keyword 'private'
No functional change (except for out-of-tree C++ kmods).
2019-05-27 00:43:43 +00:00
Jayachandran C.
87820437f9 arm64 nexus: remove incorrect warning
acpi_config_intr() will be called when an arm64 system is booted with ACPI.
We do the interrupt mapping for ACPI interrupts in nexus_acpi_map_intr()
on arm64, so acpi_config_intr() has to just return success without
printing this error message.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D19432
2019-05-26 23:04:21 +00:00
Michael Tuexen
bc35229fad When an ACK segment as the third message of the three way handshake is
received and support for time stamps was negotiated in the SYN/SYNACK
exchange, perform the PAWS check and only expand the syn cache entry if
the check is passed.
Without this check, endpoints may get stuck on the incomplete queue.

Reviewed by:		jtl@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D20374
2019-05-26 17:18:14 +00:00
Alexey Dokuchaev
0a16ee7544 Fix two errors reported by PVS Studio: V646 Consider inspecting the
application's logic.  It's possible that 'else' keyword is missing.

Reviewed by:	gallatin, np, pfg
Approved by:	pfg
Differential Revision:	https://reviews.freebsd.org/D20396
2019-05-26 12:41:03 +00:00
Li-Wen Hsu
d086d41363 Remove an unneeded indentation introduced in r223637 to silence gcc warning
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-05-25 23:58:09 +00:00