130682 Commits

Author SHA1 Message Date
markj
ce55b46f28 Ensure the background laundering threshold is positive after a scan.
The division added in r331732 meant that we wouldn't attempt a
background laundering until at least v_free_target - v_free_min clean
pages had been freed by the page daemon since the last laundering. If
the inactive queue is depleted but not completely empty (e.g., because
it contains busy pages), it can thus take a long time to meet this
threshold. Restore the pre-r331732 behaviour of using a non-zero
background laundering threshold if at least one inactive queue scan has
elapsed since the last attempt at background laundering.
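
The truncation described above can be illustrated with a minimal userland
sketch. The helper name freed_batches() and its parameters are hypothetical
and are not taken from the page daemon; the sketch only shows why an integer
division by (v_free_target - v_free_min) stays at zero until a full batch of
clean pages has been freed.

```c
#include <assert.h>

/*
 * Hypothetical helper, not the kernel's code: count how many whole
 * "free_target - free_min" batches of clean pages have been freed.
 * Integer division truncates, so the result is 0 until a complete
 * batch has been freed.
 */
static unsigned int
freed_batches(unsigned int nfreed, unsigned int free_target,
    unsigned int free_min)
{
	assert(free_target > free_min);
	return (nfreed / (free_target - free_min));
}
```

A threshold multiplied by such a factor is zero for every nfreed below
free_target - free_min, which is the stall the commit message describes.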

Submitted by:	tijl (original version)
2018-04-02 15:07:41 +00:00
avg
cbde65132d unify amd64 and i386 cpu_reset() in x86/cpu_machdep.c
Because I didn't see any reason not to.
I've been making some changes to the code and couldn't help but notice
that the i386 and amd64 code was nearly identical.

MFC after:	17 days
2018-04-02 13:45:23 +00:00
andrew
5fa8a3511e Add the missing header for malloc(9). It was pulled in through header
pollution that doesn't seem to exist in some configurations.
2018-04-02 13:36:48 +00:00
avg
f909741389 x86 cpu_reset: if failed to switch to BSP proceed to cpu_reset_real
If cpu_reset() is called on an AP and if it somehow fails to wake the
BSP, then it's better to attempt the reset on the AP than just sit there
spinning on an unusable and undebuggable system.

MFC after:	16 days
2018-04-02 08:06:18 +00:00
avg
a9c1c585d1 x86 cpu_reset_proxy: no need to stop_cpus() the original processor
The processor is "parked" in a spin-loop already and that's sufficient
for the reset.  There is nothing that stop_cpus() would add here, only
extra complexity and fragility.
The original processor does not need to enable interrupts now; in fact,
it must not do that.

MFC after:	2 weeks
2018-04-02 07:45:13 +00:00
glebius
a76113efee Use UMA_SLAB_SPACE macro. No functional change here.
2018-04-02 05:15:25 +00:00
glebius
f1818db2fc In uma_startup_count() handle the special case when a zone will fit
into a single slab, but with alignment adjustment it won't. Again, when
there is only one item in a slab, alignment can be ignored. See the
previous revision of this file for more info.

PR:		227116
2018-04-02 05:14:31 +00:00
glebius
b8ac450ecc Handle a special case when a slab can fit only one allocation,
and zone has a large alignment. With alignment taken into
account uk_rsize will be greater than space in a slab. However,
since we have only one item per slab, it is always naturally
aligned.

Code that will panic before this change with 4k page:

	z = uma_zcreate("test", 3984, NULL, NULL, NULL, NULL, 31, 0);
	uma_zalloc(z, M_WAITOK);

A practical scenario to hit the panic is a machine with 56 CPUs
and 2 NUMA domains, which yields a zone size of 3984.
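
The arithmetic behind this case can be sketched in userland C. The helper
names and the slab_space value used in the note below are assumptions made
for illustration; they are not the UMA implementation. The sketch only shows
that rounding 3984 up to a 32-byte boundary gives 4000, and that a single
item per slab can skip that rounding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round size up to a multiple of (align_mask + 1); align_mask is 2^n - 1. */
static size_t
demo_round_up(size_t size, size_t align_mask)
{
	return ((size + align_mask) & ~align_mask);
}

/*
 * Hypothetical fit check.  If the aligned size fits, the item fits.
 * Otherwise the slab can hold at most one item, and a lone item needs
 * no alignment padding, so only the raw size has to fit.
 */
static bool
demo_fits_in_slab(size_t size, size_t align_mask, size_t slab_space)
{
	assert(((align_mask + 1) & align_mask) == 0);
	if (demo_round_up(size, align_mask) <= slab_space)
		return (true);
	return (size <= slab_space);
}
```

With an assumed slab_space of 3984 bytes, demo_round_up(3984, 31) exceeds
the slab, yet demo_fits_in_slab(3984, 31, 3984) still accepts the item.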

PR:		227116
MFC after:	2 weeks
2018-04-02 05:11:59 +00:00
ian
60d1e1d592 Fix the build on arches with default unsigned char. Capture the fubyte()
return value in an int as well as the char, and test the full int value
for fubyte() failure.
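
A minimal sketch of the pattern, assuming a hypothetical demo_fubyte() that
mimics the fubyte(9) convention of returning a byte value or -1 on failure.
None of these helpers come from the changed file; they only show why the
failure test has to be made on the int.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for fubyte(9): a byte value 0..255, or -1 on failure. */
static int
demo_fubyte(const unsigned char *uaddr)
{
	return (uaddr == NULL ? -1 : (int)*uaddr);
}

/* What is left of a return value once it has been stored in an unsigned char. */
static int
demo_narrowed(int value)
{
	unsigned char c = (unsigned char)value;

	return ((int)c);
}

/* Keep the full int for the failure test, then narrow to char. */
static bool
demo_fetch_char(const unsigned char *uaddr, char *out)
{
	int error;

	error = demo_fubyte(uaddr);
	if (error == -1)
		return (false);
	*out = (char)error;
	return (true);
}
```

Where char is unsigned, a stored -1 reads back as 255 and can no longer be
told apart from a valid 0xff byte, so the comparison against -1 never matches.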
2018-04-01 18:53:27 +00:00
ian
b2a0ec2c83 Add opt_platform.h for several modules that have #ifdef FDT in the source.
Submitted by:	Andre Albsmeier <Andre.Albsmeier@siemens.com>
2018-04-01 18:22:24 +00:00
jeff
e5b3bfbbcf Add a uma cache of free pages in the DEFAULT freepool. This gives us
per-cpu alloc and free of pages.  The cache is filled with as few trips
to the phys allocator as possible by the use of a new
vm_phys_alloc_npages() function which allocates as many as N pages.

This code was originally by markj with the import function rewritten by
me.

Reviewed by:	markj, kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14905
2018-04-01 04:50:05 +00:00
jeff
c9b029efcd Add the flag ZONE_NOBUCKETCACHE. This flag instructs UMA not to keep
a cache of fully populated buckets.  This will be used in a follow-on
commit.

The flag idea was originally from markj.

Reviewed by:	markj, kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2018-04-01 04:47:05 +00:00
imp
f5b78aa5cb The Uninorth ID was really for Uninorth 2.
Submitted by: Sevan Janiyan
Differential Revision: https://reviews.freebsd.org/D14919
2018-04-01 00:25:47 +00:00
markj
82b6c3b99e Don't verify td_locks accounting after a panic.
Reported by:	pho
X-MFC with:	r331738
2018-03-31 23:24:28 +00:00
imp
a7dfa8ed18 fwohcireg.h is 99% the same between the boot loader and the
kernel. Delete it and fix up the 1% difference because there's no need
for them to be different.
2018-03-31 22:02:59 +00:00
jah
14c0c0b041 Remove MK_AUTO_OBJ from env passed to PORTS_MODULES
This fixes a failure to resolve object file paths seen when buildkernel
(which sets MK_AUTO_OBJ=yes) and installkernel (which sets MK_AUTO_OBJ=no)
are run as separate steps.  r329232 partially fixed this scenario by removing
MAKEOBJDIR, but it seems the AUTO_OBJ setting also needs to be on the same
page for the build and install steps.

Reviewed by:	bdrewery
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14143
2018-03-31 05:17:12 +00:00
brooks
2b96daf50f Document and enforce assumptions about struct (in6_)ifreq.
- The two types must be type-punnable for shared members of ifr_ifru.
  This allows compatibility accessors to be shared.

- There must be no padding gap between ifr_name and ifr_ifru.  This is
  assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be
  broadly portable.  This is true for all current architectures, but very
  large (256-bit) fat-pointers could violate this invariant.
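
The second assumption can be enforced at compile time. The following is a
sketch on a simplified, hypothetical struct demo_ifreq, not the real struct
ifreq or the assertions added by this commit.

```c
#include <stddef.h>

#define	DEMO_IFNAMSIZ	16

/* Hypothetical, simplified stand-in for struct ifreq. */
struct demo_ifreq {
	char	ifr_name[DEMO_IFNAMSIZ];
	union {
		short	ifru_flags[2];
		int	ifru_metric;
		void	*ifru_data;
	} ifr_ifru;
};

/* The union must start exactly where the name ends: no padding gap. */
_Static_assert(offsetof(struct demo_ifreq, ifr_ifru) ==
    sizeof(((struct demo_ifreq *)0)->ifr_name),
    "padding gap between ifr_name and ifr_ifru");

/* Runtime view of the same offset, for inspection. */
static size_t
demo_ifru_offset(void)
{
	return (offsetof(struct demo_ifreq, ifr_ifru));
}
```

If a union member ever required alignment greater than DEMO_IFNAMSIZ, the
compiler would insert padding and the build would fail at the assertion.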

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14910
2018-03-30 21:38:53 +00:00
brooks
019d267b75 Add deprecation notices for Arcnet and FDDI drivers.
We intend to remove support before FreeBSD 12 is branched.

Reviewed by:	imp, emaste
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14890
2018-03-30 20:27:47 +00:00
brooks
069da6dc0f Fall back to ether_ioctl() by default.
The common practice in ethernet device drivers is to fall back to
ether_ioctl() to implement generic ioctls not implemented by the driver
and to fail if no handler exists.

Convert these drivers to follow that practice rather than calling
ether_ioctl() for specific cases.

vxge(4) already had the default case, but it was only called on failure
to match.
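
The practice can be sketched as a userland mock. The demo_* names and the
command enum are hypothetical; a real driver would call ether_ioctl() from
the default case of its own ioctl handler.

```c
#include <errno.h>

/* Hypothetical command codes standing in for real SIOC* requests. */
enum demo_cmd {
	DEMO_SIOCSIFMTU,
	DEMO_SIOCSIFFLAGS,
	DEMO_SIOCGIFADDR,
	DEMO_UNKNOWN
};

/* Stand-in for the generic handler: serve what it knows, fail otherwise. */
static int
demo_ether_ioctl(enum demo_cmd cmd)
{
	switch (cmd) {
	case DEMO_SIOCGIFADDR:
		return (0);
	default:
		return (EINVAL);
	}
}

/* Driver handler: own cases first, everything else goes to the fallback. */
static int
demo_driver_ioctl(enum demo_cmd cmd)
{
	int error = 0;

	switch (cmd) {
	case DEMO_SIOCSIFMTU:
	case DEMO_SIOCSIFFLAGS:
		/* Driver-specific handling would go here. */
		break;
	default:
		error = demo_ether_ioctl(cmd);
		break;
	}
	return (error);
}
```

Routing every unhandled request through the default case means the driver
does not need to enumerate the generic ioctls it relies on.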

Reviewed by:	imp
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14895
2018-03-30 20:24:29 +00:00
hselasky
bf24bc5621 Optimise use of Giant in the LinuxKPI.
- Make sure Giant is locked when calling PCI device methods.
Newbus currently requires this.

- Avoid unlocking Giant right before acquiring the sleepqueue lock.
This can save a task switch.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-30 20:11:12 +00:00
hselasky
7cf4e46e9b Remove unused structure field in mlx5core.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:58:58 +00:00
hselasky
008fe18378 Bump mlx5core driver version.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:55:31 +00:00
hselasky
e5983f6a99 Fix for use after free in mlx5core.
Make sure the command completion handler is not called when the device is
in internal error state. This can easily trigger use after free situations.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:50:45 +00:00
hselasky
76b81f0e68 Make sure Giant is locked when allocating bus resources in mlx5core.
During health care, IRQ resources will be reallocated.
Newbus requires that Giant is locked before accessing
these resources.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:49:35 +00:00
hselasky
e40f63b342 Collect firmware dump when mlx5core is in device error state.
Firmware dump collection should be triggered when a firmware syndrome
has the request-for-reset bit set.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:48:25 +00:00
hselasky
c86a08d6bb Reorganize health recovery in mlx5core.
- Move the semaphore locking and unlocking to the same function.
- Flags are no longer needed if the reset and crdump will be done in the
  same function.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:45:48 +00:00
hselasky
9b9f7307dd Prepare for FW dump in error state in mlx5core.
- Move firmware dump prep and cleanup to init_one() and remove_one() so that
the init and cleanup will happen only upon driver reload.
- Add some prints to indicate firmware dump.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:43:15 +00:00
hselasky
10305ffbeb Properly check if crspace is supported in mlx5core.
The old code checked for MLX5_CR_SPACE_DOMAIN which is irrelevant here.
However, if dev->vsec_addr would be 0, an access to wrong offset would
happen.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:39:27 +00:00
hselasky
4a9629969c Add missing newline character in print in mlx5core.
MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:35:31 +00:00
brooks
ac0325b4db Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command definitions are required
as struct ifreq is the same size).  This is believed to be sufficient to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
brooks
349ad8a8de Remove a comment that suggests checking that a non-pointer is non-NULL.
Reviewed by:	melifaro, markj, hrs, ume
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14904
2018-03-30 18:26:29 +00:00
cem
ba233160f1 ocs_fc(4): Fix GCC build (-Wredundant-decls)
These objects are defined earlier in the same file; an extern declaration
after definition is redundant.

Broken in r331766 (introduction of ocs_fc(4)).

Sponsored by:	Dell EMC Isilon
2018-03-30 16:44:54 +00:00
ken
570099bbdd Bring in the Broadcom/Emulex Fibre Channel driver, ocs_fc(4).
The ocs_fc(4) driver supports the following hardware:

Emulex 16/8G FC GEN 5 HBAS
	LPe15004 FC Host Bus Adapters
	LPe160XX FC Host Bus Adapters

Emulex 32/16G FC GEN 6 HBAS
	LPe3100X FC Host Bus Adapters
	LPe3200X FC Host Bus Adapters

The driver supports target and initiator mode, and also supports FC-Tape.

Note that the driver only currently works on little endian platforms.  It
is only included in the module build for amd64 and i386, and in GENERIC
on amd64 only.

Submitted by:	Ram Kishore Vegesna <ram.vegesna@broadcom.com>
Reviewed by:	mav
MFC after:	5 days
Relnotes:	yes
Sponsored by:	Broadcom
Differential Revision:	https://reviews.freebsd.org/D11423
2018-03-30 15:28:25 +00:00
avg
2bd01ebfdb align i386 cpu_reset() with amd64 version
Maybe this code could be moved to x86.

MFC after:	1 week
2018-03-30 11:25:30 +00:00
kib
5d38ec36c4 Make vm_map_max/min/pmap KBI stable.
There are out of tree consumers of vm_map_min() and vm_map_max(), and
I believe there are consumers of vm_map_pmap(), although the latter is
arguably less in need of a KBI-stable interface. For the consumers'
benefit, make modules using this KPI not depend on the struct vm_map
layout.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14902
2018-03-30 10:55:31 +00:00
emaste
b3c581c635 Correct comment typo in Hyper-V
PR:		226665
Submitted by:	Ryo ONODERA
MFC after:	3 days
2018-03-30 02:25:12 +00:00
landonf
286c7eb892 bhnd(4): Use the new BHND_CAP_BP64 capability flag to exclude DMA
translations unsupported by the backplane.
2018-03-29 19:48:50 +00:00
np
82a840f165 Fix RSS build (broken in r331309).
Sponsored by:	Chelsio Communications
2018-03-29 19:48:17 +00:00
landonf
9bc0c1eaa5 bhnd(4): include a subset of the ChipCommon capability flags in bhnd_chipid;
this provides early access to device capability flags required by bhnd(4)
bus and bhndb(4) bridge drivers.
2018-03-29 19:44:15 +00:00
davidcs
1b4fdbf03e 1. Add additional debug prints.
2. Break transmit when IFF_DRV_RUNNING is OFF.
3. set desc_count=0 for default case in switch in ql_rcv_isr()
MFC after:	5 days
2018-03-29 17:36:34 +00:00
markj
9f0f8596bb Have TD_LOCKS_DEC() assert that td_locks is positive.
This makes it easier to catch lock accounting bugs, since the problem
is otherwise only detected upon a return to user mode (or never, for
kernel threads).
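
A userland sketch of the idea, with assert() standing in for the kernel's
assertion facility. The struct and macro names are hypothetical and only
mirror the shape of the change.

```c
#include <assert.h>

/* Hypothetical, minimal thread structure with a lock counter. */
struct demo_thread {
	int	td_locks;
};

#define	DEMO_TD_LOCKS_INC(td)	((td)->td_locks++)

/* Check at the decrement itself, so an unbalanced release fails immediately. */
#define	DEMO_TD_LOCKS_DEC(td) do {					\
	assert((td)->td_locks > 0);					\
	(td)->td_locks--;						\
} while (0)
```

An extra DEMO_TD_LOCKS_DEC() on a thread whose counter is already zero
aborts at the offending call site instead of being noticed later.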

Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14896
2018-03-29 17:19:59 +00:00
brooks
ce438e4cdd GC never enabled support for SIOCGADDRROM and SIOCGCHIPID.
When de(4) was imported in 1997 the world was not ready for these ioctls.
In over 20 years that hasn't changed so it seems safe to assume their
time will never come.

Reviewed by:	imp, jhb
Approved by:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14889
2018-03-29 15:58:49 +00:00
markj
373c34dba6 Fix the background laundering mechanism after r329882.
Rather than using the number of inactive queue scans as a metric for
how many clean pages are being freed by the page daemon, have the
page daemon keep a running counter of the number of pages it has freed,
and have the laundry thread use that when computing the background
laundering threshold.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D14884
2018-03-29 14:27:40 +00:00
cem
43c841bef3 opencrypto: Integrate Chacha20 algorithm into OCF
Mostly this is a thin shim around existing code to integrate with enc_xform
and cryptosoft (+ cryptodev).

Expand the cryptodev buffer used to match that of Chacha20's native block
size as a performance enhancement for chacha20_xform_crypt_multi.
2018-03-29 04:02:50 +00:00
jeff
5e244328ad Implement several enhancements to NUMA policies.
Add a new "interleave" allocation policy which stripes pages across
domains with a stride or width keeping contiguity within a multi-page
region.
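
One way to picture striping with a stride is the sketch below. The function
demo_interleave_domain() and its parameters are hypothetical; this is an
illustration of the concept, not the kernel's domain selection code.

```c
#include <assert.h>

/*
 * Hypothetical mapping from a page index to a domain.  Pages are
 * grouped into runs of "stride" consecutive pages, and the runs are
 * dealt out to the domains round-robin, so each run stays within a
 * single domain.
 */
static int
demo_interleave_domain(unsigned long pindex, unsigned long stride,
    int ndomains)
{
	assert(stride > 0 && ndomains > 0);
	return ((int)((pindex / stride) % (unsigned long)ndomains));
}
```

With a stride of 4 and 2 domains, pages 0-3 map to domain 0, pages 4-7 to
domain 1, and pages 8-11 to domain 0 again.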

Move the kernel to the dedicated numbered cpuset #2 making it possible
to assign kernel threads and memory policy separately from user.  This
also eliminates the need for the complicated interrupt binding code.

Add a sysctl API for viewing and manipulating domainsets.  Refactor some
of the cpuset_t manipulation code using the generic bitset type so that
it can be used for both.  This probably belongs in a dedicated subr file.

Attempt to improve the include situation.

Reviewed by:	kib
Discussed with:	jhb (cpuset parts)
Tested by:	pho (before review feedback)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14839
2018-03-29 02:54:50 +00:00
brooks
a45d44647f Remove infrastructure for token-ring networks.
Reviewed by:	cem, imp, jhb, jmallett
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14875
2018-03-28 23:33:26 +00:00
mav
d31db32426 MFV r331712:
9280 Assertion failure while running removal_with_ganging test with 4K devices

illumos/illumos-gate@243952c7ee

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matt Ahrens <Matt.Ahrens@delphix.com>
2018-03-28 23:17:29 +00:00
mav
5502ac3ee9 MFV 331710:
9188 increase size of dbuf cache to reduce indirect block decompression

illumos/illumos-gate@268bbb2a2f

With compressed ARC (6950) we use up to 25% of our CPU to decompress indirect
blocks, under a workload of random cached reads. To reduce this decompression
cost, we would like to increase the size of the dbuf cache so that more
indirect blocks can be stored uncompressed.

If we are caching entire large files of recordsize=8K, the indirect blocks
use 1/64th as much memory as the data blocks (assuming they have the same
compression ratio). We suggest making the dbuf cache be 1/32nd of all memory,
so that in this scenario we should be able to keep all the indirect blocks
decompressed in the dbuf cache. (We want it to be more than the 1/64th that
the indirect blocks would use because we need to cache other stuff in the
dbuf cache as well.)

In real world workloads, this won't help as dramatically as the example
above, but we think it's still worth it because the risk of decreasing
performance is low. The potential negative performance impact is that we
will be slightly reducing the size of the ARC (by ~3%).
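
The 1/64th figure can be reproduced with a short calculation, assuming one
128-byte block pointer per data block and counting only the first level of
indirect blocks. The constant and the function name are assumptions made
for this sketch.

```c
#include <assert.h>

/* Assumed size of one block pointer, in bytes. */
#define	DEMO_BLKPTR_SIZE	128UL

/*
 * Bytes of first-level indirect blocks needed to reference data_bytes
 * of data stored in records of recordsize bytes: one block pointer
 * per record.
 */
static unsigned long
demo_indirect_bytes(unsigned long data_bytes, unsigned long recordsize)
{
	assert(recordsize > 0);
	return (data_bytes / recordsize * DEMO_BLKPTR_SIZE);
}
```

For recordsize=8K the ratio is 128 / 8192 = 1/64, so 64 MiB of cached data
needs about 1 MiB of indirect blocks. A dbuf cache of 1/32nd of memory
corresponds to 3.125%, which matches the "~3%" quoted above.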

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: George Wilson <george.wilson@delphix.com>
2018-03-28 23:05:48 +00:00
mav
071413aa6e MFV r331708:
9321 arc_loan_compressed_buf() can increment arc_loaned_bytes by the wrong value

illumos/illumos-gate@9be12bd737

arc_loan_compressed_buf() increments arc_loaned_bytes by psize unconditionally.
In the case of zfs_compressed_arc_enabled=0, when the buf is returned via
arc_return_buf(), if ARC_BUF_COMPRESSED(buf) is false, then arc_loaned_bytes
is decremented by lsize, not psize.

Switch to using arc_buf_size(buf), instead of psize, which will return
psize or lsize, depending on the result of ARC_BUF_COMPRESSED(buf).
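
The accounting rule can be sketched with a simplified model. The demo_*
names and the struct are hypothetical and do not reproduce the ARC code;
the point is only that loan and return must use the same size function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified buffer: physical size, logical size, state. */
struct demo_buf {
	uint64_t	psize;
	uint64_t	lsize;
	bool		compressed;
};

/* Running total of loaned bytes in this model. */
static int64_t demo_loaned_bytes;

/* Size used for accounting: psize if compressed, lsize otherwise. */
static uint64_t
demo_buf_size(const struct demo_buf *buf)
{
	return (buf->compressed ? buf->psize : buf->lsize);
}

static void
demo_loan_buf(const struct demo_buf *buf)
{
	demo_loaned_bytes += (int64_t)demo_buf_size(buf);
}

static void
demo_return_buf(const struct demo_buf *buf)
{
	demo_loaned_bytes -= (int64_t)demo_buf_size(buf);
}
```

Because both sides call demo_buf_size(), a loan followed by a return leaves
the counter unchanged whether or not the buffer is compressed.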

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Allan Jude <allanjude@freebsd.org>
2018-03-28 22:50:05 +00:00
mav
e6907ec1f0 MFV r331706:
9235 rename zpool_rewind_policy_t to zpool_load_policy_t

illumos/illumos-gate@5dafeea3eb

We want to be able to pass various settings during import/open of a pool,
which are not only related to rewind. Instead of adding a new policy and
duplicating a bunch of code, we should just rename rewind_policy to a more
generic term like load_policy.

For instance, we'd like to set spa->spa_import_flags from the nvlist,
rather than from a flags parameter passed to spa_import, as in some cases we want
those flags not only for the import case, but also for the open case. One
such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would
allow zfs to open a pool when logs are missing.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-03-28 22:29:11 +00:00