Commit Graph

120763 Commits

Author SHA1 Message Date
Justin Hibbits
6708989b60 Remove platform_cpu_idle() and platform_cpu_idle_wakeup() interfaces
These interfaces were put in place to let QorIQ SoCs dictate CPU idling
semantics, in order to support capabilities such as NAP mode and deep sleep.
However, this never stabilized, and the idling support reverted back to
CPU-level rather than SoC level.  Move this code back to cpu.c instead.  If
at a later date the lower power modes do come to fruition, it should be done
by overriding the cpu_idle_hook instead of this platform hook.
2018-02-24 01:46:56 +00:00
Jung-uk Kim
0ef8c0cb57 Partially revert r197863 to reduce diff against i386.
When I wrote the patch, I wanted to remove SYSINIT() usage from amd64 code.
There is no reason to keep the divergence any more because iwasaki merged
most amd64 suspend/resume code to i386 with r235622.  Note this also fixed
an enge case reported by royger. [1]

Suggested by:		jhb
Reviewed by:		royger
Tested by:		royger [1]
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D14400 [1]
2018-02-24 01:24:57 +00:00
Conrad Meyer
63901c0171 kern/sys_generic.c: style(9) return(foo) -> return (foo)
No functional change.

Sponsored by:	Dell EMC Isilon
2018-02-24 01:15:33 +00:00
Mark Johnston
5f70fb1425 Correct some comments after r328954.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D14486
2018-02-23 23:27:53 +00:00
Mark Johnston
9140bff7ed Remove a bogus assertion from vm_page_launder().
After r328977, a wired page m may have m->queue != PQ_NONE.

Reviewed by:	kib
X-MFC with:	r328977
Differential Revision:	https://reviews.freebsd.org/D14485
2018-02-23 23:25:22 +00:00
Jeff Roberson
5f8cd1c0bf Add a generic Proportional Integral Derivative (PID) controller algorithm and
use it to regulate page daemon output.

This provides much smoother and more responsive page daemon output, anticipating
demand and avoiding pageout stalls by increasing the number of pages to match
the workload.  This is a reimplementation of work done by myself and mlaier at
Isilon.

Reviewed by:	bsdimp
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14402
2018-02-23 22:51:51 +00:00
Kirk McKusick
16680b6af5 Include error number in the "fsync: giving up on dirty" message
(in case it ever starts happening again in spite of 328444).

Submitted by: Andreas Longwitz <longwitz at incore.de>
2018-02-23 21:57:10 +00:00
Conrad Meyer
849ce31a82 Remove unused error return from API that cannot fail
No implementation of fpu_kern_enter() can fail, and it was causing needless
error checking boilerplate and confusion. Change the return code to void to
match reality.

(This trivial change took nine days to land because of the commit hook on
sys/dev/random.  Please consider removing the hook or otherwise lowering the
bar -- secteam never seems to have free time to review patches.)

Reported by:	Lachlan McIlroy <Lachlan.McIlroy AT isilon.com>
Reviewed by:	delphij
Approved by:	secteam (delphij)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14380
2018-02-23 20:15:19 +00:00
Ed Maste
5e246cb89b Correct typo in ATA_WRITE_UNCORRECTABLE_PSEUDO
Also correct a typo in the comment for these values, noted by jimharris.

Reviewed by:	jimharris
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3715
2018-02-23 20:01:42 +00:00
Ed Maste
716cfaab96 Use linux types for linux-specific syscalls
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14065
2018-02-23 19:09:27 +00:00
Konstantin Belousov
4c8a8cfcde Restore UP build.
Reviewed by:	truckman
Sponsored by:	The FreeBSD Foundation
2018-02-23 18:26:31 +00:00
Ed Maste
315fbaeca2 Correct pseudo misspelling in sys/ comments
contrib code and #define in intel_ata.h unchanged.
2018-02-23 18:15:50 +00:00
Warner Losh
2d87718fda Use bool instead of int for predicate functions relating to work
available.
2018-02-23 16:06:54 +00:00
Konstantin Belousov
33099716f3 Do not return out of bound pointers from intr_lookup_source().
This hardens the code against driver and upper level bugs causing
invalid indexes used, e.g. on msi release.

Reported by:	gallatin
Reviewed by:	gallatin, hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14470
2018-02-23 11:20:59 +00:00
Wojciech Macek
3c41c1d446 powerpc64: add NVMe to GENERIC64
NVMe support is ready and should be compiled-in
to the ppc64 kernel.

Submitted by:          Wojciech Macek <wma@semihalf.org>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-02-23 07:43:52 +00:00
Warner Losh
ef1fcaf0f5 Do not include float interfaces when using libsa.
We don't support float in the boot loaders, so don't include
interfaces for float or double in systems headers. In addition, take
the unusual step of spiking double and float to prevent any more
accidental seepage.
2018-02-23 04:04:25 +00:00
David C Somayajulu
b65c0c07b2 1. Added support to offline a port if is error recovery on successful.
2. Sysctls to enable/disable driver_state_dump and error_recovery.
3. Sysctl to control the delay between hw/fw reinitialization and
   restarting the fastpath.
4. Stop periodic stats retrieval if interface has IFF_DRV_RUNNING flag off.
5. Print contents of PEG_HALT_STATUS1 and PEG_HALT_STATUS2 on heartbeat
   failure.
6. Speed up slowpath shutdown during error recovery.
7. link_state update using atomic_store.
8. Added timestamp information on driver state and minidump captures.
9. Added support for Slowpath event logging
10.Added additional failure injection types to simulate failures.
2018-02-23 03:36:24 +00:00
Don Lewis
97e9382d56 Decrease latency by not wrapping the idle loop's potentially lengthy
search for a thread to steal inside a critical section.  Since this
allows the search to be preempted, restart the search if preemption
happens since the search results found earlier may no longer be
valid.

Decrease the latency of starting a thread that may be assigned to
this CPU during the search by polling for incoming threads during
the search and switching to that thread instead of continuing the
search.

Test for stale search results and restart the search before going
through the expense of calling tdq_lock_pair().  Retry some tests
after grabbing the locks since things may have changed while waiting
to get both locks.

Eliminate special case handling for stealing from an SMT peer that
uses 1 as the steal threshold.  This can only succeed if a thread
has been assigned but our SMT peer has not yet started executing
it.  This is quite rare and when it happens the other SMT thread
is generally waiting for the same tdq lock that we hold.  Basically
both SMT threads are racing to grab the same spin lock.

Add the kern.sched.always_steal knob from a ULE patch by jeff@.

Incorporate another idea from Jeff's ULE patch.  If the sched_switch()
detects that the CPU is about to go idle, try to steal a thread
before switching to the idle thread.  Since the search for a thread
to steal has to be done inside a critical section in this context,
limit the impact on latency by adding the knob kern.sched.trysteal_limit
to limit the topological distance of the search and don't restart
the search if we detect stale results.  If this search can't find
an stealable thread, the idle loop can do a more complete search.
Also poll for threads being assigned to this CPU during the search
and switch to them instead of continuing the search.  This change
is responsibile for the majority of the improvement in parallel
buildworld times.

In sched_balance_group() change the minimum threshold from stealing
a thread from 1 to 2.  Poaching a newly assigned thread from a CPU
that is waking up hasn't yet switched to that thread from idle is
likely very rare and is likely to have the same lock race as is
seen when stealing threads in the idle loop.  Also use tdq_notify()
to kick the destintation CPU instead of always sending an IPI.
Update a stale comment, the number of transferable threads is not
calculated.

Reviewed by:	kib (earlier version)
Comments by:	avg, jeff, mav
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12130
2018-02-23 00:12:51 +00:00
Ravi Pokala
dcd935dfd1 jedec_dimm(4): report asset info and temperatures for DDR3 and DDR4 DIMMs
A super-set of the functionality of jedec_ts(4). jedec_dimm(4) reports asset
information (Part Number, Serial Number) encoded in the "Serial Presence
Detect" (SPD) data on JEDEC DDR3 and DDR4 DIMMs. It also calculates and
reports the memory capacity of the DIMM, in megabytes. If the DIMM includes
a "Thermal Sensor On DIMM" (TSOD), the temperature is also reported.

Reviewed by:	cem
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D14392
Discussed with:	avg, cem
Tested by:	avg, cem (previous version, no semantic changes)
2018-02-22 23:18:46 +00:00
Ian Lepore
363b2c7fd2 Add a missing line continuation.
How many commits does it take to get a simple module makefile working?
Apparently at least three.

Pointy hat to:  ian
2018-02-22 22:25:26 +00:00
Mateusz Guzik
a0c722bdbf Fix up sysctl vfs.buffercache broken in r329612
Sample problem:
top: sysctl(vfs.bufspace...) expected 8, got 4

Reported by:	O. Hartmann <ohartmann walstatt.org>
2018-02-22 20:39:25 +00:00
Oleksandr Tymoshenko
94b8a54ae6 [chvgpio] add GPIO driver for Intel Z8xxx SoC family
Add chvgpio(4) driver for Intel Z8xxx SoC family. This product
was formerly known as Cherry Trail but Linux and OpenBSD drivers
refer to it as Cherry View. This driver is derived from OpenBSD
one so the name is kept for alignment with another BSD system.

Submitted by:	Tom Jones <tj@enoti.me>
Reviewed by:	gonzo, wblock(man page)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13086
2018-02-22 19:12:32 +00:00
Eric van Gyzen
0127914caa sched_ule: update a comment to reflect reality
MFC after:	3 days
Sponsored by:	Dell EMC
2018-02-22 17:09:26 +00:00
Kyle Evans
afdc2600c2 nvme: Unbreak LE builds after r329824
The parameter 'p' is unused if _BYTE_ORDER == _LITTLE_ENDIAN. Add in a
(void)p to fix the build.
2018-02-22 16:16:49 +00:00
Hans Petter Selasky
949440623b Return correct error code to user-space when a system call receives a
signal in the LinuxKPI.

The read(), write() and mmap() system calls can return either EINTR or
ERESTART upon receiving a signal. Add code to figure out the correct
return value by temporarily storing the return code from the relevant
FreeBSD kernel APIs in the Linux task structure.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-02-22 15:29:19 +00:00
Wojciech Macek
0d787e9b35 NVMe: Add big-endian support
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.

NVMe is now working on powerpc64 host.

Submitted by:          Michal Stanek <mst@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           imp, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
2018-02-22 13:32:31 +00:00
Andriy Gapon
de2cb430ad another rework of getzfsvfs / getzfsvfs_impl code
This change is designed to account for yet another difference between
illumos and FreeBSD VFS.  In FreeBSD a filesystem driver is supposed to
clean up mnt_data in its VFS_UNMOUNT method because it's the last call
into the driver before a struct mount object is destroyed.  The VFS
drains all references to the object before destroying it, but for the
driver it's already as good as gone.
In contrast, illumos VFS provides another method, VFS_FREEVFS, that is
called when all references are drained.  So, the driver can keep its
data after VFS_UNMOUNT and clean it up in VFS_FREEVFS after all
references are gone. This is what ZFS does on illumos.
So there a reference to a filesystem is sufficient to guarantee that the
ZFS specific data, aka zfsvfs_t, stays around (even if the filesystem
gets unmounted).  In FreeBSD we need to vfs_busy the filesystem to get
the same guarantee.  vfs_ref guarantees only that the struct mount is
kept.

The following rules should be observed in getzfsvfs / getzfsvfs_impl on
FreeBSD:
- if we need access to zfsvfs_t then we must use vfs_busy
- if only we need to access struct mount (aka vfs_t), then vfs_ref is
  enough
- when illumos code actually needs only the vfs_t, they still can pass
  the zfsvfs_t and get the vfs_t from it;  that can work in FreeBSD if
  the filesystem is busied, but when it's just referenced then we have
  to pass the vfs_t explicitly
- we cannot call vfs_busy while holding a dataset because that creates a
  LOR with dp_config_rwlock

As a result:
- getzfsvfs_impl now only references the filesystem, same as in illumos,
  but unlike illumos it has to return the vfs_t
- the consumers are updated to account for the change
- getzfsvfs busies the filesystem (and drops the reference from
  getzfsvfs_impl)

Also, zfs_unmount_snap() now gets a busied a filesystem, references it
and then unbusies it essentially reverting actions done in getzfsvfs.
This is needed because the code may perform some checks that require the
zfsvfs_t.  So, those are done before the unbusying.

MFC after:	2 weeks
2018-02-22 13:06:27 +00:00
Warner Losh
07e5967a22 Revert r329814 as well. It should have been in r329819. 2018-02-22 11:51:50 +00:00
Andriy Gapon
8d69fe5cc8 followup to r329556, completely remove the covered vnode assert
vrele() acquires the vnode lock only if the hold count drops to zero.
In other scenarios it needs only the interlock.  So,
zfsctl_snapdir_lookup() can race with vfs_mount_destroy() -> vrele()
such that the lookup adds a new reference and then vrele() drops the
mountpoint's reference and only then we check the reference count.
It would be just one in this case.

In fact, the assert should have been removed in r323483 when the code
learned how to deal with the uncovered vnode.

PR:		225795
MFC after:	4 days
X-MFC with:	r329556
2018-02-22 11:41:00 +00:00
Warner Losh
0028abe633 Backout r329818, r329816 and r329815.
These aren't the commits I thought I was testing prior to
commit. Revert until I can sort out what happened and fix it.
2018-02-22 11:18:33 +00:00
Warner Losh
91acaad987 Fix typo in last commit after last rebase before commit... 2018-02-22 10:55:23 +00:00
Warner Losh
4d87e27125 Combine BIO_DELETE requests for nda devices
Now that we're queueing BIO_DELETE requests in the CAM I/O scheduler,
it make sense to try to combine as many as possible into a single
request to send down to hardware. Hopefully, lots of larger requests
like this are better than lots of individual transactions.

Note for future: need to limit based on total size of the trim
request. Should also collapse adjacent ranges where possible to
increase the size of the max payload.

Sponsored by: Netflix
2018-02-22 05:44:00 +00:00
Warner Losh
c5fe3ae9b8 Introduce capacity flags for periphs
Introduce flags word to describe the capacities of the peripheral.
First bit will describe if the periph driver allows multiple
outstanding TRIMS to be active in a device.

Modify the I/O scheduler so that the nda driver can queue trims
for a while after the first one arrives. We'll queue until we see
a I/O scheduler tick, then we'll schedule as many TRIMs as allowed
by other factors (currently this is slocts in the NVMe controller).
This mariginally helps the read latency issues we see with reads,
but sets the stage for the nda driver to do TRIM collapsing like the
da and ada drivers do today.

Sponsored by: Netflix
2018-02-22 05:43:55 +00:00
Warner Losh
c9878d6d63 Note when we tick.
To help implement a policy of 'queue all trims until next I/O sched
tick' policy to help coalesce them, note when we tick so we can do
something special on the first call after the tick to get more work.

Sponsored by: Netflix
2018-02-22 05:43:50 +00:00
Warner Losh
f2b9885036 Wrap an extra long line
This debugging line is too big for even my largest xterm. wrap it at
about 80 columns.

Sponsored by: Netflix
2018-02-22 05:43:45 +00:00
Warner Losh
97f8aa050e Don't sort TRIMs.
While the code for ada and da both assume that the trim list is
ordered when doing the coaleascing the TRIMs, it turns out that
creating the sorted list uses more resources than are saved by having
slightly fewer trims sent to the device.

Sponsored by: Netflix
2018-02-22 05:43:20 +00:00
Alexander Motin
dd9ceab333 MFV r329803:
9080 recursive enter of vdev_indirect_rwlock from vdev_indirect_remap()

illumos/illumos-gate@bdfded42e6

A scenario came up where a callback executed by vdev_indirect_remap() on a vdev, calls
vdev_indirect_remap() on the same vdev and tries to reacquire vdev_indirect_rwlock that
was already acquired from the first call to vdev_indirect_remap(). The specific scenario,
is that we want to remap a block pointer that is snapshoted but its dataset's remap_deadlist
is not cached. So in order to add it we issue a read through a vdev_indirect_remap() on the
same vdev, which brings up the aforementioned issue.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
2018-02-22 03:54:59 +00:00
Alexander Motin
064827be34 MFV r329799, r329800:
9079 race condition in starting and ending condesing thread for indirect vdevs

illumos/illumos-gate@667ec66f1b

The timeline of the race condition is the following:
[1] Thread A is about to finish condesing the first vdev in spa_condense_indirect_thread(),
so it calls the spa_condense_indirect_complete_sync() sync task which sets the
spa_condensing_indirect field to NULL. Waiting for the sync task to finish, thread A
sleeps until the txg is done. When this happens, thread A will acquire spa_async_lock
and set spa_condense_thread to NULL.
[2] While thread A waits for the txg to finish, thread B which is running spa_sync() checks
whether it should condense the second vdev in vdev_indirect_should_condense() by checking
the spa_condensing_indirect field which was set to NULL by spa_condense_indirect_thread()
from thread A. So it goes on and tries to spawn a new condensing thread in
spa_condense_indirect_start_sync() and the aforementioned assertions fails because thread A
has not set spa_condense_thread to NULL (which is basically the last thing it does before
returning).

The main issue here is that we rely on both spa_condensing_indirect and spa_condense_thread to
signify whether a condensing thread is running. Ideally we would only use one throughout the
codebase. In addition, for managing spa_condense_thread we currently use spa_async_lock which
basically tights condensing to scrubing when it comes to pausing and resuming those actions
during spa export.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-02-22 03:49:06 +00:00
Ed Maste
a0409b6f36 Remove accidental vim droppings
Reported by:	cy
2018-02-22 03:37:01 +00:00
Alexander Motin
1ea10a60f9 MFV r329793, r329795:
9075 Improve ZFS pool import/load process and corrupted pool recovery

illumos/illumos-gate@6f7938128a

Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:

https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's

To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.

This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 03:15:35 +00:00
John Baldwin
642ffab5fc Avoid grabbing locks when grabbing the vt(4) console for DDB.
Trying to grab locks during cngrab() when entering the debugger is
deadlock prone as all other CPUs are already halted (and thus unable
to release locks) when cngrab() is invoked.  One could instead use
try-locks.  However, the case that the try-lock fails still has to
be handled.  In addition, if the try-lock works it doesn't provide
any greater ordering guarantees than is already provided by entering
and exiting DDB.  It is simpler to define a simpler path for the
case that the try-lock would fail and always use that when entering
DDB.  Messing with timers, etc. when entering DDB is dubious even if
the try-lock succeeds.

This patch attempts to use the smallest possible set of operations to
grab the vt(4) console when entering DDB without using any locks.

Reviewed by:	emaste
Tested by:	Matthew Macy
MFC after:	1 week
2018-02-22 02:26:29 +00:00
Ed Maste
eae594f7d5 Correct proper nouns in the Linuxulator
- Capitalize Linux
- Spell FreeBSD out in full
- Address some style(9) on changed lines

Sponsored by:	Turing Robotic Industries Inc.
2018-02-22 02:24:17 +00:00
John Baldwin
6619d9fb70 Bring in additional constants and message fields for TLS-related messages.
Sponsored by:	Chelsio Communications
2018-02-22 02:02:31 +00:00
Ed Maste
581bf7cbda Use 'const int *' for sysentvec errno translation table
This allows an sv_errtbl to be read-only .rodata.

Sponsored by:	Turing Robotic Industries Inc.
2018-02-22 01:59:59 +00:00
John Baldwin
125d42fe81 Move DDP PCB state into a helper structure.
This consolidates all of the DDP state in one place.  Also, the code has
now been fixed to ensure that DDP state is only accessed for DDP
connections.  This should not be a functional change but makes it cleaner
and easier to add state for other TOE socket modes in the future.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-02-22 01:50:30 +00:00
Alexander Motin
613b0d87da 8942 zfs promote .../%recv should be an error
illumos/illumos-gate@add927f8c8

Reported on the ZFSonLinux https://github.com/zfsonlinux/zfs/issues/4843,
fixed by https://github.com/zfsonlinux/zfs/pull/6339:

If we are in the middle of an incremental zfs receive, the child .../%recv
will exist. If you concurrently run zfs promote .../%recv, it will "work",
but then zfs gets confused. For example, there's no obvious way to destroy
the containing filesystem (because it is now a clone of its invisible child).

Attempting to do this promote should be an error. We could fix this by
having zfs_ioc_promote() check if zc_name contains a %, similar to
zfs_ioc_rename().

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 01:42:13 +00:00
Alexander Motin
a33ba3dbde MFV r329776: 8477 Assertion failed in vdev_state_dirty(): spa_writeable(spa)
illumos/illumos-gate@f4c1745bd6

Illumos 4080 allows "zpool clear" to work on readonly pools: i don't think
this is the intended behaviour, we shouldn't be allowed to clear readonly
pools. Probably.

A fix is already in the ZFS on Linux repository to addess this issue:
92e43c1718

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 01:00:46 +00:00
Alexander Motin
eea9be67e6 MFV r329774:
8408 dsl_props_set_sync_impl() does not handle nested nvlists correctly

illumos/illumos-gate@85723e5eec

When iterating over the input nvlist in dsl_props_set_sync_impl() when we
don't preserve the nvpair name before looking up ZPROP_VALUE, so when we
later go to process it nvpair_name() is always "value" instead of the actual
property name.

This results in a couple of bugs in the recv code:

 - received properties are not restored correctly when failing to receive
   an incremental send stream
 - received properties are not completely replaced by the new ones when
   successfully receiving an incremental send stream

This was discovered on ZFS on Linux (fixed in
5f1346c299)

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 00:55:25 +00:00
Alexander Motin
756595f675 MFV r329770: 9035 zfs: this statement may fall through
illumos/illumos-gate@46ac8fdfc5

Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Toomas Soome <tsoome@me.com>
2018-02-22 00:47:38 +00:00
Alexander Motin
502d18a8f1 MFV r329766: 8962 zdb should work on non-idle pools
illumos/illumos-gate@e144c4e6c9

Currently `zdb` consistently fails to examine non-idle pools as it fails
during the `spa_load()` process. The main problem seems to be that
`spa_load_verify()` fails as can be seen below:

$ sudo zdb -d -G dcenter
    zdb: can't open 'dcenter': I/O error

ZFS_DBGMSG(zdb):
    spa_open_common: opening dcenter
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c4t11d0s0': best uberblock found for spa dcenter. txg 40824950
    spa_load(dcenter): using uberblock with txg=40824950
    spa_load(dcenter): UNLOADING
    spa_load(dcenter): RELOADING
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c3t10d0s0': best uberblock found for spa dcenter. txg 40824952
    spa_load(dcenter): using uberblock with txg=40824952
    spa_load(dcenter): FAILED: spa_load_verify failed [error=5]
    spa_load(dcenter): UNLOADING

This change makes `spa_load_verify()` a dryrun when ran from `zdb`. This is
done by creating a global flag in zfs and then setting it in `zdb`.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 00:42:12 +00:00