search for a thread to steal inside a critical section. Since this
allows the search to be preempted, restart the search if preemption
happens since the search results found earlier may no longer be
valid.
Decrease the latency of starting a thread that may be assigned to
this CPU during the search by polling for incoming threads during
the search and switching to that thread instead of continuing the
search.
Test for stale search results and restart the search before going
through the expense of calling tdq_lock_pair(). Retry some tests
after grabbing the locks since things may have changed while waiting
to get both locks.
Eliminate special case handling for stealing from an SMT peer that
uses 1 as the steal threshold. This can only succeed if a thread
has been assigned but our SMT peer has not yet started executing
it. This is quite rare and when it happens the other SMT thread
is generally waiting for the same tdq lock that we hold. Basically
both SMT threads are racing to grab the same spin lock.
Add the kern.sched.always_steal knob from a ULE patch by jeff@.
Incorporate another idea from Jeff's ULE patch. If the sched_switch()
detects that the CPU is about to go idle, try to steal a thread
before switching to the idle thread. Since the search for a thread
to steal has to be done inside a critical section in this context,
limit the impact on latency by adding the knob kern.sched.trysteal_limit
to limit the topological distance of the search and don't restart
the search if we detect stale results. If this search can't find
an stealable thread, the idle loop can do a more complete search.
Also poll for threads being assigned to this CPU during the search
and switch to them instead of continuing the search. This change
is responsibile for the majority of the improvement in parallel
buildworld times.
In sched_balance_group() change the minimum threshold from stealing
a thread from 1 to 2. Poaching a newly assigned thread from a CPU
that is waking up hasn't yet switched to that thread from idle is
likely very rare and is likely to have the same lock race as is
seen when stealing threads in the idle loop. Also use tdq_notify()
to kick the destintation CPU instead of always sending an IPI.
Update a stale comment, the number of transferable threads is not
calculated.
Reviewed by: kib (earlier version)
Comments by: avg, jeff, mav
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12130
A super-set of the functionality of jedec_ts(4). jedec_dimm(4) reports asset
information (Part Number, Serial Number) encoded in the "Serial Presence
Detect" (SPD) data on JEDEC DDR3 and DDR4 DIMMs. It also calculates and
reports the memory capacity of the DIMM, in megabytes. If the DIMM includes
a "Thermal Sensor On DIMM" (TSOD), the temperature is also reported.
Reviewed by: cem
MFC after: 1 week
Relnotes: yes
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D14392
Discussed with: avg, cem
Tested by: avg, cem (previous version, no semantic changes)
Graphics have a tendency to cause 80-col issues, so make an exception to our
standard indentation guidelines for these graphics. This does not hamper
readability too badly.
Two 40-column strings of spaces is trivially replaced with
string.rep(" ", 80)
Add chvgpio(4) driver for Intel Z8xxx SoC family. This product
was formerly known as Cherry Trail but Linux and OpenBSD drivers
refer to it as Cherry View. This driver is derived from OpenBSD
one so the name is kept for alignment with another BSD system.
Submitted by: Tom Jones <tj@enoti.me>
Reviewed by: gonzo, wblock(man page)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D13086
r329725 cleaned up ZFS commands duplicated in multiple places, but userboot
was not setting HAVE_ZFS when MK_ZFS != "no". This resulted in a failure to
boot (as seen in PR 226118) in bhyve, with the following message:
/boot/userboot.so: Undefined symbol "ldi_get_size"
PR: 226118
Glanced at by: imp
signal in the LinuxKPI.
The read(), write() and mmap() system calls can return either EINTR or
ERESTART upon receiving a signal. Add code to figure out the correct
return value by temporarily storing the return code from the relevant
FreeBSD kernel APIs in the Linux task structure.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.
NVMe is now working on powerpc64 host.
Submitted by: Michal Stanek <mst@semihalf.com>
Obtained from: Semihalf
Reviewed by: imp, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
This change is designed to account for yet another difference between
illumos and FreeBSD VFS. In FreeBSD a filesystem driver is supposed to
clean up mnt_data in its VFS_UNMOUNT method because it's the last call
into the driver before a struct mount object is destroyed. The VFS
drains all references to the object before destroying it, but for the
driver it's already as good as gone.
In contrast, illumos VFS provides another method, VFS_FREEVFS, that is
called when all references are drained. So, the driver can keep its
data after VFS_UNMOUNT and clean it up in VFS_FREEVFS after all
references are gone. This is what ZFS does on illumos.
So there a reference to a filesystem is sufficient to guarantee that the
ZFS specific data, aka zfsvfs_t, stays around (even if the filesystem
gets unmounted). In FreeBSD we need to vfs_busy the filesystem to get
the same guarantee. vfs_ref guarantees only that the struct mount is
kept.
The following rules should be observed in getzfsvfs / getzfsvfs_impl on
FreeBSD:
- if we need access to zfsvfs_t then we must use vfs_busy
- if only we need to access struct mount (aka vfs_t), then vfs_ref is
enough
- when illumos code actually needs only the vfs_t, they still can pass
the zfsvfs_t and get the vfs_t from it; that can work in FreeBSD if
the filesystem is busied, but when it's just referenced then we have
to pass the vfs_t explicitly
- we cannot call vfs_busy while holding a dataset because that creates a
LOR with dp_config_rwlock
As a result:
- getzfsvfs_impl now only references the filesystem, same as in illumos,
but unlike illumos it has to return the vfs_t
- the consumers are updated to account for the change
- getzfsvfs busies the filesystem (and drops the reference from
getzfsvfs_impl)
Also, zfs_unmount_snap() now gets a busied a filesystem, references it
and then unbusies it essentially reverting actions done in getzfsvfs.
This is needed because the code may perform some checks that require the
zfsvfs_t. So, those are done before the unbusying.
MFC after: 2 weeks
vrele() acquires the vnode lock only if the hold count drops to zero.
In other scenarios it needs only the interlock. So,
zfsctl_snapdir_lookup() can race with vfs_mount_destroy() -> vrele()
such that the lookup adds a new reference and then vrele() drops the
mountpoint's reference and only then we check the reference count.
It would be just one in this case.
In fact, the assert should have been removed in r323483 when the code
learned how to deal with the uncovered vnode.
PR: 225795
MFC after: 4 days
X-MFC with: r329556
after r190575 there is an option to call rc.firewall with the firewall_type
passed in as an argument.
Submitted by: David P. Discher <dpd@dpdtech.com>
MFC after: 3 weeks.
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D14286
Now that we're queueing BIO_DELETE requests in the CAM I/O scheduler,
it make sense to try to combine as many as possible into a single
request to send down to hardware. Hopefully, lots of larger requests
like this are better than lots of individual transactions.
Note for future: need to limit based on total size of the trim
request. Should also collapse adjacent ranges where possible to
increase the size of the max payload.
Sponsored by: Netflix
Introduce flags word to describe the capacities of the peripheral.
First bit will describe if the periph driver allows multiple
outstanding TRIMS to be active in a device.
Modify the I/O scheduler so that the nda driver can queue trims
for a while after the first one arrives. We'll queue until we see
a I/O scheduler tick, then we'll schedule as many TRIMs as allowed
by other factors (currently this is slocts in the NVMe controller).
This mariginally helps the read latency issues we see with reads,
but sets the stage for the nda driver to do TRIM collapsing like the
da and ada drivers do today.
Sponsored by: Netflix
To help implement a policy of 'queue all trims until next I/O sched
tick' policy to help coalesce them, note when we tick so we can do
something special on the first call after the tick to get more work.
Sponsored by: Netflix
While the code for ada and da both assume that the trim list is
ordered when doing the coaleascing the TRIMs, it turns out that
creating the sorted list uses more resources than are saved by having
slightly fewer trims sent to the device.
Sponsored by: Netflix
We require some --globals due to custom loader extensions in our
environment. Add everything required for this to tools/boot so that other
interested parties can get up and go with linting our scripts and not get a
bunch of false-positives.
luacheck pointed out an assortment of issues, ranging from non-standard
globals being created as well as unused parameters, variables, and redundant
assignments.
Using '_' as a placeholder for values unused (whether it be parameters
unused or return values unused, assuming multiple return values) feels clean
and gets the point across, so I've adopted it. It also helps flag candidates
for cleanup later in some of the lambdas I've created, giving me an easy way
to re-evaluate later if we're still not using some of these features.
8940 Sending an intra-pool resumable send stream may result in EXDEV
illumos/illumos-gate@544132fce3
"zfs send -t <token>" for an incremental send should be able to resume
successfully when sending to the same pool: a subtle issue in
zfs_iter_children() doesn't currently allow this.
Because resuming from a token requires "guid" -> "dataset" mapping
(guid_to_name()), we have to walk the whole hierarchy to find the right
snapshots to send.
When resuming an incremental send both source and destination live in the
same pool and have the same guid: this is where zfs_iter_children() gets
confused and picks up the wrong snapshot, so we end up trying to send an
incremental "destination@snap1 -> source@snap2" stream instead of
"source@snap1 -> source@snap2": this fails with an "Invalid cross-device
link" (EXDEV) error.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: loli10K <ezomori.nozomu@gmail.com>
illumos/illumos-gate@544132fce3
"zfs send -t <token>" for an incremental send should be able to resume
successfully when sending to the same pool: a subtle issue in
zfs_iter_children() doesn't currently allow this.
Because resuming from a token requires "guid" -> "dataset" mapping
(guid_to_name()), we have to walk the whole hierarchy to find the right
snapshots to send.
When resuming an incremental send both source and destination live in the
same pool and have the same guid: this is where zfs_iter_children() gets
confused and picks up the wrong snapshot, so we end up trying to send an
incremental "destination@snap1 -> source@snap2" stream instead of
"source@snap1 -> source@snap2": this fails with an "Invalid cross-device
link" (EXDEV) error.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: loli10K <ezomori.nozomu@gmail.com>
9080 recursive enter of vdev_indirect_rwlock from vdev_indirect_remap()
illumos/illumos-gate@bdfded42e6
A scenario came up where a callback executed by vdev_indirect_remap() on a vdev, calls
vdev_indirect_remap() on the same vdev and tries to reacquire vdev_indirect_rwlock that
was already acquired from the first call to vdev_indirect_remap(). The specific scenario,
is that we want to remap a block pointer that is snapshoted but its dataset's remap_deadlist
is not cached. So in order to add it we issue a read through a vdev_indirect_remap() on the
same vdev, which brings up the aforementioned issue.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
illumos/illumos-gate@bdfded42e6
A scenario came up where a callback executed by vdev_indirect_remap() on a vdev, calls
vdev_indirect_remap() on the same vdev and tries to reacquire vdev_indirect_rwlock that
was already acquired from the first call to vdev_indirect_remap(). The specific scenario,
is that we want to remap a block pointer that is snapshoted but its dataset's remap_deadlist
is not cached. So in order to add it we issue a read through a vdev_indirect_remap() on the
same vdev, which brings up the aforementioned issue.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
9079 race condition in starting and ending condesing thread for indirect vdevs
illumos/illumos-gate@667ec66f1b
The timeline of the race condition is the following:
[1] Thread A is about to finish condesing the first vdev in spa_condense_indirect_thread(),
so it calls the spa_condense_indirect_complete_sync() sync task which sets the
spa_condensing_indirect field to NULL. Waiting for the sync task to finish, thread A
sleeps until the txg is done. When this happens, thread A will acquire spa_async_lock
and set spa_condense_thread to NULL.
[2] While thread A waits for the txg to finish, thread B which is running spa_sync() checks
whether it should condense the second vdev in vdev_indirect_should_condense() by checking
the spa_condensing_indirect field which was set to NULL by spa_condense_indirect_thread()
from thread A. So it goes on and tries to spawn a new condensing thread in
spa_condense_indirect_start_sync() and the aforementioned assertions fails because thread A
has not set spa_condense_thread to NULL (which is basically the last thing it does before
returning).
The main issue here is that we rely on both spa_condensing_indirect and spa_condense_thread to
signify whether a condensing thread is running. Ideally we would only use one throughout the
codebase. In addition, for managing spa_condense_thread we currently use spa_async_lock which
basically tights condensing to scrubing when it comes to pausing and resuming those actions
during spa export.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
illumos/illumos-gate@667ec66f1b
The timeline of the race condition is the following:
[1] Thread A is about to finish condesing the first vdev in spa_condense_indirect_thread(),
so it calls the spa_condense_indirect_complete_sync() sync task which sets the
spa_condensing_indirect field to NULL. Waiting for the sync task to finish, thread A
sleeps until the txg is done. When this happens, thread A will acquire spa_async_lock
and set spa_condense_thread to NULL.
[2] While thread A waits for the txg to finish, thread B which is running spa_sync() checks
whether it should condense the second vdev in vdev_indirect_should_condense() by checking
the spa_condensing_indirect field which was set to NULL by spa_condense_indirect_thread()
from thread A. So it goes on and tries to spawn a new condensing thread in
spa_condense_indirect_start_sync() and the aforementioned assertions fails because thread A
has not set spa_condense_thread to NULL (which is basically the last thing it does before
returning).
The main issue here is that we rely on both spa_condensing_indirect and spa_condense_thread to
signify whether a condensing thread is running. Ideally we would only use one throughout the
codebase. In addition, for managing spa_condense_thread we currently use spa_async_lock which
basically tights condensing to scrubing when it comes to pausing and resuming those actions
during spa export.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
9075 Improve ZFS pool import/load process and corrupted pool recovery
illumos/illumos-gate@6f7938128a
Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:
https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's
To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.
The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.
The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.
When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.
This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.
With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
Trying to grab locks during cngrab() when entering the debugger is
deadlock prone as all other CPUs are already halted (and thus unable
to release locks) when cngrab() is invoked. One could instead use
try-locks. However, the case that the try-lock fails still has to
be handled. In addition, if the try-lock works it doesn't provide
any greater ordering guarantees than is already provided by entering
and exiting DDB. It is simpler to define a simpler path for the
case that the try-lock would fail and always use that when entering
DDB. Messing with timers, etc. when entering DDB is dubious even if
the try-lock succeeds.
This patch attempts to use the smallest possible set of operations to
grab the vt(4) console when entering DDB without using any locks.
Reviewed by: emaste
Tested by: Matthew Macy
MFC after: 1 week
9075 Improve ZFS pool import/load process and corrupted pool recovery
illumos/illumos-gate@6f7938128a
Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:
https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's
To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.
The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.
The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.
When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.
This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.
With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
illumos/illumos-gate@6f7938128a
Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:
https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's
To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.
The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.
The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.
When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.
This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.
With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
Instead of the global namespace, let's attach these to the cli module. Other
users, including the "local" module, can attach functions to the cli module
at will to add other cli commands and things will still Just Work.
This distills down the candidates for functions that may be invoked via the
cli to a minimal set (boot, autoboot, arguments), rather than any function
that happens to live in the global lua namespace.
This consolidates all of the DDP state in one place. Also, the code has
now been fixed to ensure that DDP state is only accessed for DDP
connections. This should not be a functional change but makes it cleaner
and easier to add state for other TOE socket modes in the future.
MFC after: 1 month
Sponsored by: Chelsio Communications
This will be the translation layer for varargs -> cmd_name, argv for cli
commands. We reserve the right to break exactly what the varargs inclulde,
but this gives us a stable way to pull the arguments out of varargs.
illumos/illumos-gate@add927f8c8
Reported on the ZFSonLinux https://github.com/zfsonlinux/zfs/issues/4843,
fixed by https://github.com/zfsonlinux/zfs/pull/6339:
If we are in the middle of an incremental zfs receive, the child .../%recv
will exist. If you concurrently run zfs promote .../%recv, it will "work",
but then zfs gets confused. For example, there's no obvious way to destroy
the containing filesystem (because it is now a clone of its invisible child).
Attempting to do this promote should be an error. We could fix this by
having zfs_ioc_promote() check if zc_name contains a %, similar to
zfs_ioc_rename().
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>