169 Commits

Author SHA1 Message Date
Alexander Motin
b8a528e092 9166 zfs storage pool checkpoint
illumos/illumos-gate@8671400134

The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with
exactly that.  It can be thought of as a “pool-wide snapshot” (or a
variation of extreme rewind that doesn’t corrupt your data).  It remembers
the entire state of the pool at the point that it was taken and the user
can revert back to it later or discard it.  Its generic use case is an
administrator that is about to perform a set of destructive actions to ZFS
as part of a critical procedure.  She takes a checkpoint of the pool before
performing the actions, then rewinds back to it if one of them fails or puts
the pool into an unexpected state.  Otherwise, she discards it.  With the
assumption that no one else is making modifications to ZFS, she basically
wraps all these actions into a “high-level transaction”.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
2018-03-28 18:12:06 +00:00
Alexander Motin
34ff7cee7a 8940 Sending an intra-pool resumable send stream may result in EXDEV
illumos/illumos-gate@544132fce3

"zfs send -t <token>" for an incremental send should be able to resume
successfully when sending to the same pool: a subtle issue in
zfs_iter_children() doesn't currently allow this.

Because resuming from a token requires "guid" -> "dataset" mapping
(guid_to_name()), we have to walk the whole hierarchy to find the right
snapshots to send.
When resuming an incremental send both source and destination live in the
same pool and have the same guid: this is where zfs_iter_children() gets
confused and picks up the wrong snapshot, so we end up trying to send an
incremental "destination@snap1 -> source@snap2" stream instead of
"source@snap1 -> source@snap2": this fails with an "Invalid cross-device
link" (EXDEV) error.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 04:01:05 +00:00
Alexander Motin
2e4bc6ee5c 9075 Improve ZFS pool import/load process and corrupted pool recovery
illumos/illumos-gate@6f7938128a

Some work has been done lately to improve the debuggability of the ZFS pool
load (and import) process. This includes:

https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's

To iterate on top of that, a few changes were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.
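
The read policy described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the enum, the function name `dva_read_action`, and the idea of validating a DVA by comparing its vdev id against the number of top-level vdevs are all hypothetical and are not taken from the actual change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes for a read whose DVA has been sanity-checked. */
typedef enum {
	READ_ISSUE,	/* DVA points at a known vdev: issue the read */
	READ_SKIP,	/* untrusted config: quietly skip the bad DVA */
	READ_PANIC	/* trusted config: a bad DVA is a fatal inconsistency */
} read_action_t;

/*
 * Decide what to do with a read aimed at top-level vdev `vdev_id`, given
 * that the currently loaded vdev tree has `vdev_count` top-level vdevs.
 * (Assumption: "known vdev" is modeled as a simple index range check.)
 */
read_action_t
dva_read_action(bool config_trusted, uint64_t vdev_id, uint64_t vdev_count)
{
	assert(vdev_count > 0);
	if (vdev_id < vdev_count)
		return (READ_ISSUE);
	return (config_trusted ? READ_PANIC : READ_SKIP);
}
```

With three known vdevs, a DVA naming vdev 5 would be skipped while the configuration is untrusted and treated as fatal once the trusted MOS configuration is in use.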

This new two-step pool load process now allows rewinding pools across
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
of data, this feature is for now restricted to a read-only import.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 02:21:03 +00:00
Alexander Motin
dc763b80f4 8942 zfs promote .../%recv should be an error
illumos/illumos-gate@add927f8c8

Reported on ZFSonLinux as https://github.com/zfsonlinux/zfs/issues/4843,
fixed by https://github.com/zfsonlinux/zfs/pull/6339:

If we are in the middle of an incremental zfs receive, the child .../%recv
will exist. If you concurrently run zfs promote .../%recv, it will "work",
but then zfs gets confused. For example, there's no obvious way to destroy
the containing filesystem (because it is now a clone of its invisible child).

Attempting to do this promote should be an error. We could fix this by
having zfs_ioc_promote() check if zc_name contains a %, similar to
zfs_ioc_rename().
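
A minimal sketch of such a name check, assuming a standalone helper (the name `promote_name_check` and the choice of EINVAL are illustrative, not the actual code in zfs_ioc_promote()):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for the check described above: refuse any dataset
 * name containing '%', which marks internal datasets such as ".../%recv".
 * Returns 0 if the name is acceptable, EINVAL otherwise.
 */
int
promote_name_check(const char *name)
{
	assert(name != NULL);
	if (strchr(name, '%') != NULL)
		return (EINVAL);
	return (0);
}
```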

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 01:30:03 +00:00
Alexander Motin
81ef5e369c 8809 libzpool should leverage work done in libfakekernel
illumos/illumos-gate@f06dce2c1f

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andrew Stormont <astormont@racktopsystems.com>
2018-02-21 21:04:46 +00:00
Andriy Gapon
86b66fca8b 8520 7198 lzc_rollback_to should support rolling back to origin
illumos/illumos-gate@95643f75d2
95643f75d2

https://www.illumos.org/issues/8520
  lzc_rollback_to() should support rolling back to a clone's origin.
  The current checks in zfs_ioc_rollback() would not allow that because the
  origin snapshot belongs to a different filesystem.
  The overly restrictive check was introduced in 7600, but it was not a
  regression as none of the existing tools provided a way to roll back to the
  origin.

https://www.illumos.org/issues/7198
  EINVAL is returned when a dataset does not have any snapshots, so there is
  nothing to roll back to.
  Although the code in zfs_do_rollback checks for that condition in advance, it's
  still possible that the snapshot(s) gets removed after the check and before the
  rollback sync task is executed.
  At the moment the zfs command would crash when that happens.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2018-02-21 15:10:33 +00:00
Alexander Motin
79a23a6944 7614 zfs device evacuation/removal
illumos/illumos-gate@5cabbc6b49

https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.
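
To illustrate the idea of remapping through an in-memory table, here is a small sketch. The entry layout, the names `map_entry_t` and `remap_offset`, and the use of a sorted array with binary search are assumptions made for the example; the real indirect mapping structures are not shown in this commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: one copied segment of the removed vdev. */
typedef struct {
	uint64_t src_offset;	/* offset on the removed (indirect) vdev */
	uint64_t size;		/* length of the segment in bytes */
	uint64_t dst_vdev;	/* vdev that now holds the data */
	uint64_t dst_offset;	/* offset on that vdev */
} map_entry_t;

/*
 * Binary-search a table sorted by src_offset for the entry covering
 * `offset`. On success, store the new location and return 0; return -1 if
 * no entry covers the offset.
 */
int
remap_offset(const map_entry_t *map, size_t n, uint64_t offset,
    uint64_t *vdev, uint64_t *new_offset)
{
	size_t lo = 0, hi = n;

	assert(vdev != NULL && new_offset != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const map_entry_t *e = &map[mid];

		if (offset < e->src_offset) {
			hi = mid;
		} else if (offset >= e->src_offset + e->size) {
			lo = mid + 1;
		} else {
			*vdev = e->dst_vdev;
			*new_offset = e->dst_offset + (offset - e->src_offset);
			return (0);
		}
	}
	return (-1);
}
```

Because the lookup is a search over memory-resident entries, a read to the indirect vdev costs only a few comparisons before it is reissued at the new location.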

The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.

Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
2018-02-18 01:21:52 +00:00
Alexander Motin
1e2bad5ea0 8652 Tautological comparisons with ZPROP_INVAL
illumos/illumos-gate@4ae5f5f06c

https://www.illumos.org/issues/8652:
Clang and GCC prefer to use unsigned ints to store enums. With Clang, that
causes tautological comparison warnings when comparing a zfs_prop_t or
zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error
handling code is being silently removed as a result.
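
One way to make such comparisons meaningful is to give the enum itself an enumerator with value -1, which forces a signed underlying type. The sketch below uses a made-up `demo_prop_t` rather than the real zfs_prop_t, and it should not be read as a description of what this commit changed.

```c
#include <assert.h>
#include <string.h>

/*
 * Demo property enum (not the real zfs_prop_t). Because the invalid value
 * is an enumerator with value -1, the enum's underlying type must be
 * signed, so comparing against it is not tautological.
 */
typedef enum {
	DEMO_PROP_INVAL = -1,
	DEMO_PROP_NAME,
	DEMO_PROP_SIZE,
	DEMO_PROP_COUNT
} demo_prop_t;

static const char *demo_prop_names[DEMO_PROP_COUNT] = { "name", "size" };

/* Map a property name to its enum value, or DEMO_PROP_INVAL if unknown. */
demo_prop_t
demo_name_to_prop(const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < DEMO_PROP_COUNT; i++) {
		if (strcmp(name, demo_prop_names[i]) == 0)
			return ((demo_prop_t)i);
	}
	return (DEMO_PROP_INVAL);
}
```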

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 04:48:14 +00:00
Alexander Motin
434e06c6f9 8641 "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs
illumos/illumos-gate@2ba5f978a4

https://www.illumos.org/issues/8641:
"zpool clear" and "zinject -d" can both operate on specific vdevs, either
leaf or interior. However, due to an oversight, neither works on a "spare"
or "replacing" vdev. For example:

$ sudo zpool create foo raidz1 c1t5000CCA000081D61d0 c1t5000CCA000186235d0 spare c1t5000CCA000094115d0
$ sudo zpool replace foo c1t5000CCA000186235d0 c1t5000CCA000094115d0
$ zpool status foo
  pool: foo
 state: ONLINE
  scan: resilvered 81.5K in 0h0m with 0 errors on Fri Sep 8 10:53:03 2017
config:

        NAME                         STATE     READ WRITE CKSUM
        foo                          ONLINE       0     0     0
          raidz1-0                   ONLINE       0     0     0
            c1t5000CCA000081D61d0    ONLINE       0     0     0
            spare-1                  ONLINE       0     0     0
              c1t5000CCA000186235d0  ONLINE       0     0     0
              c1t5000CCA000094115d0  ONLINE       0     0     0
        spares
          c1t5000CCA000094115d0      INUSE     currently in use
$ sudo zinject -d spare-1 -A degrade foo
cannot find device 'spare-1' in pool 'foo'
$ sudo zpool clear foo spare-1
cannot clear errors for spare-1: no such device in pool

Even though there was nothing to clear, those commands shouldn't have
reported an error. By contrast, trying to clear "raidz1-0" works just fine:
$ sudo zpool clear foo raidz1-0
2018-01-22 04:35:17 +00:00
Alexander Motin
4c30fea809 8898 creating fs with checksum=skein on the boot pools fails ungracefully
illumos/illumos-gate@9fa2266d9a

https://www.illumos.org/issues/8898:
# zfs create -o checksum=skein rpool/test
internal error: Result too large
Abort (core dumped)

Not a big deal per se, but should be handled correctly.

Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222199

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-01-21 23:57:41 +00:00
Alexander Motin
adbaf37829 8897 zpool online -e fails assertion when run on non-leaf vdevs
illumos/illumos-gate@9a551dd645

https://www.illumos.org/issues/8897:
# zpool online -e test mirror-1
Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online
Abort (core dumped)

Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'.

Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-01-21 23:52:37 +00:00
Alexander Motin
619fd3c317 8677 Open-Context Channel Programs
illumos/illumos-gate@a3b2868063

https://www.illumos.org/issues/8677
  We want to be able to run channel programs outside of syncing context.
  This would greatly improve the performance of channel programs that just gather
  information, as we won't have to wait for syncing context anymore.

  This feature should introduce the following:
  - A new command line flag in "zfs program" to specify our intention to
  run in open context.
  - A new flag/option within the channel program ioctl which selects the
  context.
  - Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
  - Documentation for the new feature in the manual pages.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-01-21 19:26:38 +00:00
Andriy Gapon
ae99a88d67 follow up to r325013, add libcmdutils.h to the vendor area
2017-10-27 11:26:36 +00:00
Andriy Gapon
fcf62a2809 640 number_to_scaled_string is duplicated in several commands
illumos/illumos-gate@0a0551200e
0a0551200e

https://www.illumos.org/issues/640
  du(1), df(1m), ls(1), and swap(1m) all include a copy (it appears literally
  copied) of the 'number_to_scaled_string' function in their source. This should
  be moved to a shared library and all 4 commands should use this instead.

Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Jason King <jason.brian.king@gmail.com>
2017-10-26 16:20:47 +00:00
Andriy Gapon
e0cb71a82f 8081 Compiler warnings in zdb
illumos/illumos-gate@3f7978d02b
3f7978d02b

https://www.illumos.org/issues/8081
  zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
  which uses -Werror, the only way to build it is to disable all compiler
  warnings. This makes it much harder to detect newly introduced bugs. We should
  clean up all the warnings.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
2017-10-02 11:55:11 +00:00
Andriy Gapon
a373b4201a 8600 ZFS channel programs - snapshot
illumos/illumos-gate@2840dce1a0
2840dce1a0

https://www.illumos.org/issues/8600
  ZFS channel programs should be able to create snapshots.
  In addition to the base snapshot functionality, this will likely entail adding
  extra logic to handle edge cases which were formerly not possible, such as
  creating then destroying a snapshot in the same transaction sync.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
2017-09-22 08:18:05 +00:00
Andriy Gapon
bd021dbd51 8502 illumos#7955 broke delegated datasets when libshare is not present
illumos/illumos-gate@1c18e8fbd8
1c18e8fbd8

https://www.illumos.org/issues/8502
  The code in lib/libzfs/common/libzfs_mount.c already basically handles
  the case when libshare is not installed. We just need to not fail in
  zfs_init_libshare_impl.  I tested this in lx and things work as
  expected. I also tested there trying to set sharenfs and sharesmb on
  the delegated dataset. Neither is allowed from within a zone.  The
  spew of msgs from a native zone is not ZFS specific. I see the same
  spew simply running the share command.

Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
2017-09-22 08:13:09 +00:00
Andriy Gapon
035e679e27 8567 Inconsistent return value in zpool_read_label
illumos/illumos-gate@c861bfbd77
c861bfbd77

https://www.illumos.org/issues/8567
  If fstat64 fails, pread64 fails, or the label is unintelligible,
  zpool_read_label will return 0. But if malloc fails, it will return -1. For
  consistency, it should always return -1 on failure or 0 on success.
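
The convention asked for here (every failure path returns -1, success returns 0) might look like the sketch below. `read_label_sketch` is a hypothetical simplification that reads raw bytes instead of parsing a real vdev label, and it uses the plain fstat/pread calls rather than the 64-bit variants named above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical label reader: reads `len` bytes from the start of `fd` into
 * a newly allocated buffer. Every failure path returns -1; success returns
 * 0 and stores the buffer in *out (the caller frees it).
 */
int
read_label_sketch(int fd, size_t len, char **out)
{
	struct stat st;
	char *buf;

	assert(out != NULL);
	*out = NULL;

	if (fstat(fd, &st) != 0)
		return (-1);
	if ((buf = malloc(len)) == NULL)
		return (-1);
	if (pread(fd, buf, len, 0) != (ssize_t)len) {
		free(buf);
		return (-1);
	}
	*out = buf;
	return (0);
}
```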

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>
2017-09-20 07:18:09 +00:00
Andriy Gapon
bdc44f62ff 7431 ZFS Channel Programs
illumos/illumos-gate@dfc115332c
dfc115332c

https://www.illumos.org/issues/7431
  ZFS channel programs (ZCP) adds support for performing compound ZFS
  administrative actions via Lua scripts in a sandboxed environment (with time
  and memory limits).
  This initial commit includes both base support for running ZCP scripts, and a
  small initial library of API calls which support getting properties and
  listing, destroying, and promoting datasets.
  Testing: in addition to the included unit tests, channel programs have been in
  use at Delphix for several months for batch destroying filesystems. The
  dsl_destroy_snaps_nvl() call has also been replaced with

  For reference, the new zfs-program manpage is included below.
  ZFS-PROGRAM(1M)                       1M                       ZFS-PROGRAM(1M)

  NAME
       zfs program – executes ZFS channel programs

  SYNOPSIS
       zfs program [-t timeout] [-m memory-limit] pool script

  DESCRIPTION
       The ZFS channel program interface allows ZFS administrative operations to
       be run programmatically as a Lua script. The entire script is executed
       atomically, with no other administrative operations taking effect
       concurrently. A library of ZFS calls is made available to channel program
       scripts. Channel programs may only be run with root privileges.

       A modified version of the Lua 5.2 interpreter is used to run channel
       program scripts. The Lua 5.2 manual can be found at:

             http://www.lua.org/manual/5.2/
  ...

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <chris.williamson@delphix.com>
2017-09-13 10:45:49 +00:00
Andriy Gapon
af2da9fb2e 5815 libzpool's panic function doesn't set global panicstr, ::status not as useful
illumos/illumos-gate@fae6347731
fae6347731

https://www.illumos.org/issues/5815
  When panic() is called from within ztest, the mdb ::status command isn't as
  useful as it could be since the global panicstr variable isn't updated. We
  should modify the function to make sure panicstr is set, so ::status can
  present the error message just like it does on a failed assertion.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Rich Lowe <richlowe@richlowe.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2017-09-13 10:33:09 +00:00
Andriy Gapon
b804e37054 8331 zfs_unshare returns wrong error code for smb unshare failure
illumos/illumos-gate@4f4378cc54
4f4378cc54

https://www.illumos.org/issues/8331
  zfs_unshare returns EZFS_UNSHARENFSFAILED on error for all share types.
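
A sketch of the intended behavior, choosing the error code by share type. The enums below are demo stand-ins defined for this example; the real libzfs constants and their values are not reproduced here.

```c
#include <assert.h>

/* Demo enums; the real libzfs names and values may differ. */
typedef enum { DEMO_PROTO_NFS, DEMO_PROTO_SMB } demo_proto_t;
typedef enum {
	DEMO_ERR_UNSHARENFSFAILED = 1,
	DEMO_ERR_UNSHARESMBFAILED
} demo_err_t;

/* Pick the error that matches the protocol whose unshare failed. */
demo_err_t
unshare_error_for(demo_proto_t proto)
{
	assert(proto == DEMO_PROTO_NFS || proto == DEMO_PROTO_SMB);
	return (proto == DEMO_PROTO_SMB ?
	    DEMO_ERR_UNSHARESMBFAILED : DEMO_ERR_UNSHARENFSFAILED);
}
```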

Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
2017-09-13 10:17:14 +00:00
Andriy Gapon
efd3d79ea3 8414 Implemented zpool scrub pause/resume
illumos/illumos-gate@1702cce751
1702cce751

https://www.illumos.org/issues/8414
  This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167
  Currently, there is no way to pause a scrub. Pausing may be useful when
  the pool is busy with other I/O to preserve bandwidth.

  Description

  This patch adds the ability to pause and resume scrubbing.  This is achieved
  by maintaining a persistent on-disk scrub state.  While the state is 'paused'
  we do not scrub any more blocks.  We do however perform regular scan
  housekeeping such as freeing async destroyed and deadlist blocks while paused.

  If you're testing this change, you probably want to include the patch from #6164

  Motivation and Context

  Scrubbing can be an I/O intensive operation and people have been asking
  for the ability to pause a scrub for a while. This allows one to preserve scrub
  progress while freeing up bandwidth for other I/O.

  How Has This Been Tested?

  Unit testing and zfs-tests.  to the pool.  This patch will also include the
  patch from https://github.com/zfsonlinux/zfs/pull/6164. In certain cases
  (dsl_scan_sync() is one), we may end up calling

Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>
2017-09-01 17:43:08 +00:00
Andriy Gapon
fd6c8b414e 8430 dir_is_empty_readdir() doesn't properly handle error from fdopendir()
illumos/illumos-gate@ba6e7e6505
ba6e7e6505

https://www.illumos.org/issues/8430
  We should close dirfd if fdopendir() fails.
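
The pattern can be sketched as below. This is not the actual dir_is_empty_readdir() code; `dir_is_empty_sketch` and its return convention are assumptions for the example. The key point is that a failed fdopendir() leaves the descriptor open, so the caller must close it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch of an emptiness check (hypothetical name). Returns 1 if the
 * directory is empty, 0 if it has entries, and -1 on error.
 */
int
dir_is_empty_sketch(const char *path)
{
	DIR *dirp;
	struct dirent *dp;
	int fd, empty = 1;

	assert(path != NULL);
	if ((fd = open(path, O_RDONLY | O_DIRECTORY)) < 0)
		return (-1);

	if ((dirp = fdopendir(fd)) == NULL) {
		/* fdopendir() did not take ownership, so close it here. */
		(void) close(fd);
		return (-1);
	}

	while ((dp = readdir(dirp)) != NULL) {
		if (strcmp(dp->d_name, ".") == 0 ||
		    strcmp(dp->d_name, "..") == 0)
			continue;
		empty = 0;
		break;
	}

	/* closedir() also closes the underlying descriptor. */
	(void) closedir(dirp);
	return (empty);
}
```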

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Sowrabha Gopal <sowrabha.gopal@delphix.com>
2017-08-08 10:55:42 +00:00
Andriy Gapon
59946bc86e 7600 zfs rollback should pass target snapshot to kernel
illumos/illumos-gate@77b171372e
77b171372e

https://www.illumos.org/issues/7600
  At present, the kernel side code seems to blindly roll back to whatever happens
  to be the latest snapshot at the time when the rollback task is processed.
  The expected target's name should be passed to the kernel driver and the sync
  task should validate that the target exists and that it is the latest snapshot
  indeed.
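
The validation described above amounts to two checks: the named target exists, and it is the newest snapshot. A rough sketch, with a hypothetical function name and illustrative error codes that are not claimed to match what the kernel returns:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: `snaps` holds a dataset's snapshot names ordered from
 * oldest to newest. Returns 0 if `target` is the newest one, ENOENT if it
 * does not exist, and EINVAL if it exists but is not the newest.
 */
int
rollback_target_check(const char *const *snaps, size_t nsnaps,
    const char *target)
{
	assert(target != NULL);
	for (size_t i = 0; i < nsnaps; i++) {
		if (strcmp(snaps[i], target) == 0)
			return (i == nsnaps - 1 ? 0 : EINVAL);
	}
	return (ENOENT);
}
```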

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2017-08-08 10:49:56 +00:00
Andriy Gapon
32b3bd238f 8418 zfs_prop_get_table() call in zfs_validate_name() is a no-op
illumos/illumos-gate@e09ba01dcd
e09ba01dcd

https://www.illumos.org/issues/8418
  The following line in zfs_validate_name() is just a no-op and it should be
  removed:
      (void) zfs_prop_get_table();

Reviewed by: Vitaliy Gusev <gusev.vitaliy@icloud.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
2017-08-08 10:28:01 +00:00
Andriy Gapon
71672e5c5d 8264 want support for promoting datasets in libzfs_core
illumos/illumos-gate@a4b8c9aa65
a4b8c9aa65

https://www.illumos.org/issues/8264
  Oddly there is a lzc_clone function, but no lzc_promote function.

Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
2017-06-14 16:23:15 +00:00
Andriy Gapon
79edb7989a 8269 dtrace stddev aggregation is normalized incorrectly
illumos/illumos-gate@79809f9cf4
79809f9cf4

https://www.illumos.org/issues/8269
  It seems that currently normalization of stddev aggregation is done
  incorrectly.
  We divide both the sum of values and the sum of their squares by the
  normalization factor. But we should divide the sum of squares by the
  normalization factor squared to scale the original values properly.
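
The arithmetic can be checked with a small sketch. It computes the variance (the standard deviation is its square root) from running totals, using doubles for brevity; the function name and the floating-point representation are assumptions for the example and do not mirror the actual aggregation code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Variance of the normalized values, computed from the raw running totals.
 * `count` is the number of samples, `sum` the sum of raw values, `sumsq`
 * the sum of their squares, and `normal` the normalization factor.
 */
double
normalized_variance(uint64_t count, double sum, double sumsq, double normal)
{
	assert(count > 0 && normal != 0.0);
	double mean = sum / normal / (double)count;
	/* The sum of squares scales with normal^2, not normal. */
	double mean_sq = sumsq / (normal * normal) / (double)count;
	return (mean_sq - mean * mean);
}
```

For raw values 20 and 40 with a factor of 10, the totals are sum = 60 and sumsq = 2000, and the function returns 1, which is the variance of the scaled values 2 and 4. Dividing the sum of squares by 10 instead of 100 would give 91.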

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2017-06-09 15:04:10 +00:00
Andriy Gapon
1cb31ff6a4 8168 NULL pointer dereference in zfs_create()
illumos/illumos-gate@690031d326
690031d326

https://www.illumos.org/issues/8168
  If we manage to export the pool on which we are creating a dataset (filesystem
  or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for
  which we never check the return value) we end up dereferencing a NULL pointer
  in libzfs`zpool_close().
  This was discovered on ZFS on Linux. The same issue can be reproduced on
  illumos by running the following two loops in parallel:
    while :; do zpool import -d /tmp testpool ; zpool export testpool ; done
    while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done
  Eventually this will result in several core dumps like this one:
  [root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244
  Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
  libnvpair.so.1 ld.so.1 ]
  > ::stack
  libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450)
  libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8)
  zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3)
  main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70)
  _start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b)
  >
  Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2017-06-09 15:00:13 +00:00
Andriy Gapon
d7f3871103 8021 ARC buf data scatter-ization
8100 8021 seems to cause random BAD TRAP: type=d (#gp General protection)

illumos/illumos-gate@770499e185
770499e185

https://www.illumos.org/issues/8021
  The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
  community) changes the way the ARC allocates `b_pdata` memory from using linear
  `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
  improves ZFS's performance by helping to defragment the address space occupied
  by the ARC, in particular for cases where compressed ARC is enabled. It could
  also ease future work to allocate pages directly from `segkpm` for minimal-
  overhead memory allocations, bypassing the `kmem` subsystem.
  This is essentially the same change as the one which recently landed in ZFS on
  Linux, although they made some platform-specific changes while adapting this
  work to their codebase:
  1. Implemented the equivalent of the `segkpm` suggestion for future work
  mentioned above to bypass issues that they've had with the Linux kernel memory
  allocator.
  2. Changed the internal representation of the ABD's scatter/gather list so it
  could be used to pass I/O directly into Linux block device drivers. (This
  feature is not available in the illumos block device interface yet.)

https://www.illumos.org/issues/8100
  My supermicro system is getting random BAD TRAP: type=d (#gp General
  protection) at about the stage where ZFS filesystems are mounted - usually
  console login prompt is already present but the services are still starting.
  After backing out 8021, the boot is completed and no panics do occur.
  Machine does dump, however savecore fails:
  savecore: bad magic number baddcafe
  I can get more data out with boot -k, if needed.
  # psrinfo -vp
  The physical processor has 4 cores and 8 virtual processors (0-7)
    The core has 2 virtual processors (0 4)
    The core has 2 virtual processors (1 5)
    The core has 2 virtual processors (2 6)
    The core has 2 virtual processors (3 7)
      x86 (GenuineIntel 306C3 family 6 model 60 step 3 clock 3500 MHz)
        Intel(r) Xeon(r) CPU E3-1246 v3 @ 3.50GHz

  # prtconf -m
  32657

  $ zpool status
    pool: rpool
   state: ONLINE
    scan: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          rpool       ONLINE       0     0     0
            raidz1-0  ONLINE       0     0     0
              c3t0d0  ONLINE       0     0     0
              c3t1d0  ONLINE       0     0     0

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>
2017-05-26 12:13:27 +00:00
Andriy Gapon
9df2d6f729 7446 zpool create should support efi system partition
illumos/illumos-gate@7855d95b30
7855d95b30

https://www.illumos.org/issues/7446
  Since we support whole-disk configuration for the boot pool, we will also need
  whole-disk support with UEFI boot, and for this zpool create should create an
  EFI system partition.
  I have borrowed the idea from Oracle Solaris and am introducing the zpool
  create -B switch to provide a way to specify that the boot partition should be
  created. However, there is still a question of how big the system partition
  should be. For the time being, I have set the default size to 256MB (that's
  the minimum size for FAT32 with 4k blocks). To support a custom size, the
  set-on-creation "bootsize" property is created, so the custom size can be set
  as: zpool create -B -o bootsize=34MB rpool c0t0d0
  After the pool is created, the "bootsize" property is read-only. When the -B
  switch is not used, bootsize defaults to 0 and is shown in zpool get output
  with value ''. Older zfs/zpool implementations ignore this property.
  https://www.illumos.org/rb/r/219/

Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>
2017-05-26 12:02:14 +00:00
Andriy Gapon
7248081159 5704 libzfs can only handle 255 file descriptors
illumos/illumos-gate@bde3d612a7
bde3d612a7

https://www.illumos.org/issues/5704
  libzfs uses fopen(), at least in libzfs_init(). If there are more than 255
  file descriptors open, fopen() will fail unless you give 'F' as the last mode
  character. The fix would be to give 'rF' instead of 'r' as mode to fopen().
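
A minimal sketch of the mode change described above. The helper name open_readonly() is hypothetical and not part of libzfs; it assumes an illumos/Solaris libc in which a trailing 'F' in the mode string enables the extended FILE facility, and it falls back to plain "r" elsewhere because the flag is not defined by other platforms.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a libzfs function): open a file read-only.
 * Assumption: on illumos/Solaris a trailing 'F' in the mode lets stdio
 * use file descriptors above 255. Other platforms get plain "r".
 */
static FILE *
open_readonly(const char *path)
{
    assert(path != NULL);
#if defined(__sun)
    return (fopen(path, "rF"));
#else
    return (fopen(path, "r"));
#endif
}
```

The caller closes the stream with fclose() as usual; only the mode string differs from the original call.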

Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Simon Klinkert <simon.klinkert@gmail.com>
2017-04-14 18:56:00 +00:00
Andriy Gapon
89a08ff6ba 7340 receive manual origin should override automatic origin
illumos/illumos-gate@ed4e7a6a5c
ed4e7a6a5c

https://www.illumos.org/issues/7340
  When -o origin=<snapshot> is specified as part of a ZFS receive, that origin
  should override the automatic detection in libzfs.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
2017-04-14 18:54:11 +00:00
Andriy Gapon
5fab759ec6 5142 libzfs support raidz root pool (loader project)
illumos/illumos-gate@d5f26ad812
d5f26ad812

https://www.illumos.org/issues/5142
  The current libzfs only allows simple disk and mirror setups for the boot
  pool. Since the loader does support booting from raidz, this feature removes
  the raidz restriction from boot pool setup.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Albert Lee <trisk@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Toomas Soome <tsoome@me.com>
2017-04-14 18:52:48 +00:00
Andriy Gapon
5b3ff7eced 6280 libzfs: unshare_one() could fail with EZFS_SHARENFSFAILED
illumos/illumos-gate@d1672efb6f
d1672efb6f

https://www.illumos.org/issues/6280
  The unshare_one() in libzfs could fail with EZFS_SHARENFSFAILED at line 834
  here:
  831    /* make sure libshare initialized */
  832    if ((err = zfs_init_libshare(hdl, SA_INIT_SHARE_API)) != SA_OK) {
  833        free(mntpt);    /* don't need the copy anymore */
  834        return (zfs_error_fmt(hdl, EZFS_SHARENFSFAILED,
  835            dgettext(TEXT_DOMAIN, "cannot unshare '%s': %s"),
  836            name, _sa_errorstr(err)));
  837    }
  The correct error should be EZFS_UNSHARENFSFAILED instead.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Marcel Telka <marcel.telka@nexenta.com>
2017-04-14 18:51:16 +00:00
Andriy Gapon
aa64c9679a 6268 zfs diff confused by moving a file to another directory
illumos/illumos-gate@aab04418a7
aab04418a7

https://www.illumos.org/issues/6268
  The zfs diff command presents a description of the changes that have occurred
  to files within a filesystem between two snapshots. If a file is renamed, the
  tool is capable of reporting this, e.g.:
  cd /some/zfs/dataset/subdir
  mv file0 file1
  Will result in a diff record like:
  R        /some/zfs/dataset/subdir/file0  ->  /some/zfs/dataset/subdir/file1
  Unfortunately, it seems that rename detection only uses the base filename to
  determine if a file has been renamed or simply modified. This leads to
  misreporting only the original filename, omitting the more relevant destination
  filename entirely. For example:
  cd /some/zfs/dataset/subdir
  mv file0 ../otherdir/file0
  Will result in a diff entry:
  M        /some/zfs/dataset/subdir/file0
  But it should really emit:
  R        /some/zfs/dataset/subdir/file0  ->  /some/zfs/dataset/otherdir/file0

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Joshua M. Clulow <josh@sysmgr.org>
2017-04-14 18:49:44 +00:00
Andriy Gapon
847b286425 7955 libshare needs to initialize only those datasets being modified by the consumer
illumos/illumos-gate@8a981c3356
8a981c3356

https://www.illumos.org/issues/7955
  Libshare currently initializes all available filesystems when doing any
  libshare operation. This requires iterating through all the filesystems
  multiple times, which is a huge performance problem for sharing and
  unsharing operations.

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Daniel Hoffman <dj.hoffman@delphix.com>
2017-04-14 18:34:03 +00:00
Andriy Gapon
c2745459d3 5380 receive of a send -p stream doesn't need to try renaming snapshots
illumos/illumos-gate@471a88e499
471a88e499

https://www.illumos.org/issues/5380
  A stream created with zfs send -p -I contains properties of all snapshots of a
  given dataset as opposed to only properties of snapshots in a given range.
  Not only is this suboptimal, but the receive code also does not filter
  properties by the range. So, properties of earlier snapshots would be updated
  even though the snapshots themselves are not in the stream (just their
  properties).
  Given that modifying the snapshot properties requires a TXG sync and that the
  snapshots are updated one by one, the described behavior may lead to a severe
  performance penalty.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2017-04-14 18:30:22 +00:00
Andriy Gapon
2d3fcc82a1 7990 libzfs: snapspec_cb() does not need to call zfs_strdup()
illumos/illumos-gate@d8584ba6fb
d8584ba6fb

https://www.illumos.org/issues/7990
  The snapspec_cb() callback function in libzfs does not need to call zfs_strdup().

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
2017-04-14 18:27:12 +00:00
Andriy Gapon
b6187af16b 7803 want devid_str_from_path(3devid)
illumos/illumos-gate@46d46cd4fa
46d46cd4fa

https://www.illumos.org/issues/7803
  Make get_devid() from libzfs a public function in libdevid, as it's pretty
  usable in other places and duplicating all the logic required to get a
  string-encoded devid from a path seems counter-productive.

Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2017-04-14 18:21:58 +00:00
Andriy Gapon
e7a96b6403 7541 zpool import/tryimport ioctl returns ENOMEM because provided buffer is too small for config
illumos/illumos-gate@8b65a70b76
8b65a70b76

https://www.illumos.org/issues/7541
  When calling zpool import, zpool does a few ioctls to ZFS.
  zpool allocates a buffer in userland and passes it to the kernel so that ZFS
  can copy info into it. ZFS will use it to put the nvlist that describes the
  pool configuration.
  If the allocated buffer is too small, ZFS will return ENOMEM and the call will
  have to be redone. This wastes CPU time and slows down the import process. This
  happens very often for the ZFS_IOC_POOL_TRYIMPORT call.
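
The retry cost described above can be sketched as follows. Everything here is illustrative: fake_tryimport() is a hypothetical stand-in for the kernel side of the ioctl, and the assumption that the callee reports the size it needs alongside ENOMEM is made for the sake of the example, not taken from this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel side: copies the config into buf
 * if it fits; otherwise reports the needed size and returns ENOMEM.
 */
static int
fake_tryimport(char *buf, size_t buflen, size_t *needed)
{
    static const char config[] = "pool config nvlist (placeholder payload)";

    *needed = sizeof (config);
    if (buflen < sizeof (config))
        return (ENOMEM);
    memcpy(buf, config, sizeof (config));
    return (0);
}

/*
 * Call fake_tryimport() with a buffer of initial_len bytes, growing it
 * to the reported size whenever ENOMEM comes back. Returns the filled
 * buffer (caller frees) or NULL, and stores the number of calls made
 * in *calls.
 */
static char *
fetch_config(size_t initial_len, int *calls)
{
    size_t len = initial_len, needed = 0;
    char *buf = NULL;
    int err;

    assert(initial_len > 0 && calls != NULL);
    *calls = 0;
    for (;;) {
        char *nbuf = realloc(buf, len);
        if (nbuf == NULL) {
            free(buf);
            return (NULL);
        }
        buf = nbuf;
        (*calls)++;
        err = fake_tryimport(buf, len, &needed);
        if (err != ENOMEM)
            break;
        len = needed;   /* retry with the size the callee asked for */
    }
    if (err != 0) {
        free(buf);
        return (NULL);
    }
    return (buf);
}
```

With a small initial buffer the sketch needs two calls; with a generously sized one it needs a single call, which is the saving the issue is after.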

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2017-04-14 18:20:56 +00:00
Andriy Gapon
1e876f2a64 7729 libzfs_core`lzc_rollback() leaks result nvl
illumos/illumos-gate@ac428481f9
ac428481f9

https://www.illumos.org/issues/7729
  libzfs_core`lzc_rollback() doesn't free the result nvl after lzc_ioctl() call.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2017-04-14 18:15:13 +00:00
Andriy Gapon
43fd8e6d5e 7745 print error if lzc_* is called before libzfs_core_init
illumos/illumos-gate@7c13517fff
7c13517fff

https://www.illumos.org/issues/7745
  The problem is that consumers of `libZFS_Core` that forget to call
  `libzfs_core_init()` before calling any other function of the library
  are having a hard time realizing their mistake. The library's internal
  file descriptor is declared as global static, which is ok, but it is not
  initialized explicitly; therefore, it defaults to 0, which is a valid
  file descriptor. If `libzfs_core_init()`, which explicitly initializes
  the correct fd, is skipped, the ioctl functions return errors that do
  not have anything to do with `libZFS_Core`, where the problem is
  actually located.
  Even though assertions for that existed within `libZFS_Core` for debug
  builds, they were never enabled because the `-DDEBUG` flag was missing
  from the compiler flags.
  This patch applies the following changes:
  1. It adds `-DDEBUG` for debug builds of `libZFS_Core` and `libzfs`,
         to enable their assertions on debug builds.
  2. It corrects an assertion within `libzfs`, where a function had
         been spelled incorrectly (`zpool_prop_unsupported()`) and nobody
         knew because the `-DDEBUG` flag was missing, and the preprocessor
         was taking that part of the code away.
  3. The library's internal fd is initialized to `-1` and `VERIFY`
         assertions have been placed to check that the fd is not equal to
         `-1` before issuing any ioctl. It is important here to note, that
         the `VERIFY` assertions exist in both debug and non-debug builds.
  4. In `libzfs_core_fini` we make sure to never decrement the
         refcount of our fd below 0, and also reset the fd to `-1` when no
         one refers to it. The reason for this is the rare case that
         the consumer closes all references but then calls one of the
         library's functions without using `libzfs_core_init()` first, and
         in the meantime, a previous call to `open()` decided to reuse
         our previous fd. This scenario would have passed our assertion in
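
Points 3 and 4 above can be sketched roughly as follows. The demo_* names are illustrative only and are not the libzfs_core symbols; where the patch uses `VERIFY` to abort, this sketch exposes a status check so the behavior can be observed without crashing.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Illustrative names only; these are not the libzfs_core symbols. */
static int demo_fd = -1;    /* -1 means "not initialized" */
static int demo_refcnt = 0;

/* Open the device on first use and count references to it. */
static int
demo_init(const char *path)
{
    assert(path != NULL);
    if (demo_refcnt == 0) {
        demo_fd = open(path, O_RDWR);
        if (demo_fd < 0) {
            demo_fd = -1;
            return (-1);
        }
    }
    demo_refcnt++;
    return (0);
}

/* Drop a reference; never go below zero, and reset the fd at zero. */
static void
demo_fini(void)
{
    if (demo_refcnt > 0)
        demo_refcnt--;
    if (demo_refcnt == 0 && demo_fd != -1) {
        (void) close(demo_fd);
        demo_fd = -1;
    }
}

/* Returns nonzero when it is safe to issue an ioctl on demo_fd. */
static int
demo_ready(void)
{
    return (demo_fd != -1);
}
```

Because the sentinel is -1 rather than the implicit 0, a forgotten init call is detected instead of silently targeting whatever file happens to be open on descriptor 0.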

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2017-04-14 18:14:02 +00:00
Andriy Gapon
e90b7ce5ca 7730 libzfs`add_config() leaks config nvl when reading spare/l2cache devices
illumos/illumos-gate@105686550e
105686550e

https://www.illumos.org/issues/7730
  antares:root:~# mdb /usr/sbin/zpool
  > ::sysbp _exit
  > ::run import
     pool: data
       id: 2093977168778024605
    state: ONLINE
   action: The pool can be imported using its name or numeric identifier.
   config:

          data        ONLINE
            c6t0d0    ONLINE
            c6t1d0    ONLINE
          cache
            c6t2d0
  mdb: stop on entry to _exit
  mdb: target stopped at:
  0xfee556ba:     nop
  mdb: You've got symbols!
  Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
  libnvpair.so.1 ]
  > ::findleaks -d
  BYTES             LEAKED VMEM_SEG CALLER
  4096                  10 fda7b000 MMAP
  8192                   1 fea8d000 MMAP
  8192                   1 fe76d000 MMAP
  8192                   1 fe66e000 MMAP
  4096                   1 fe570000 MMAP
  8192                   1 fe470000 MMAP
  4096                   1 fe372000 MMAP
  4096                   1 fe273000 MMAP

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2017-04-14 18:13:33 +00:00
Andriy Gapon
b3264caf7b 7252 7628 compressed zfs send / receive
illumos/illumos-gate@5602294fda
5602294fda

https://www.illumos.org/issues/7252
  This feature includes code to allow a system with compressed ARC enabled to
  send data in its compressed form straight out of the ARC, and receive data in
  its compressed form directly into the ARC.

https://www.illumos.org/issues/7628
  We should have longer, more readable versions of the ZFS send / recv options.

7628 create long versions of ZFS send / receive options

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: David Quigley <dpquigl@davequigley.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
2017-04-14 18:07:43 +00:00
Andriy Gapon
9c03f5f793 7604 if volblocksize property is the default, it displays as "-" rather than 8K
illumos/illumos-gate@4d86c0eab2
4d86c0eab2

https://www.illumos.org/issues/7604
  If a zvol has the default setting for the "volblocksize" property, it is
  8KB. However, it is displayed as "-" (not present), rather than "8K".
  The problem was introduced by:
  commit 25228e830e86924a41243343b1de9daf2d7dd43a
      Author: Matthew Ahrens <mahrens@delphix.com>
      Date:   Thu Nov 17 14:37:24 2016 -0800
  7571 non-present readonly numeric ZFS props do not have default value
  which changed get_numeric_property() to indicate that readonly
  default properties are not present. However, zfs_prop_readonly() returns
  TRUE for both readonly and set-once properties (e.g. volblocksize).
  Amusingly, that commit essentially reverted:
  6900484 default volblocksize is no longer being reported correctly
  from November 2009. However, that change was not correct either; the
  correct solution is to only do this check for "truly readonly" (i.e. not
  setonce) properties.
  $ zfs list -t volume -o name,volblocksize
      NAME                                                                           VOLBLOCK
      domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/archive   -
      domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/datafile  -
      domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/external  -
      rpool/dump                                                                     128K
      rpool/swap                                                                     4K
      rpool/swap1
  ===============================================================================

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-04-14 18:05:20 +00:00
Andriy Gapon
cfaa3c3d8f 7386 zfs get does not work properly with bookmarks
illumos/illumos-gate@edb901aab9
edb901aab9

https://www.illumos.org/issues/7386
  The zfs get command does not work with the bookmark parameter while it works
  properly with both filesystem and snapshot:
  # zfs get -t all -r creation rpool/test
  NAME               PROPERTY  VALUE                  SOURCE
  rpool/test         creation  Fri Sep 16 15:00 2016  -
  rpool/test@snap    creation  Fri Sep 16 15:00 2016  -
  rpool/test#bkmark  creation  Fri Sep 16 15:00 2016  -
  # zfs get -t all -r creation rpool/test@snap
  NAME             PROPERTY  VALUE                  SOURCE
  rpool/test@snap  creation  Fri Sep 16 15:00 2016  -
  # zfs get -t all -r creation rpool/test#bkmark
  cannot open 'rpool/test#bkmark': invalid dataset name
  #
  The zfs get command should be modified to work properly with bookmarks too.

Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
2017-04-14 18:01:43 +00:00
Andriy Gapon
aad99e99a3 7571 non-present readonly numeric ZFS props do not have default value
illumos/illumos-gate@ad2760acbd
ad2760acbd

https://www.illumos.org/issues/7571
  ZFS displays the default value for non-present readonly numeric (and index)
  properties. However, these properties' default values are not meaningful.
  Instead, we should display a "-", indicating that they are not present. For
  example, on a version-12 pool, the usedby* properties are not available, but
  they show up as the incorrect value "0":
     # zfs get all test12
        ...
        test12 usedbysnapshots 0 -
        test12 usedbydataset 0 -
        test12 usedbychildren 0 -
        test12 usedbyrefreservation 0 -
  We will be introducing more sometimes-present numeric readonly properties, so
  it would be nice to fix this.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-04-14 17:27:09 +00:00
Andriy Gapon
910a9fff3b 7542 zfs_unmount failed with EZFS_UNSHARENFSFAILED
illumos/illumos-gate@09c9e6dc9b
09c9e6dc9b

https://www.illumos.org/issues/7542
  libshare keeps a cached copy of the sharetab listing in memory, which can
  become out of date if shares are destroyed or created while leaving a libzfs
  handle open. This results in a spurious unmounting failure when an NFS share
  exists but isn't in the stale libshare cache.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Amdur <matt.amdur@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
2017-04-14 17:24:22 +00:00
Andriy Gapon
6a34841d8a 7336 vfork and O_CLOEXEC causes zfs_mount EBUSY
illumos/illumos-gate@873c4903a5
873c4903a5

https://www.illumos.org/issues/7336
  We can run into a problem where we call into zfs_mount, which in turn calls
  is_dir_empty, which opens the directory to try and make sure it's empty. The
  issue with the current approach is that it holds the directory open while it
  traverses it with readdir, which, due to subtle interaction with the JVM,
  vfork, and exec can cause a tricky race condition resulting in zfs_mount
  failures.
  The approach to resolving the issue in this patch is to drop the usage of
  readdir altogether, and instead rely on the fact that ZFS stores the number of
  entries contained in a directory using the st_size field of the stat structure.
  Thus, if the directory in question is a ZFS directory, we can check to see if
  it's empty by calling stat() and inspecting the st_size field of the
  structure returned.
  ===============================================================================
  The root cause appears to be an interesting race between vfork, exec, and
  zfs_mount's usage of O_CLOEXEC when calling openat. Here's what is going on:
  1. We call zfs_mount, and this in turn calls openat to check if the directory
  is empty, which results in opening the directory we're trying to mount onto,
  and increments v_count.
  2. As we're in the middle of reading the directory, vfork is called by the JVM
  and proceeds to exec the jspawnhelper utility. As a result of the vfork, we
  take an additional hold on the directory, which increments v_count a second
  time. The semantics of vfork mean the parent process will wait for the child
  process to exit or exec before the parent can continue; at this point the
  parent is in the middle of zfs_mount, reading the directory to determine if
  it's empty or not.
  3. The child process exec-ing jspawnhelper gets to the relvm call within
  exec_args (which is called by exec_common). relvm is the function that releases
  the parent process, allowing the parent to proceed. The problem is, at this
  point of calling relvm, the child hasn't yet called close_exec which is
  responsible for closing the file descriptors inherited from the parent process

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2017-04-14 17:23:52 +00:00
Andriy Gapon
68d57fec79 7233 dir_is_empty should open directory with CLOEXEC
illumos/illumos-gate@d420209d9c
d420209d9c

https://www.illumos.org/issues/7233
  This fixes a race where one thread is executing zfs_mount() while another
  thread forks and execs. If the fork occurs while the directory is open, the
  child process will inherit (but not necessarily close immediately) the open fd
  for the directory, preventing the mount.
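
A sketch of an emptiness check that opens the directory with O_CLOEXEC, under the assumption of a POSIX.1-2008 system; the function name is hypothetical and this is not the libzfs implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch: check whether a directory is empty, opening it with O_CLOEXEC
 * so that a child created by a concurrent fork() drops the descriptor
 * when it execs. Returns true if the path cannot be opened as a
 * directory, leaving that error to the caller.
 */
static bool
dir_is_empty_cloexec(const char *dirname)
{
    struct dirent *dp;
    DIR *dirp;
    bool empty = true;
    int fd;

    assert(dirname != NULL);
    fd = open(dirname, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return (true);
    if ((dirp = fdopendir(fd)) == NULL) {
        (void) close(fd);
        return (true);
    }
    while ((dp = readdir(dirp)) != NULL) {
        if (strcmp(dp->d_name, ".") != 0 &&
            strcmp(dp->d_name, "..") != 0) {
            empty = false;
            break;
        }
    }
    (void) closedir(dirp);  /* also closes fd */
    return (empty);
}
```

The flag only takes effect at exec time, so a child that has forked but not yet exec'ed still holds the descriptor for a short window.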

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alex Reece <alex@delphix.com>
2017-04-14 17:22:54 +00:00