Commit Graph

1150 Commits

Author SHA1 Message Date
Alan Somers
50b779bdc9 ZFS: fix adding vdevs to very large pools
r323791 changed the return value of zpool_read_label.  Error paths that
previously returned 0 began to return -1 instead.  However, not all error
paths initialized errno.  When adding vdevs to a very large pool, errno could
be prepopulated with ENOMEM, causing the operation to fail.  Fix the bug by
setting errno=ENOENT in the case that no ZFS label is found.

PR:		226096
Submitted by:	Nikita Kozlov
Reviewed by:	avg
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D13088
2018-03-02 21:26:48 +00:00
Devin Teske
16e39c48a3 Consistent casing for fallback SIGCHLD (s/Unknown/unknown/) 2018-02-26 00:04:21 +00:00
Devin Teske
4b9c94e576 Updates and enhancements to signal.d to aid DTrace scripting
+ Add missing signals SIGTHR (32) and SIGLIBRT (33)
+ Add inline for converting SIG* int to string
+ Add inline for converting CLD_* int to string

Reviewed by:	markj
Sponsored by:	Smule, Inc.
Differential Revision:	https://reviews.freebsd.org/D14497
2018-02-25 23:59:47 +00:00
Alan Somers
7d3761dc72 Don't declare __assfail as static
It gets called by dmu_buf_init_user, which is inline but not static.  So it
needs global linkage itself.

Reported by:	GCC-6
MFC after:	17 days
X-MFC-With:	329722
2018-02-25 14:29:43 +00:00
Devin Teske
03a3981b89 Updates and enhancements to io.d to aid DTrace scripting
+ Add dev_type do devinfo_t
+ Add b_cmd to bufinfo_t
+ Add constants for BIO_* and DEVSTAT_TYPE_*
+ Add inline for converting BIO_* int to string
+ Add inline for converting DEVSTAT_TYPE_* int to string
+ Add mask for dev_type & DEVSTAT_TYPE_MASK to string
+ Add mask for dev_type & DEVSTAT_TYPE_IF_MASK to string

Reviewed by:	markj
Sponsored by:	Smule, Inc.
Differential Revision:	https://reviews.freebsd.org/D14396
2018-02-24 17:13:15 +00:00
Alexander Motin
03d54eb339 MFV r329807:
8940 Sending an intra-pool resumable send stream may result in EXDEV

illumos/illumos-gate@544132fce3

"zfs send -t <token>" for an incremental send should be able to resume
successfully when sending to the same pool: a subtle issue in
zfs_iter_children() doesn't currently allow this.

Because resuming from a token requires "guid" -> "dataset" mapping
(guid_to_name()), we have to walk the whole hierarchy to find the right
snapshots to send.
When resuming an incremental send both source and destination live in the
same pool and have the same guid: this is where zfs_iter_children() gets
confused and picks up the wrong snapshot, so we end up trying to send an
incremental "destination@snap1 -> source@snap2" stream instead of
"source@snap1 -> source@snap2": this fails with an "Invalid cross-device
link" (EXDEV) error.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 04:01:55 +00:00
Alexander Motin
1ea10a60f9 MFV r329793, r329795:
9075 Improve ZFS pool import/load process and corrupted pool recovery

illumos/illumos-gate@6f7938128a

Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:

https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's

To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.

This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 03:15:35 +00:00
Alexander Motin
613b0d87da 8942 zfs promote .../%recv should be an error
illumos/illumos-gate@add927f8c8

Reported on the ZFSonLinux https://github.com/zfsonlinux/zfs/issues/4843,
fixed by https://github.com/zfsonlinux/zfs/pull/6339:

If we are in the middle of an incremental zfs receive, the child .../%recv
will exist. If you concurrently run zfs promote .../%recv, it will "work",
but then zfs gets confused. For example, there's no obvious way to destroy
the containing filesystem (because it is now a clone of its invisible child).

Attempting to do this promote should be an error. We could fix this by
having zfs_ioc_promote() check if zc_name contains a %, similar to
zfs_ioc_rename().

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 01:42:13 +00:00
Alexander Motin
502d18a8f1 MFV r329766: 8962 zdb should work on non-idle pools
illumos/illumos-gate@e144c4e6c9

Currently `zdb` consistently fails to examine non-idle pools as it fails
during the `spa_load()` process. The main problem seems to be that
`spa_load_verify()` fails as can be seen below:

$ sudo zdb -d -G dcenter
    zdb: can't open 'dcenter': I/O error

ZFS_DBGMSG(zdb):
    spa_open_common: opening dcenter
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c4t11d0s0': best uberblock found for spa dcenter. txg 40824950
    spa_load(dcenter): using uberblock with txg=40824950
    spa_load(dcenter): UNLOADING
    spa_load(dcenter): RELOADING
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c3t10d0s0': best uberblock found for spa dcenter. txg 40824952
    spa_load(dcenter): using uberblock with txg=40824952
    spa_load(dcenter): FAILED: spa_load_verify failed [error=5]
    spa_load(dcenter): UNLOADING

This change makes `spa_load_verify()` a dryrun when ran from `zdb`. This is
done by creating a global flag in zfs and then setting it in `zdb`.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 00:42:12 +00:00
Alexander Motin
b17bfcde3d 9018 Replace kmem_cache_reap_now() with kmem_cache_reap_soon()
illumos/illumos-gate@36a64e6284

To prevent kmem_cache reaping from blocking other system resources, turn
kmem_cache_reap_now() (which blocks) into kmem_cache_reap_soon(). Callers
to kmem_cache_reap_soon() should use kmem_cache_reap_active(), which
exploits #9017's new taskq_empty().

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Author: Tim Kordas <tim.kordas@joyent.com>

FreeBSD does not use taskqueue for kmem caches reaping, so this change
is less dramatic then it is on Illumos, just limiting reaping to 1 time
per second.  It may possibly be improved later, if needed.
2018-02-21 23:15:06 +00:00
Alexander Motin
24433f00ea MFV r329502: 7614 zfs device evacuation/removal
illumos/illumos-gate@5cabbc6b49

https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.

The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.

Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
2018-02-21 16:51:02 +00:00
Alan Somers
cb4985fbf6 zdb: raise WARNS from 0 to 2
This has only been possible since r329694 and r329508

MFC after:	3 weeks
X-MFC-With:	329694
X-MFC-With:	329508
Sponsored by:	Spectra Logic
2018-02-21 15:51:48 +00:00
Andriy Gapon
cfb675a138 MFV r329718: 8520 7198 lzc_rollback_to should support rolling back to origin
illumos/illumos-gate@95643f75d2
95643f75d2

https://www.illumos.org/issues/8520
  lzc_rollback_to() should support rolling back to a clone's origin.
  The current checks in zfs_ioc_rollback() would not allow that because the
  origin snapshot belongs to a different filesystem.
  The overly restrictive check was introduced in 7600, but it was not a
  regression as none of the existing tools provided a way to rollback to the
  origin.

https://www.illumos.org/issues/7198
  EINVAL is returned when a dataset does not have any snapshots, so there is
  nothing to roll back to.
  Although the code in zfs_do_rollback checks for that condition in advance, it's
  still possible that the snapshot(s) gets removed after the check and before the
  rollback sync task is executed.
  At the moment zfs command would crash when that happens.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	2 weeks
2018-02-21 15:12:14 +00:00
Alexander Motin
73c9b6d523 MFV r322231:
8430 dir_is_empty_readdir() doesn't properly handle error from fdopendir()

illumos/illumos-gate@ba6e7e6505
ba6e7e6505

https://www.illumos.org/issues/8430
  we should close dirfd if fdopendir() fails.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Sowrabha Gopal <sowrabha.gopal@delphix.com>
2018-02-21 02:21:22 +00:00
Alexander Motin
e5a4a83784 MFV r318941: 7446 zpool create should support efi system partition
illumos/illumos-gate@7855d95b30
7855d95b30

https://www.illumos.org/issues/7446
  Since we support whole-disk configuration for boot pool, we also will need
  whole disk support with UEFI boot and for this, zpool create should create efi-
  system partition.
  I have borrowed the idea from oracle solaris, and introducing zpool create -
  B switch to provide an way to specify that boot partition should be created.
  However, there is still an question, how big should the system partition be.
  For time being, I have set default size 256MB (thats minimum size for FAT32
  with 4k blocks). To support custom size, the set on creation "bootsize"
  property is created and so the custom size can be set as: zpool create B -
  o bootsize=34MB rpool c0t0d0
  After pool is created, the "bootsize" property is read only. When -B switch is
  not used, the bootsize defaults to 0 and is shown in zpool get output with
  value ''. Older zfs/zpool implementations are ignoring this property.
  https://www.illumos.org/rb/r/219/

Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>

This commit makes no sense for FreeBSD, that is why I blocked the option,
but it should be good to stay closer to upstream.
2018-02-21 00:18:57 +00:00
Alexander Motin
502b48823a MFV r316918: 7990 libzfs: snapspec_cb() does not need to call zfs_strdup()
illumos/illumos-gate@d8584ba6fb
d8584ba6fb

https://www.illumos.org/issues/7990
  The snapspec_cb() callback function in libzfs does not need to call zfs_strdup().

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
2018-02-20 20:46:27 +00:00
Alexander Motin
67aa4cb2b9 MFV r316902: 7745 print error if lzc_* is called before libzfs_core_init
illumos/illumos-gate@7c13517fff
7c13517fff

https://www.illumos.org/issues/7745
  The problem is that consumers of `libZFS_Core` that forget to call
  `libzfs_core_init()` before calling any other function of the library
  are having a hard time realizing their mistake. The library's internal
  file descriptor is declared as global static, which is ok, but it is not
  initialized explicitly; therefore, it defaults to 0, which is a valid
  file descriptor. If `libzfs_core_init()`, which explicitly initializes
  the correct fd, is skipped, the ioctl functions return errors that do
  not have anything to do with `libZFS_Core`, where the problem is
  actually located.
  Even though assertions for that existed within `libZFS_Core` for debug
  builds, they were never enabled because the `-DDEBUG` flag was missing
  from the compiler flags.
  This patch applies the following changes:
  1. It adds `-DDEBUG` for debug builds of `libZFS_Core` and `libzfs`,
         to enable their assertions on debug builds.
  2. It corrects an assertion within `libzfs`, where a function had
         been spelled incorrectly (`zpool_prop_unsupported()`) and nobody
         knew because the `-DDEBUG` flag was missing, and the preprocessor
         was taking that part of the code away.
  3. The library's internal fd is initialized to `-1` and `VERIFY`
         assertions have been placed to check that the fd is not equal to
         `-1` before issuing any ioctl. It is important here to note, that
         the `VERIFY` assertions exist in both debug and non-debug builds.
  4. In `libzfs_core_fini` we make sure to never increment the
         refcount of our fd below 0, and also reset the fd to `-1` when no
         one refers to it. The reason for this, is for the rare case that
         the consumer closes all references but then calls one of the
         library's functions without using `libzfs_core_init()` first, and
         in the mean time, a previous call to `open()` decided to reuse
         our previous fd. This scenario would have passed our assertion in

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-02-20 20:40:55 +00:00
Alexander Motin
499a9a917a MFV r316901:
7730 libzfs`add_config() leaks config nvl when reading spare/l2cache devices

illumos/illumos-gate@105686550e
105686550e

https://www.illumos.org/issues/7730
  antares:root:~# mdb /usr/sbin/zpool
  > ::sysbp _exit
  > ::run import
     pool: data
       id: 2093977168778024605
    state: ONLINE
   action: The pool can be imported using its name or numeric identifier.
   config:

          data        ONLINE
            c6t0d0    ONLINE
            c6t1d0    ONLINE
          cache
            c6t2d0
  mdb: stop on entry to _exit
  mdb: target stopped at:
  0xfee556ba:     nop
  mdb: You've got symbols!
  Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
  libnvpair.so.1 ]
  > ::findleaks -d
  BYTES             LEAKED VMEM_SEG CALLER
  4096                  10 fda7b000 MMAP
  8192                   1 fea8d000 MMAP
  8192                   1 fe76d000 MMAP
  8192                   1 fe66e000 MMAP
  4096                   1 fe570000 MMAP
  8192                   1 fe470000 MMAP
  4096                   1 fe372000 MMAP
  4096                   1 fe273000 MMAP

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-02-20 20:37:01 +00:00
Alexander Motin
9f6122a6e7 MFV r316893:
7604 if volblocksize property is the default, it displays as "-" rather than 8K

illumos/illumos-gate@4d86c0eab2
4d86c0eab2

https://www.illumos.org/issues/7604
  If a zvol has the default setting for the "volblocksize" property, it is
  8KB. However, it is displayed as "-" (not present), rather than "8K".
  The problem was introduced by:
  commit 25228e830e86924a41243343b1de9daf2d7dd43a
      Author: Matthew Ahrens &lt;mahrens@delphix.com&gt;
      Date:   Thu Nov 17 14:37:24 2016 -0800
  7571 non-present readonly numeric ZFS props do not have default value
  which changed changed get_numeric_property() to indicate that readonly
  default properties are not present. However, zfs_prop_readonly() returns
  TRUE for both readonly and set-once properties (e.g. volblocksize).
  Amusingly, that commit essentially reverted:
  6900484 default volblocksize is no longer being reported correctly
  from November 2009. However, that change was not correct either; the
  correct solution is to only do this check for "truly readonly" (i.e. not
  setonce) properties.
  $ zfs list -t volume -o name,volblocksize
      NAME
  VOLBLOCK
      domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
  archive            -
      domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
  datafile           -
      domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
  external           -
      rpool/dump
  128K
      rpool/swap
  4K
      rpool/swap1
  ===============================================================================

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-02-20 20:34:04 +00:00
Alexander Motin
806141acfc MFV r316876: 7542 zfs_unmount failed with EZFS_UNSHARENFSFAILED
illumos/illumos-gate@09c9e6dc9b
09c9e6dc9b

https://www.illumos.org/issues/7542
  libshare keeps a cached copy of the sharetab listing in memory, which can
  become out of date if shares are destroyed or created while leaving a libzfs
  handle open. This results in a spurious unmounting failure when an NFS share
  exists but isn't in the stale libshare cache.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Amdur <matt.amdur@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
2018-02-20 20:30:40 +00:00
Alexander Motin
6d03c5504b MFV r316875: 7336 vfork and O_CLOEXEC causes zfs_mount EBUSY
illumos/illumos-gate@873c4903a5
873c4903a5

https://www.illumos.org/issues/7336
  We can run into a problem where we call into zfs_mount, which in turn calls
  is_dir_empty, which opens the directory to try and make sure it's empty. The
  issue with the current approach is that it holds the directory open while it
  traverses it with readdir, which, due to subtle interaction with the Java JVM,
  vfork, and exec can cause a tricky race condition resulting in zfs_mount
  failures.
  The approach to resolving the issue in this patch is to drop the usage of
  readdir altogether, and instead rely on the fact that ZFS stores the number of
  entries contained in a directory using the st_size field of the stat structure.
  Thus, if the directory in question is a ZFS directory, we can check to see if
  it's empty by calling stat() and inspecting the st_size field of structure
  returned.
  ===============================================================================
  The root cause appears to be an interesting race between vfork, exec, and
  zfs_mount's usage of O_CLOEXEC when calling openat. Here's what is going on:
  1. We call zfs_mount, and this in turn calls openat to check if the directory
  is empty, which results in opening the directory we're trying to mount onto,
  and increment v_count.
  2. As we're in the middle of reading the directory, vfork is called by the JVM
  and proceeds to exec the jspawnhelper utility. As a result of the vfork, we
  take an additional hold on the directory, which increments v_count a second
  time. The semantics of vfork mean the parent process will wait for the child
  process to exit or exec before the parent can continue; at this point the
  parent is in the middle of zfs_mount, reading the directory to determine if
  it's empty or not.
  3. The child process exec-ing jspawnhelper gets to the relvm call within
  exec_args (which is called by exec_common). relvm is the function that releases
  the parent process, allowing the parent to proceed. The problem is, at this
  point of calling relvm, the child hasn't yet called close_exec which is
  responsible for closing the file descriptors inherited from the parent process

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-02-20 20:26:48 +00:00
Alexander Motin
c2e79753fe MFV r316873: 7233 dir_is_empty should open directory with CLOEXEC
illumos/illumos-gate@d420209d9c
d420209d9c

https://www.illumos.org/issues/7233
  This fixes a race where one thread is executing zfs_mount() while another
  thread forks and execs. If the fork occurs while the directory is open, the
  child process will inherit (but not necessarily close immediately) the open fd
  for the directory, preventing the mount.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alex Reece <alex@delphix.com>
2018-02-20 20:17:19 +00:00
Alexander Motin
91e16b7e23 MFV r316872: 7502 ztest should run zdb with -G (debug mode)
illumos/illumos-gate@c3c65d17f7
c3c65d17f7

https://www.illumos.org/issues/7502
  Right now ztest executes zdb without -G, so when it has errors, the messages
  are often not very helpful:
  Executing zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest
  zdb: can't open 'ztest': Operation not supported
  ztest: '/usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest' exit
  code 1
  With -G, we'd have:
  /usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache -G ztest
  zdb: can't open 'ztest': Operation not supported

  ZFS_DBGMSG(zdb):
  spa_open_common: opening ztest
  spa_load(ztest): LOADING
  spa_load(ztest): FAILED: unable to parse config [error=48]
  spa_load(ztest): UNLOADING
  Which indicates where the error came from

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-20 20:14:11 +00:00
Alan Somers
d4c225b01c Fix memory leaks in zdb introduced by r329508
Reported by:	Coverity
CID:		1386185
MFC after:	3 weeks
X-MFC-With:	329508
Sponsored by:	Spectra Logic Corp
2018-02-20 19:54:06 +00:00
Alexander Motin
c156ddfcb6 MFV r324198: 8081 Compiler warnings in zdb
illumos/illumos-gate@3f7978d02b
3f7978d02b

https://www.illumos.org/issues/8081
  zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
  which uses -WError, the only way to build it is to disable all compiler
  warnings. This makes it much harder to detect newly introduced bugs. We should
  cleanup all the warnings.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
2018-02-18 04:00:29 +00:00
Alexander Motin
6c5aa8d10e MFV r323911:
8502 illumos#7955 broke delegated datasets when libshare is not present

illumos/illumos-gate@1c18e8fbd8
1c18e8fbd8

https://www.illumos.org/issues/8502
  The code in lib/libzfs/common/libzfs_mount.c already basically handles
  the case when libshare is not installed. We just need to not fail in
  zfs_init_libshare_impl.  I tested this in lx and things work as
  expected. I also tested there trying to set sharenfs and sharesmb on
  the delegated dataset. Neither is allowed from within a zone.  The
  spew of msgs from a native zone is not ZFS specific. I see the same
  spew simply running the share command.

Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
2018-02-18 01:42:17 +00:00
Devin Teske
492494d92f Add inline to errno.d for translating int to string
Gives DTrace scripts strerror(3) functionality.

Reviewed by:	markj
Sponsored by:	Smule, Inc.
Differential Revision:	https://reviews.freebsd.org/D14386
2018-02-16 04:22:29 +00:00
Alan Somers
d727d8b0b1 Optimize zfsd for the happy case
If there are no damaged pools, then ignore all GEOM events.  We only use
them to fix damaged pools.  However, still pay attention to ZFS events.

MFC after:	20 days
X-MFC-With:	329284
Sponsored by:	Spectra Logic Corp
2018-02-15 21:30:30 +00:00
Devin Teske
85ef133db3 Add the following errno definitions to /usr/lib/dtrace/errno.d
ENOTCAPABLE (93)
ECAPMODE (94)
ENOTRECOVERABLE (95)
EOWNERDEAD (96)
ERELOOKUP (-5)

and update ELAST (92 -> 96)

Reviewed by:	markj
Sponsored by:	Smule, Inc.
Differential Revision:	https://reviews.freebsd.org/D14340
2018-02-15 18:37:32 +00:00
Alan Somers
a07d59d1da zfsd: Allow zfsd to work on any type of GEOM provider
cddl/usr.sbin/zfsd/zfsd_event.cc
	Remove the check for da and ada devices.  This way zfsd can work on md,
	geli, glabel, gstripe, etc devices.  geli in particular is useful
	combined with ZFS.  gnop is also useful for simulating drive pulls in
	the ZFSD test suite.

	Also, eliminate the DevfsEvent class entirely.  Move its
	responsibilities into GeomEvent.  We can get everything we need to know
	just from listening to GEOM events.

lib/libdevdctl/event.cc
	Fix GeomEvent::DevName for CREATE events.  Oddly, the relevant field is
	named "cdev" for CREATE events but "devname" for disk events.

MFC after:	3 weeks
Relnotes:	Yes (probably worth mentioning the geli part)
Sponsored by:	Spectra Logic Corp
2018-02-14 23:52:39 +00:00
Devin Teske
980b68eaa2 Use tabs in io.d, fix alignment issues, remove extraneous newlines 2018-02-12 23:53:38 +00:00
Alan Somers
ad4bbe575b Fix "zpool add" crash when a replacing vdev has a spare child
Fix an assertion in zpool that causes a crash when running any "zpool add"
command on a spare that contains a replacing vdev with a spare child.

This likely affects Illumos, too.

PR:		225546
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D14138
2018-02-09 16:08:57 +00:00
Alan Somers
af63bbd0f0 zfsd: Don't spare a vdev that's being replaced
If a zfs pool contains a replacing vdev (either created manually by "zpool
replace" or by zfsd(8) via autoreplace by physical path) and then new spares
get added to the pool, zfsd shouldn't use one to replace the drive that is
already being replaced.  That's a waste of resources that just slows down
the rebuild.

PR:		225547
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2018-01-30 21:25:43 +00:00
Mark Johnston
3ddcfe1e46 Remove uneeded parentheses.
MFC after:	1 week
2018-01-25 15:31:56 +00:00
Alexander Motin
0d6993dfd3 MFV r328255: 8972 zfs holds: In scripted mode, do not pad columns with spaces
illumos/illumos-gate@e9b7d6e7f7

https://www.illumos.org/issues/8972:
'zfs holds -H' does not properly output content in scripted mode. It uses a
tab instead of two spaces, but it still pads column widths with spaces when
it should not.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Allan Jude <allanjude@freebsd.org>
2018-01-22 06:00:45 +00:00
Alexander Motin
4eb2803697 MFV r328251: 8652 Tautological comparisons with ZPROP_INVAL
illumos/illumos-gate@4ae5f5f06c

https://www.illumos.org/issues/8652:
Clang and GCC prefer to use unsigned ints to store enums. With Clang, that
causes tautological comparison warnings when comparing a zfs_prop_t or
zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error
handling code is being silently removed as a result.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 05:52:39 +00:00
Alexander Motin
cf52647c41 MFV r328249:
8641 "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs

illumos/illumos-gate@2ba5f978a4

https://www.illumos.org/issues/8641:
"zpool clear" and "zinject -d" can both operate on specific vdevs, either
leaf or interior. However, due to an oversight, neither works on a "spare"
or "replacing" vdev. For example:

sudo zpool create foo raidz1 c1t5000CCA000081D61d0 c1t5000CCA000186235d0 spare c
1t5000CCA000094115d0
sudo zpool replace foo c1t5000CCA000186235d0 c1t5000CCA000094115d0
$ zpool status foo pool: foo
state: ONLINE
scan: resilvered 81.5K in 0h0m with 0 errors on Fri Sep 8 10:53:03 2017
config:

NAME                         STATE     READ WRITE CKSUM
        foo                          ONLINE       0     0     0
          raidz1-0                   ONLINE       0     0     0
            c1t5000CCA000081D61d0    ONLINE       0     0     0
            spare-1                  ONLINE       0     0     0
              c1t5000CCA000186235d0  ONLINE       0     0     0
              c1t5000CCA000094115d0  ONLINE       0     0     0
        spares
          c1t5000CCA000094115d0      INUSE     currently in use
$ sudo zinject -d spare-1 -A degrade foo
cannot find device 'spare-1' in pool 'foo'
$ sudo zpool clear foo spare-1
cannot clear errors for spare-1: no such device in pool

Even though there was nothing to clear, those commands shouldn't have
reported an error. by contrast, trying to clear "raidz1-0" works just fine:
$ sudo zpool clear foo raidz1-0

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 04:37:04 +00:00
Alexander Motin
d26c97e2fb MFV r328233:
8898 creating fs with checksum=skein on the boot pools fails ungracefully

illumos/illumos-gate@9fa2266d9a

https://www.illumos.org/issues/8898:
# zfs create -o checksum=skein rpool/test
internal error: Result too large
Abort (core dumped)

Not a big deal per se, but should be handled correctly.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

PR:		222199
2018-01-22 00:01:36 +00:00
Alexander Motin
9b347baa3a MFV r328231: 8897 zpool online -e fails assertion when run on non-leaf vdevs
illumos/illumos-gate@9a551dd645

https://www.illumos.org/issues/8897:
# zpool online -e test mirror-1
Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online
Abort (core dumped)

Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'.

Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-01-21 23:53:56 +00:00
Alexander Motin
442814b7e7 MFV r328220: 8677 Open-Context Channel Programs
illumos/illumos-gate@a3b2868063

https://www.illumos.org/issues/8677
  We want to be able to run channel programs outside of synching context.
  This would greatly improve performance of channel program that just gather
  information, as we won't have to wait for synching context anymore.

  This feature should introduce the following:
  - A new command line flag in "zfs program" to specify our intention to
  run in open context.
  - A new flag/option within the channel program ioctl which selects the
  context.
  - Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
  - Documentation for the new feature in the manual pages.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-01-21 23:02:05 +00:00
Mark Johnston
d678ce4b6b Remove tst.zonename.d from the list of expected failures.
X-MFC with:	r327888
2018-01-14 17:56:19 +00:00
Mark Johnston
224e0c2f61 Add "jid" and "jailname" variables to DTrace.
These return the jail ID and jail name for the traced process,
respectively, and are analogous to "zonename" on Solaris/illumos.
"zonename" is now aliased to "jailname".

Also add some stress tests for the new variables.

Submitted by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
Reviewed by:	dteske (previous version)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D13877
2018-01-12 19:59:46 +00:00
Mark Johnston
60eddb209b Add a regression test for r327794.
MFC after:	2 weeks
2018-01-10 21:40:36 +00:00
Mark Johnston
ff9ebfc4b6 Fix an off-by-one in dt_opt_setenv().
The bug would cause incorrect behaviour when attempting to override
an already set environment variable with -x setenv, as long as the
variable is not the last one in the array.

Reported by:	Samuel Lepetit <slepetit@apple.com>
MFC after:	2 weeks
2018-01-10 21:37:11 +00:00
Mark Johnston
7c3701904a Mark uctf/err.user64mode.ksh as EXFAIL for now.
MFC after:	1 week
2017-12-15 18:09:23 +00:00
Mark Johnston
97cb52fa9a Actually add the -x setenv test Makefile, missed in r326499.
X-MFC with:	r326499
2017-12-08 19:26:25 +00:00
Mark Johnston
04006780d9 Complete support for dtrace's -x setenv option.
This allows one to override the environment for processes created with
dtrace -c. By default, the environment is inherited.

This support was originally merged from illumos in r249367 but was lost
when the commit was later reverted and then brought back piecemeal.

Reported by:	Samuel Lepetit <slepetit@apple.com>
MFC after:	2 weeks
2017-12-03 16:57:28 +00:00
Mark Johnston
5577b8a709 Add an envp argument to proc_create().
This is needed to support dtrace's -x setenv option.

MFC after:	2 weeks
2017-12-03 16:50:16 +00:00
Alan Somers
013953eb5f Add basic tests for ctfconvert(1), fold(1) and rs(1)
Add basic command line parsing test coverage for these utilities.  The tests
were automatically generated based on their man pages.  These tests can be
expanded by hand for more thorough coverage.  The aim is to generate very
basic amount of test coverage for all the utilities in the base system.

Tests generated via: https://github.com/shivansh/smoketestsuite/

Submitted by:	shivansh
Reviewed by:	asomers
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D12424
2017-11-27 20:01:58 +00:00
Mark Johnston
f4b90f5a5c Revert r326181 for now.
We can't link an executable using -m32 until the lib32 phase of a
buildworld, though the build works fine when executing make from
cddl/usr.sbin/dtrace/tests. Some other solution will need to be found.
2017-11-27 17:54:17 +00:00
Andriy Gapon
718cb91ccc zdb: follow-up to r326150, check if malloc succeeded
Reported by:	rpokala
MFC after:	1 week
X-MFC with:	r326150
2017-11-25 09:47:31 +00:00
Mark Johnston
e96f62322b Compile one of the uctf test programs with -m32.
The err.user64mode.ksh test expects it to run as a 32-bit process.

MFC after:	1 week
2017-11-24 19:57:13 +00:00
Mark Johnston
eb381edab7 Fix the type signature for sx(9) DTrace subroutines.
MFC after:	1 week
2017-11-24 19:05:45 +00:00
Andriy Gapon
fd74a38251 zdb: use a heap allocation instead of a huge array on stack
SPA_MAXBLOCKSIZE is 16 MB and having such a large object on the stack is
not nice in general and it could cause some confusing failures in the
single-user mode where the default stack size of 8 MB is used.

I expect that the upstream would make the same change.

MFC after:	1 week
2017-11-24 10:45:33 +00:00
Mark Johnston
483f7100b4 Annotate pragma/err.invalidlibdep.ksh as EXFAIL.
The test creates a D library with a "depends_on library" pragma
referencing a non-existent library, and expects compilation to fail.
However, as far as I can tell, libdtrace is supposed simply abort
compilation of the library in this case, and continue. This behaviour
is desirable when adding libraries which depend on optional KLDs, for
example.

MFC after:	1 week
2017-11-22 15:54:52 +00:00
Mark Johnston
7c72b10912 Annotate usdt/tst.eliminate.ksh as EXFAIL.
It appears to depend on some behaviour specific to the Sun link editor.

MFC after:	1 week
2017-11-21 16:00:18 +00:00
Mark Johnston
fc81ca0045 Don't assume that we can resolve "main" in the ksh executable.
MFC after:	1 week
2017-11-21 15:03:38 +00:00
Ed Maste
4e4805ddf1 dt_modtext: return error on archs lacking an implementation
Reported by:	mmel
Reviewed by:	markj
MFC after:	1 week
MFC with:	r325042
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D13176
2017-11-21 03:15:32 +00:00
Mark Johnston
ad1633b34b Take r313504 into account when recomputing the string table length.
When we encounter a USDT probe in a weak symbol, we emit an alias for
the probe function symbol. Such aliases are named differently from the
aliases we emit for probes in local functions, so make sure to take that
difference into account when resizing the output object file's string
table. Otherwise, we underrun the string table buffer.

PR:		223680
2017-11-16 07:14:29 +00:00
Bryan Drewery
ea825d0274 DIRDEPS_BUILD: Update dependencies.
Sponsored by:	Dell EMC Isilon
2017-10-31 00:07:04 +00:00
Bryan Drewery
3806950135 DIRDEPS_BUILD: Connect new directories.
Sponsored by:	Dell EMC Isilon
2017-10-31 00:04:07 +00:00
Ed Maste
732f2c69a1 libdtrace: replace "DOODAD" with more descriptive string
Previously some unimplemented libdtrace routines printed the function,
file and line number, followed by "DOODAD." That is not particularly
informative, so replace it with a message reporting the actual issue.

Sponsored by:	The FreeBSD Foundation
2017-10-27 16:23:45 +00:00
Andriy Gapon
1c05a6ea6b MFV r325013,r325034: 640 number_to_scaled_string is duplicated in several commands
illumos/illumos-gate@0a0551200e
0a0551200e

https://www.illumos.org/issues/640
  du(1), df(1m), ls(1), and swap(1m) all include a copy (it appears literally
  copied) of the 'number_to_scaled_string' function in their source. This should
  be moved to a shared library and all 4 commands should use this instead.

FreeBSD note: of all libcmdutils functionality ZFS (and other illumos
contrib code) currently uses only nicenum() function (which is similar
to humanize_number but has some formatting differences).  For this
reason I decided to not port the whole library.  As a result, nicenum.c
from libcmdutils is compiled into libzfs and libzpool.  This is a bit
ugly, but works.  If one day we are forced to create libillumos, then
the file should be moved to that library.

Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Jason King <jason.brian.king@gmail.com>

MFC after:	2 weeks
2017-10-27 12:37:22 +00:00
Alan Somers
12a88a3d63 zfsd should be able to online an L2ARC that disappears and returns
Previously, this didn't work because L2ARC devices' labels don't contain
pool GUIDs.  Modify zfsd so that the pool GUID won't be required:

lib/libdevdctl/guid.h
	Change INVALID_GUID from a uint64_t constant to a function that
	returns an invalid Guid object.  Remove the void constructor.
	Nothing uses it, and it violates RAII.

cddl/usr.sbin/zfsd/case_file.h
cddl/usr.sbin/zfsd/case_file.cc
	Allow CaseFile::Find to match a CaseFile based on Vdev GUID alone.
	In CaseFile::ReEvaluate, attempt to online devices even if the newly
	arrived device has no pool GUID.

cddl/usr.sbin/zfsd/vdev_iterator.cc
	Iterate through a pool's cache devices as well as its regular
	devices.

Reported by:	avg
Reviewed by:	avg
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12791
2017-10-26 15:28:18 +00:00
Alan Somers
63f8025d6a Fix zpool_read_all_labels when vfs.aio.enable_unsafe=0
Previously, zpool_read_all_labels was trying to do 256KB reads, which are
greater than the default MAXPHYS and therefore must go through the slow,
unsafe AIO path.  Shrink these reads to 112KB so they can use the safe, fast
AIO path instead.

MFC after:	1 week
X-MFC-With:	324568
Sponsored by:	Spectra Logic Corp
2017-10-25 16:01:19 +00:00
Mark Johnston
7421ff0751 Address some miscellaneous issues in the CTF man page.
- Fix a number of typos.
- Replace some illumos-specific references.
- Note that a type definition of kind CTF_K_FUNCTION may be followed by
  a null type identifier in order to provide 4-byte alignment for the
  next type definition.

MFC after:	2 weeks
2017-10-22 19:17:25 +00:00
Mark Johnston
ef36b3f756 MFV r323105 (partial): 8300 fix man page issues found by mandoc 1.14.1
illumos/illumos-gate@72d3dbb9ab
72d3dbb9ab

https://www.illumos.org/issues/8300
  Prior to integrating the mdocml update to 1.14.1, fix issues found by
  new version, especially the "new sentence, new line" style rule.

FreeBSD note: this revision merges only the changes to the CTF manual
page. The changes to the ZFS pages cannot be applied directly.

Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

Discussed with:	avg
MFC after:	2 weeks
2017-10-22 18:32:28 +00:00
Alan Somers
b49e9abcf4 Optimize zpool_read_all_labels with AIO
Read all labels in parallel instead of sequentially

MFC after:	3 weeks
X-MFC-With: 322854
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12495
2017-10-12 21:25:11 +00:00
Mark Johnston
11a7f121b8 Avoid adding an extra "0x" prefix before pointer formats.
MFC after:	1 week
2017-10-06 18:29:00 +00:00
Andriy Gapon
ac63ac6859 zdb.8: replace with the slighly modified upstream version
The upstream has converted their manual page to the same format as we
have, so we can use the upstream version with minimal modifications.

The current modifications are:
- different zpool.cache path
- different manual sections for zdb, zfs, zpool commands

igor reports a few minor issues, it would be nice to fix them both in
FreeBSD and in the upstream.

MFC after:	3 weeks
2017-10-06 08:28:35 +00:00
Andriy Gapon
e117882ba2 MFV r322235: 8067 zdb should be able to dump literal embedded block pointer
illumos/illumos-gate@4923c69fdd
4923c69fdd

FreeBSD note: the manual page is to be updated separately.

https://www.illumos.org/issues/8067
  Add an option to zdb to print a literal embedded block pointer supplied on the
  command line:
  zdb -E [-A] word0:word1:...:word15

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	3 weeks
2017-10-06 08:21:06 +00:00
Andriy Gapon
c40fbcbc18 MFV r316934: 7340 receive manual origin should override automatic origin
illumos/illumos-gate@ed4e7a6a5c
ed4e7a6a5c

https://www.illumos.org/issues/7340
  When -o origin=<snapshot> is specified as part of a ZFS receive, that origin
  should override the automatic detection in libzfs.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>

MFC after:	3 weeks
2017-10-06 08:17:12 +00:00
Andriy Gapon
3c7b32b890 MFV r316933: 5142 libzfs support raidz root pool (loader project)
illumos/illumos-gate@d5f26ad812
d5f26ad812

https://www.illumos.org/issues/5142
  the current libzfs only allows simple disk and mirror setup for boot pool, as
  loader does support booting from raidz, this feature will remove raidz
  restriction from boot pool setup.

FreeBSD note: we have long supported this feature, this commit only
removes a small difference in libzfs.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Albert Lee <trisk@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Toomas Soome <tsoome@me.com>

MFC after:	3 weeks
2017-10-06 08:15:37 +00:00
Andriy Gapon
6a880de6e0 MFV r316931: 6268 zfs diff confused by moving a file to another directory
illumos/illumos-gate@aab04418a7
aab04418a7

https://www.illumos.org/issues/6268
  The zfs diff command presents a description of the changes that have occurred
  to files within a filesystem between two snapshots. If a file is renamed, the
  tool is capable of reporting this, e.g.:
  cd /some/zfs/dataset/subdir
  mv file0 file1
  Will result in a diff record like:
  R        /some/zfs/dataset/subdir/file0  ->  /some/zfs/dataset/subdir/file1
  Unfortunately, it seems that rename detection only uses the base filename to
  determine if a file has been renamed or simply modified. This leads to
  misreporting only the original filename, omitting the more relevant destination
  filename entirely. For example:
  cd /some/zfs/dataset/subdir
  mv file0 ../otherdir/file0
  Will result in a diff entry:
  M        /some/zfs/dataset/subdir/file0
  But it should really emit:
  R        /some/zfs/dataset/subdir/file0  ->  /some/zfs/dataset/otherdir/file0

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Joshua M. Clulow <josh@sysmgr.org>

MFC after:	3 weeks
2017-10-06 08:12:13 +00:00
Andriy Gapon
f079bf7ce3 MFV r316877: 7571 non-present readonly numeric ZFS props do not have default value
illumos/illumos-gate@ad2760acbd
ad2760acbd

https://www.illumos.org/issues/7571
  ZFS displays the default value for non-present readonly numeric (and index)
  properties. However, these properties default values are not meaningful.
  Instead, we should display a "-", indicating that they are not present. For
  example, on a version-12 pool, the usedby* properties are not available, but
  they show up as the incorrect value "0":
     1. zfs get all test12
        ...
        test12 usedbysnapshots 0 -
        test12 usedbydataset 0 -
        test12 usedbychildren 0 -
        test12 usedbyrefreservation 0 -
  We will be introducing more sometimes-present numeric readonly properties, so
  it would be nice to fix this.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	3 weeks
2017-10-06 08:10:54 +00:00
Andriy Gapon
f51efc68f6 MFV r316864: 6392 zdb: introduce -V for verbatim import
illumos/illumos-gate@dfd5965f7e
dfd5965f7e

FreeBSD note: the manual page is to be updated separately.

https://www.illumos.org/issues/6392
  When given a pool name via -e, zdb would attempt an import. If it
  failed, then it would attempt a verbatim import. This behavior is
  not always desirable so a -V switch is added to zdb to control the
  behavior. When specified, a verbatim import is done. Otherwise,
  the behavior is as it was previously, except no verbatim import
  is done on failure.
  a5778ea242
  Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>

MFC after:	3 weeks
2017-10-06 08:09:20 +00:00
Andriy Gapon
65512687a9 MFV r316862: 6410 teach zdb to perform object lookups by path
illumos/illumos-gate@ed61ec1da9
ed61ec1da9

FreeBSD note: this commit does not update the manual page.
The original change includes conversion of the manual page from *roff format
to mandoc format.  So, it is hard to extract the content change from
that.  I am going to replace our zdb manual page, which is an earlier
independent conversion, with a slighly modified version of the upstream page.

https://www.illumos.org/issues/6410
  This is primarily intended to ease debugging & testing ZFS when one is only
  interested in things like the on-disk location of a specific object's blocks,
  but doesn't know their object id. This allows doing things like the following
  (FreeBSD-based example):
          # zpool create -f foo da0
          # dd if=/dev/zero of=/foo/1 bs=1M count=4 >/dev/null 2>&1
          # zpool export foo
          # zdb -vvvvv -o "ZFS plain file" foo /1
          Object  lvl   iblk   dblk  dsize  lsize   %full  type
               8    2    16K   128K  3.99M     4M  100.00  ZFS plain file (K=inherit) (Z=inherit)
                                              168   bonus  System attributes
          dnode flags: USED_BYTES USERUSED_ACCOUNTED
          dnode maxblkid: 31
          path    /1
          uid     0
          gid     0
          atime   Thu Apr 23 22:45:32 2015
          mtime   Thu Apr 23 22:45:32 2015
          ctime   Thu Apr 23 22:45:32 2015
          crtime  Thu Apr 23 22:45:32 2015
          gen     7
          mode    100644
          size    4194304
          parent  4
          links   1
          pflags  40800000004
          Indirect blocks:
                0 L1  DVA[0]=<0:c19200:600> DVA[1]=<0:10800019200:600> [L1 ZFS
  plain file] fletcher4 lz4 LE contiguous unique double size=4000L/200P birth=7L/

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	3 weeks
2017-10-06 07:52:25 +00:00
Alan Somers
d22f655459 MFV r319743: 8108 zdb -l fails to read labels 2 and 3
illumos/illumos-gate@22c8b9583d
22c8b9583d

https://www.illumos.org/issues/8108

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	3 weeks
2017-10-02 22:39:12 +00:00
Alan Somers
58dde1c585 MFV r316863: 3871 fix issues introduced by 3604
illumos/illumos-gate@de05b58863
de05b58863

https://www.illumos.org/issues/3871
  GCC 4.5.3 on Gentoo Linux did not like a few of the changes made in the issue
  3604 patch. It printed an error and a couple of warnings:
  ../../cmd/zdb/zdb.c: In function 'dump_bpobj':
  ../../cmd/zdb/zdb.c:1257:3: error: 'for' loop initial declarations are only
  allowed in C99 mode
  ../../cmd/zdb/zdb.c:1257:3: note: use option -std=c99 or -std=gnu99 to compile
  your code
  ../../cmd/zdb/zdb.c: In function 'dump_deadlist':
  ../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format
  ../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>

MFC after:	3 weeks
2017-10-02 22:35:35 +00:00
Alan Somers
2091d1c3b9 MFV r316861: 6866 zdb -l should return non-zero if it fails to find any label
illumos/illumos-gate@64723e3611
64723e3611

https://www.illumos.org/issues/6866
  Need this for #6865.
  To be generally more scripting-friendly, overload this issue with adding '-q'
  option which should skip printing any label information.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	3 weeks
2017-10-02 22:13:20 +00:00
Alan Somers
66eafdf052 MFC r316858 7280 Allow changing global libzpool variables in zdb
7280 Allow changing global libzpool variables in zdb and ztest through command line

illumos/illumos-gate@0e60744c98
0e60744c98

https://www.illumos.org/issues/7280
  zdb is very handy for diagnosing problems with a pool in a safe and
  quick way. When a pool is in a bad shape, we often want to disable some
  fail-safes, or adjust some tunables in order to open them. In the
  kernel, this is done by changing public variables in mdb. The goal of
  this feature is to add the same capability to zdb and ztest, so that
  they can change libzpool tuneables from the command line.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>

MFC after:	3 weeks
2017-10-02 22:02:04 +00:00
Andriy Gapon
620c2c801b MFV r323913: 8600 ZFS channel programs - snapshot
illumos/illumos-gate@2840dce1a0
2840dce1a0

https://www.illumos.org/issues/8600
  ZFS channel programs should be able to create snapshots.
  In addition to the base snapshot functionality, this will likely entail adding
  extra logic to handle edge cases which were formerly not possible, such as
  creating then destroying a snapshot in the same transaction sync.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>

MFC after:	5 weeks
X-MFC after:	r324163
2017-10-02 11:32:08 +00:00
Andriy Gapon
a1f65a15ce MFV r323912: 8592 ZFS channel programs - rollback
illumos/illumos-gate@000cce6b6f
000cce6b6f

https://www.illumos.org/issues/8592
  ZFS channel programs should be able to perform a rollback. This logic will
  probably look pretty similar to zfs.sync.destroy().

Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brad Lewis <brad.lewis@delphix.com>

MFC after:	5 weeks
X-MFC after:	r324163
2017-10-02 11:23:31 +00:00
Andriy Gapon
3e52a05570 MFV r323794: 8605 zfs channel programs: zfs.exists undocumented and non-working
illumos/illumos-gate@5f39f884e2
5f39f884e2

https://www.illumos.org/issues/8605
  zfs.exists() in channel programs doesn't return any result, and should have a
  man page entry.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>

MFC after:	5 weeks
X-MFC after:	r324163
2017-10-01 16:51:05 +00:00
Andriy Gapon
bda88d07d9 MFV r323530,r323533,r323534: 7431 ZFS Channel Programs, and followups
7431 ZFS Channel Programs

illumos/illumos-gate@dfc115332c
dfc115332c

https://www.illumos.org/issues/7431
  ZFS channel programs (ZCP) adds support for performing compound ZFS
  administrative actions via Lua scripts in a sandboxed environment (with time
  and memory limits).
  This initial commit includes both base support for running ZCP scripts, and a
  small initial library of API calls which support getting properties and
  listing, destroying, and promoting datasets.
  Testing: in addition to the included unit tests, channel programs have been in
  use at Delphix for several months for batch destroying filesystems. The
  dsl_destroy_snaps_nvl() call has also been replaced with

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <chris.williamson@delphix.com>

8552 ZFS LUA code uses floating point math

illumos/illumos-gate@916c8d8811
916c8d8811

https://www.illumos.org/issues/8552
  In the LUA interpreter used by "zfs program", the lua format() function
  accidentally includes support for '%f' and friends, which can cause compilation
  problems when building on platforms that don't support floating-point math in
  the kernel (e.g. sparc). Support for '%f' friends (%f %e %E %g %G) should be
  removed, since there's no way to supply a floating-point value anyway (all
  numbers in ZFS LUA are int64_t's).

Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

8590 memory leak in dsl_destroy_snapshots_nvl()

illumos/illumos-gate@e6ab4525d1
e6ab4525d1

https://www.illumos.org/issues/8590
  In dsl_destroy_snapshots_nvl(), "snaps_normalized" is not freed after it is
  added to "arg".

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

FreeBSD notes:
- zfs-program.8 manual page is taken almost as is from the vendor repository,
  no FreeBSD-ification done
- fixed multiple instances of NULL being used where an integer is expected
- replaced ETIME and ECHRNG with ETIMEDOUT and EDOM respectively

This commit adds a modified version of Lua 5.2.4 under
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua, mirroring the
upstream.  See README.zfs in that directory for the description of Lua
customizations.
See zfs-program.8 on how to use the new feature.

MFC after:	5 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D12528
2017-10-01 16:11:07 +00:00
Andriy Gapon
c13f1d82c8 MFV r323535: 8585 improve batching done in zil_commit()
FreeBSD notes:
- this MFV reverts FreeBSD commit r314549 to make the merge easier
- at present our emulation of cv_timedwait_hires is rather poor,
  so I elected to use cv_timedwait_sbt directly
Please see the differential revision for details.
Unfortunately, I did not get any positive reviews, so there could be
bugs in the FreeBSD-specific piece of the merge.
Hence, the long MFC timeout.

illumos/illumos-gate@1271e4b10d
1271e4b10d

https://www.illumos.org/issues/8585
  The current implementation of zil_commit() can introduce significant
  latency, beyond what is inherent due to the latency of the underlying
  storage. The additional latency comes from two main problems:
  1. When there's outstanding ZIL blocks being written (i.e. there's
      already a "writer thread" in progress), then any new calls to
      zil_commit() will block waiting for the currently oustanding ZIL
      blocks to complete. The blocks written for each "writer thread" is
      coined a "batch", and there can only ever be a single "batch" being
      written at a time. When a batch is being written, any new ZIL
      transactions will have to wait for the next batch to be written,
      which won't occur until the current batch finishes.
  As a result, the underlying storage may not be used as efficiently
      as possible. While "new" threads enter zil_commit() and are blocked
      waiting for the next batch, it's possible that the underlying
      storage isn't fully utilized by the current batch of ZIL blocks. In
      that case, it'd be better to allow these new threads to generate
      (and issue) a new ZIL block, such that it could be serviced by the
      underlying storage concurrently with the other ZIL blocks that are
      being serviced.
  2. Any call to zil_commit() must wait for all ZIL blocks in its "batch"
      to complete, prior to zil_commit() returning. The size of any given
      batch is proportional to the number of ZIL transaction in the queue
      at the time that the batch starts processing the queue; which
      doesn't occur until the previous batch completes. Thus, if there's a
      lot of transactions in the queue, the batch could be composed of
      many ZIL blocks, and each call to zil_commit() will have to wait for
      all of these writes to complete (even if the thread calling
      zil_commit() only cared about one of the transactions in the batch).

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12355
2017-09-26 11:04:08 +00:00
Alan Somers
76f9ab7444 Close a memory leak when using zpool_read_all_labels
MFC after:	3 weeks
X-MFC-With:	322854
Sponsored by:	Spectra Logic Corp
2017-09-25 20:44:40 +00:00
Andriy Gapon
65b5f7c743 MFV r323790: 8567 Inconsistent return value in zpool_read_label
illumos/illumos-gate@c861bfbd77
c861bfbd77

https://www.illumos.org/issues/8567
  If fstat64 fails, pread64 fails, or the label is unintelligible,
  zpool_read_label will return 0. But if malloc fails, it will return -1. For
  consistency, it should always return -1 on failure or 0 on success.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>

MFC after:	2 weeks
2017-09-20 07:23:50 +00:00
Bryan Drewery
b3b6d7b406 Fix the raise tests.
- The exit probe was not appropriately filtered to only the known pid so it
  was firing on any random process that would exit rather the only the one
  we cared about.
- The dtest script executes the tst.raise*.exe in the background from
  POSIX sh without jobs control.  POSIX mandates that SIGINT be set to
  SIG_IGN in this case.  The test executable never actually tested that
  SIGINT could be caught despite trying to block and delay the signal.
  So the SIGINT sent from raise() is never actually received since it
  is ignored.  This could be fixed by calling 'trap - INT' from dtest
  before running the executable but I've opted to just use SIGUSR1
  instead in these specific tests rather than adding more logic to
  test that SIGINT is not ignored at startup.

These 2 issues meant that the tests would randomly work but only if a process
coincidentally exited during the test.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-09-15 19:48:48 +00:00
Andriy Gapon
8ac2314e00 MFV r323527: 5815 libzpool's panic function doesn't set global panicstr, ::status not as useful
illumos/illumos-gate@fae6347731
fae6347731

https://www.illumos.org/issues/5815
  When panic() is called from within ztest, the mdb ::status command isn't as
  useful as it could be since the global panicstr variable isn't updated. We
  should modify the function to make sure panicstr is set, so ::status can
  present the error message just like it does on a failed assertion.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Rich Lowe <richlowe@richlowe.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
MFC after:	4 weeks
2017-09-13 10:34:31 +00:00
Andriy Gapon
a345c0b23b MFV r323523: 8331 zfs_unshare returns wrong error code for smb unshare failure
illumos/illumos-gate@4f4378cc54
4f4378cc54

https://www.illumos.org/issues/8331
  zfs_unshare returns EZFS_UNSHARENFSFAILED on error for all share types.

Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andrew Stormont <astormont@racktopsystems.com>

MFC after:	4 weeks
2017-09-13 10:23:55 +00:00
Andriy Gapon
c9f7a4ab32 MFV r316932: 6280 libzfs: unshare_one() could fail with EZFS_SHARENFSFAILED
illumos/illumos-gate@d1672efb6f
d1672efb6f

https://www.illumos.org/issues/6280
  The unshare_one() in libzfs could fail with EZFS_SHARENFSFAILED at line 834
  here:
  831    /* make sure libshare initialized */
  832    if ((err = zfs_init_libshare(hdl, SA_INIT_SHARE_API)) != SA_OK) {
  833        free(mntpt);    /* don't need the copy anymore */
  834        return (zfs_error_fmt(hdl, EZFS_SHARENFSFAILED,
  835            dgettext(TEXT_DOMAIN, "cannot unshare '%s': %s"),
  836            name, _sa_errorstr(err)));
  837    }
  The correct error should be EZFS_UNSHARENFSFAILED instead.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Marcel Telka <marcel.telka@nexenta.com>

MFC after:	4 weeks
2017-09-13 10:22:09 +00:00
Li-Wen Hsu
da3086df18 Fix DTrace test tst_inet_ntop_d: remove definitions are already in libdtrace
We have D definitions for the named values in socket.h after r323253.  Remove
them in test script to prevent compiling failure.

Reviewed by:	markj, gnn
Differential Revision:	https://reviews.freebsd.org/D12334
2017-09-12 16:00:51 +00:00
Mark Johnston
b934b56429 Add a O_CLOEXEC use missed in r323166.
PR:		199810
Reported by:	Jukka A. Ukkonen <jau789@gmail.com>
MFC after:	3 days
2017-09-12 14:38:10 +00:00
Jonathan Anderson
bab31140a2 Remove redundant source and object files.
Reviewed by:	bdrewery, ngie
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12208
2017-09-09 13:18:32 +00:00
Andriy Gapon
90354c3200 MFV r323107: 8414 Implemented zpool scrub pause/resume
illumos/illumos-gate@1702cce751
1702cce751

FreeBSD note:  rather than merging the zpool.8 update I copied the zpool
scrub section from the illumos zpool.1m to FreeBSD zpool.8 almost
verbatim.  Now that the illumos page uses the mdoc format, it was an
easier option.  Perhaps the change is not in perfect compliance with the
FreeBSD style, but I think that it is acceptible.

https://www.illumos.org/issues/8414
  This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167
  Currently, there is no way to pause a scrub. Pausing may be useful when
  the pool is busy with other I/O to preserve bandwidth.

  Description

  This patch adds the ability to pause and resume scrubbing.  This is achieved
  by maintaining a persistent on-disk scrub state.  While the state is 'paused'
  we do not scrub any more blocks.  We do however perform regular scan
  housekeeping such as freeing async destroyed and deadlist blocks while paused.

  Motivation and Context

  Scrub pausing can be an I/O intensive operation and people have been asking
  for the ability to pause a scrub for a while. This allows one to preserve scrub
  progress while freeing up bandwidth for other I/O.

Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>

MFC after:	2 weeks
2017-09-09 11:00:07 +00:00
George V. Neville-Neil
958f4928e6 Add D definitions for the named values in socket.h
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12241
2017-09-07 03:05:16 +00:00
Alan Somers
0a7b460e70 Honor all options of "zfs mount -o"
The existing code in zmount incorrectly parses the comma-delimited option
string. The result is that nmount only honors the last option. AFAICT the
parsing has been broken ever since ZFS's initial import in change 168404.

PR:		222078
Reviewed by:	avg
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12232
2017-09-05 19:28:35 +00:00
Mark Johnston
afd2f355fb Use O_CLOEXEC when opening persistent handles in libdtrace.
PR:		199810
Submitted by:	jau@iki.fi
MFC after:	1 week
2017-09-05 00:11:06 +00:00
Alan Somers
e6c8b3c98a zfsd(8): Close a race condition when onlining a disk paritition
When inserting a partitioned disk, devfs and geom will announce the whole
disk before they announce the partition. If the partition containing ZFS
extends to one of the disk's extents, then zfsd will see a ZFS label on the
whole disk and attempt to online it. ZFS is smart enough to activate the
partition instead of the whole disk, but only if GEOM has already created
the partition's provider.

cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
	Add a zpool_read_all_labels method. It's similar to
	zpool_read_label, but it will return the number of labels found.

cddl/usr.sbin/zfsd/zfsd_event.cc
	When processing a DevFS CREATE event, only online a VDEV if we can
	read all four ZFS labels.

Reviewed by:	mav
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D11920
2017-08-24 19:48:41 +00:00
Mark Johnston
9b0ddce6bc Use an updated copy of the CDDL header boilerplate from illumos.
Reported by:	Yuri Pankov <yuripv@gmx.com>
X-MFC with:	r322774
2017-08-21 22:26:49 +00:00
Mark Johnston
a3a7b74e18 Add a regression test for r322773.
MFC after:	1 week
2017-08-21 21:58:42 +00:00
Mark Johnston
ce4da6fccc Fix an off-by-two in the llquantize() action parameter validation.
The aggregation created by llquantize() partitions values into buckets; the
lower bound of the bucket containing the largest values is b^{m+1}, where
b and m are the second and fourth parameters to the action, respectively.
Bucket bounds are stored in a 64-bit integer, and so the llquantize()
validation checks need to verify that b^{m+1} fits in 64 bits. However, it
was only verifying that b^{m-1} fits in 64 bits, so certain parameter
combinations could trigger assertion failures in libdtrace.

PR:		219451
MFC after:	1 week
2017-08-21 21:56:02 +00:00
Andriy Gapon
9c48e95dd9 MFV r322229: 7600 zfs rollback should pass target snapshot to kernel
illumos/illumos-gate@77b171372e
77b171372e

https://www.illumos.org/issues/7600
  At present, the kernel side code seems to blindly rollback to whatever happens
  to be the latest snapshot at the time when the rollback task is processed.
  The expected target's name should be passed to the kernel driver and the sync
  task should validate that the target exists and that it is the latest snapshot
  indeed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	3 weeks
2017-08-08 10:52:01 +00:00
Andriy Gapon
b4e4140d13 MFV r322223: 8378 crash due to bp in-memory modification of nopwrite block
illumos/illumos-gate@b7edcb9408
b7edcb9408

https://www.illumos.org/issues/8378
  The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which
  we then nopwrite against.
  zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr
  to zgd_bp, dbuf_write_ready()
  could change db_blkptr, and dbuf_write_done() could remove the dirty record.
  dmu_sync() then sees the stale
  BP and that the dbuf it not dirty, so it is eligible for nop-writing.
  The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the
  db_mtx. We could still see a stale
  db_blkptr, but if it is stale then the dirty record will still exist and thus
  we won't attempt to nopwrite.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-08-08 10:46:51 +00:00
Andriy Gapon
8b30f189d1 MFV r322217: 8418 zfs_prop_get_table() call in zfs_validate_name() is a no-op
illumos/illumos-gate@e09ba01dcd
e09ba01dcd

https://www.illumos.org/issues/8418
  The following line in zfs_validate_name() is just a no-op and it
  should be removed:
      108    (void) zfs_prop_get_table();

Reviewed by: Vitaliy Gusev <gusev.vitaliy@icloud.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>

MFC after:	2 weeks
2017-08-08 10:30:49 +00:00
Ruslan Bukin
ca20f8ec29 o Replace __riscv__ with __riscv
o Replace __riscv64 with (__riscv && __riscv_xlen == 64)

This is required to support new GCC 7.1 compiler.
This is compatible with current GCC 6.1 compiler.

RISC-V is extensible ISA and the idea here is to have built-in define
per each extension, so together with __riscv we will have some subset
of these as well (depending on -march string passed to compiler):

__riscv_compressed
__riscv_atomic
__riscv_mul
__riscv_div
__riscv_muldiv
__riscv_fdiv
__riscv_fsqrt
__riscv_float_abi_soft
__riscv_float_abi_single
__riscv_float_abi_double
__riscv_cmodel_medlow
__riscv_cmodel_medany
__riscv_cmodel_pic
__riscv_xlen

Reviewed by:	ngie
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11901
2017-08-07 14:09:57 +00:00
Mark Johnston
22e406c80b Rework and simplify the ksyms(4) implementation.
- Store the symbol table contents in an anonymous swap-backed object. Have
  mmap(/dev/ksyms) map that object, and stop mapping the symbol table into
  the calling process in ksyms_open(). Previously we would cache a pointer
  to the pmap of the opening process, and mmap(/dev/ksyms) would create a
  mapping using the physical address found by a pmap lookup at the initial
  mapping address. However, this assumes that the cached pmap is valid,
  which may not be the case. [1]
- Remove the ksyms ioctl interface. It appears to have been added to work
  around a limitation in libelf that no longer exists; see r321842.
  Moreover, the interface is difficult to support and isn't present in
  illumos. Since ksyms was added specifically to support lockstat(1), it
  is expected that this removal won't have any real impact.
- Simplify ksyms_read() to avoid unnecessary copying.
- Don't call the device handle destructor if we fail to capture a snapshot
  of the kernel's symbol table. devfs will do that for us.

Reported by:	Ilja van Sprundel <ivansprundel@ioactive.com> [1]
Reviewed by:	kib (previous revision)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11789
2017-08-03 00:38:13 +00:00
Alexander Motin
f2d3f6918e Add compat shim part missed at r305197.
This fixes compatibility between old kernel and new ZFS tools.
It seems to be tradition to forget it. :(

PR:		221112
MFC after:	3 days
2017-08-02 10:33:47 +00:00
Enji Cooper
4b330699f8 Convert traditional ${MK_TESTS} conditional idiom for including test
directories to SUBDIR.${MK_TESTS} idiom

This is being done to pave the way for future work (and homogenity) in
^/projects/make-check-sandbox .

No functional change intended.

MFC after:	1 weeks
2017-08-02 08:35:51 +00:00
Mark Johnston
3c9308e82b Remove local variables missed in r321842.
X-MFC with:	r321842
2017-08-01 04:52:03 +00:00
Mark Johnston
90ec6fd4ba Let lockstat use ksyms(4)'s mmap interface.
The workaround described in the deleted comment is no longer needed.

MFC after:	1 week
2017-08-01 04:49:54 +00:00
Li-Wen Hsu
ad89d783a6 Add an auxiliary subroutine to generate some events for testing
This test is also timeout on a quiet system because there is nobody triggering
read probefunc while test execution.

Reviewed by:	gnn, markj, ngie
Differential Revision:	https://reviews.freebsd.org/D11731
2017-07-26 12:07:46 +00:00
Li-Wen Hsu
288ebd813a The test case common.funcs.t_dtrace_contrib.tst_basename_d generates a
verifying script which needs being run to complete the test.

While here, add missing shebang.

Reviewed by:	gnn, markj, ngie
Differential Revision:	https://reviews.freebsd.org/D11716
2017-07-25 13:18:28 +00:00
Li-Wen Hsu
637ba06516 Modify glob patterns and expected output to match FreeBSD's implementation.
Reviewed by:	gnn, markj, ngie
Differential Revision:	https://reviews.freebsd.org/D11713
2017-07-25 13:14:02 +00:00
Li-Wen Hsu
5604d0f997 Make this test case accepts basename() in D script returns "" or "."
In Solaris, basename(1) and basename(3) both return "." while being given an
empty string (""), while in BSD (and Linux) basename(1) returns "" and
basename(3) returns "."

While here, also change #!/usr/bin/ksh to #!/usr/bin/env ksh to find ksh in
$PATH

Reviewed by:	gnn, markj (earlier version), ngie (earlier version)
Differential Revision:	https://reviews.freebsd.org/D11707
2017-07-25 13:11:20 +00:00
Li-Wen Hsu
4ca0dfa6b0 Explicitly set dynamic variable buffer size.
We added too many variable assignments in BEGIN block, which will run out of
default auto-configured variable buffer space.  The test VM has 4G RAM which
should be enough for most cases so it's reasonable to increase limitation to
these case.

Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D11676
2017-07-25 13:07:06 +00:00
Li-Wen Hsu
23833df483 Explicitly set dynamic variable buffer size.
We added too many variable assignments in BEGIN block, which will run out of
default auto-configured variable buffer space.  The test VM has 4G RAM which
should be enough for most cases so it's reasonable to increase limitation to
these case.

Reviewed by:	gnn, markj, ngie
Differential Revision:	https://reviews.freebsd.org/D11674
2017-07-25 13:04:24 +00:00
Li-Wen Hsu
070a148127 Add an auxiliary subroutine to generate read(2) event while testing.
Reviewed by:	gnn, ngie
Differential Revision:	https://reviews.freebsd.org/D11673
2017-07-25 13:01:10 +00:00
Li-Wen Hsu
d83c70758a Add a simple script which calls open(2) and others to generate events for
testing.

This test times-out on a quiet system because there is nobody triggers
syscall::open:entry or syscall::: probe while test execution.

Reviewed by:	gnn, markj (earlier version)
Differential Revision:	https://reviews.freebsd.org/D11671
2017-07-25 12:58:03 +00:00
Li-Wen Hsu
b9de3393dd Add a simple program which calls sigtimedwait(2) to generate events for testing
This test timeout on a quiet system because there is nobody triggers
'syscall::*wait*:entry' probe while test execution.

Reviewed by:	gnn, markj, ngie
Differential Revision:	https://reviews.freebsd.org/D11668
2017-07-25 12:52:32 +00:00
Enji Cooper
31ed01a2de Fix whitespace on a line in fix(..) accidentally missed in r321424
MFC after:	1 month
MFC with:	r321424
2017-07-24 17:29:56 +00:00
Enji Cooper
f3305cae02 Style cleanup: delete spurious trailing whitespace
MFC after:	1 month
2017-07-24 17:27:21 +00:00
Enji Cooper
aa52ad5489 Don't use incorrect hardcoded path to ksh -- use /usr/bin/env
to find ksh instead

MFC after:	1 month
2017-07-23 17:57:00 +00:00
Alan Somers
19d04786c7 zfsd(8): Remove pidfile on shutdown
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2017-06-20 19:45:02 +00:00
Andriy Gapon
f9cdbaba8d MFV r318946: 8021 ARC buf data scatter-ization
illumos/illumos-gate@770499e185
770499e185

https://www.illumos.org/issues/8021
  The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
  community) changes the way the ARC allocates `b_pdata` memory from using linear
  `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
  improves ZFS's performance by helping to defragment the address space occupied
  by the ARC, in particular for cases where compressed ARC is enabled. It could
  also ease future work to allocate pages directly from `segkpm` for minimal-
  overhead memory allocations, bypassing the `kmem` subsystem.
  This is essentially the same change as the one which recently landed in ZFS on
  Linux, although they made some platform-specific changes while adapting this
  work to their codebase:
  1. Implemented the equivalent of the `segkpm` suggestion for future work
  mentioned above to bypass issues that they've had with the Linux kernel memory
  allocator.
  2. Changed the internal representation of the ABD's scatter/gather list so it
  could be used to pass I/O directly into Linux block device drivers. (This
  feature is not available in the illumos block device interface yet.)

FreeBSD notes:
- the actual (default) chunk size is 4KB (despite the text above saying 1KB)
- we can try to reimplement ABDs, so that they are not permanently
  mapped into the KVA unless explicitly requested, especially on
  platforms with scarce KVA
- we can try to use unmapped I/O and avoid intermediate allocation of a
  linear, virtual memory mapped buffer
- we can try to avoid extra data copying by referring to chunks / pages
  in the original ABD

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>

MFC after:	3 weeks
2017-06-20 17:39:24 +00:00
Bryan Drewery
c99b67a794 Utilize SYSROOT from r320119 in places where DESTDIR may be wanting WORLDTMP.
Since buildenv exports SYSROOT all of these uses will now look in
WORLDTMP by default.

sys/boot/efi/loader/Makefile
        A LIBSTAND hack is no longer required for buildenv.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-06-19 20:47:24 +00:00
Andriy Gapon
b8d341fe26 MFV r319945,r319946: 8264 want support for promoting datasets in libzfs_core
illumos/illumos-gate@a4b8c9aa65
a4b8c9aa65

https://www.illumos.org/issues/8264
  Oddly there is a lzc_clone function, but no lzc_promote function.

Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
MFC after:	1 week
2017-06-14 16:31:36 +00:00
Mark Johnston
c645060dbb Override the locale so that file lists get a consistent sort order.
Reported by:	avg
MFC after:	1 week
2017-06-10 14:47:01 +00:00
Andriy Gapon
6bd205a529 follow up to r319746: add the new test files to the make file
Reported by:	markj
MFC after:	2 days
X-MFC with:	r319746
2017-06-10 06:13:52 +00:00
Andriy Gapon
9260925dcd MFV r319740: 8168 NULL pointer dereference in zfs_create()
illumos/illumos-gate@690031d326
690031d326

https://www.illumos.org/issues/8168
  If we manage to export the pool on which we are creating a dataset (filesystem
  or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for
  which we never check the return value) we end up dereferencing a NULL pointer
  in libzfs`zpool_close().
  This was discovered on ZFS on Linux. The same issue can be reproduced on
  Illumos running in parallel:
    while :; do zpool import -d /tmp testpool ; zpool export testpool ; done
    while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done
  Eventually this will result in several core dumps like this one:
  [root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244
  Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
  libnvpair.so.1 ld.so.1 ]
  > ::stack
  libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450)
  libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8)
  zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3)
  main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70)
  _start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b)
  >
  Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
MFC after:	2 weeks
2017-06-09 15:30:41 +00:00
Andriy Gapon
ad2b1a296f MFV r319744,r319745: 8269 dtrace stddev aggregation is normalized incorrectly
illumos/illumos-gate@79809f9cf4
79809f9cf4

https://www.illumos.org/issues/8269
  It seems that currently normalization of stddev aggregation is done
  incorrectly.
  We divide both the sum of values and the sum of their squares by the
  normalization factor. But we should divide the sum of squares by the
  normalization factor squared to scale the original values properly.

FreeBSD note: the actual change was committed in r316853, this commit
adds the test files and record merge information.

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	1 week
Sponsored by:	Panzura
2017-06-09 15:16:39 +00:00
Allan Jude
39b0b876dc New sentences start on new lines, fix two violations
Reviewed by:	bcr
Sponsored by:	BSDCan Dev Summit
2017-06-08 01:39:17 +00:00
Allan Jude
dc379eca14 SHA-512 and Skein have been supported by the boot loader for some time.
Submitted by:	lifanov
Reviewed by:	bcr
Sponsored by:	BSDCan Dev Summit
2017-06-08 01:29:24 +00:00
Andriy Gapon
457711b02b MFV r316922: 5380 receive of a send -p stream doesn't need to try renaming snapshots
illumos/illumos-gate@471a88e499
471a88e499

https://www.illumos.org/issues/5380
  A stream created with zfs send -p -I contains properties of all snapshots of a
  given dataset as opposed to only properties of snapshots in a given range.
  Not only this is suboptimal but the receive code also does not filter
  properties by the range. So, properties of earlier snapshots would be updated
  even though the snapshots themselves are not in the stream (just their
  properties).
  Given that modifying the snapshot properties requires a TXG sync and that the
  snapshots are updated one by one the described behavior may lead to a sever
  performance penalty.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	3 weeks
2017-05-24 22:30:21 +00:00
Andriy Gapon
b6be31c7ca MFC r316908: 7541 zpool import/tryimport ioctl returns ENOMEM because provided buffer is too small for config
illumos/illumos-gate@8b65a70b76
8b65a70b76

https://www.illumos.org/issues/7541
  When calling zpool import, zpool does a few ioctls to ZFS.
  zpool allocates a buffer in userland and passes it to the kernel so that ZFS
  can copy info into it. ZFS will use it to put the nvlist that describes the
  pool configuration.
  If the allocated buffer is too small, ZFS will return ENOMEM and the call will
  have to be redone. This wastes CPU time and slows down the import process. This
  happens very often for the ZFS_IOC_POOL_TRYIMPORT call.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
MFC after:	2 weeks
2017-05-24 21:32:35 +00:00
Andriy Gapon
5644318970 MFC r316904: 7729 libzfs_core`lzc_rollback() leaks result nvl
illumos/illumos-gate@ac428481f9
ac428481f9

https://www.illumos.org/issues/7729
  libzfs_core`lzc_rollback() doesn't free the result nvl after lzc_ioctl() call.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	2 weeks
2017-05-24 20:53:01 +00:00
Andriy Gapon
c65389d367 MFV r316860: 7545 zdb should disable reference tracking
illumos/illumos-gate@4dd77f9e38
4dd77f9e38

https://www.illumos.org/issues/7545
  When evicting from the ARC, we manipulate some refcount_t's, e.g. arcs_size.
  When using zdb to examine a large amount of data (e.g. zdb -bb on a large pool
  with small blocks), the ARC may have a large number of entries. If reference
  tracking is enabled, there will be ~1 reference for each block in the ARC. When
  evicting, we decrement the refcount and have to search all the references to
  find the one that we are removing, which is very slow.
  Since zdb is typically used to find problems with the on-disk format, and not
  with the code it is running, we should disable reference tracking in zdb.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-05-24 20:41:26 +00:00
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Mark Johnston
b4a3f67bd6 Add a little helper program for tst.exitcore.ksh.
sleep(1) is capsicumized, which means that we cannot rely on it to dump
core as required by the test.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-05-22 20:34:51 +00:00
Alexander Motin
c9bcbf3800 Fix time handling in cv_timedwait_hires().
pthread_cond_timedwait() receives absolute time, not relative.  Passing
wrong time there caused two threads of zdb to spin in a tight loop.

MFC after:	1 week
2017-05-19 05:12:58 +00:00
Mark Johnston
9ebaf5f802 Remove the EXFAIL annotation for tests which pass as of r309596.
Reported by:	bdrewery
Sponsored by:	Dell EMC Isilon
2017-05-19 00:25:09 +00:00
Josh Paetzel
c78abb8b50 MFV 316894
7252 7628 compressed zfs send / receive

illumos/illumos-gate@5602294fda
5602294fda

https://www.illumos.org/issues/7252
  This feature includes code to allow a system with compressed ARC enabled to
  send data in its compressed form straight out of the ARC, and receive data in
  its compressed form directly into the ARC.

https://www.illumos.org/issues/7628
  We should have longer, more readable versions of the ZFS send / recv options.

7628 create long versions of ZFS send / receive options

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: David Quigley <dpquigl@davequigley.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
2017-04-25 17:57:43 +00:00
Josh Paetzel
ef18459108 MFV 316891
7386 zfs get does not work properly with bookmarks

illumos/illumos-gate@edb901aab9
edb901aab9

https://www.illumos.org/issues/7386
  The zfs get command does not work with the bookmark parameter while it works
  properly with both filesystem and snapshot:
  # zfs get -t all -r creation rpool/test
  NAME               PROPERTY  VALUE                  SOURCE
  rpool/test         creation  Fri Sep 16 15:00 2016  -
  rpool/test@snap    creation  Fri Sep 16 15:00 2016  -
  rpool/test#bkmark  creation  Fri Sep 16 15:00 2016  -
  # zfs get -t all -r creation rpool/test@snap
  NAME             PROPERTY  VALUE                  SOURCE
  rpool/test@snap  creation  Fri Sep 16 15:00 2016  -
  # zfs get -t all -r creation rpool/test#bkmark
  cannot open 'rpool/test#bkmark': invalid dataset name
  #
  The zfs get command should be modified to work properly with bookmarks too.

Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
2017-04-21 19:53:52 +00:00
Alan Somers
07bb15b440 MFV 316855
7900 zdb shouldn't print the path of a znode at verbosity < 5

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Alan Somers <asomers@freebsd.org>

illumos/illumos-gate@e548d2fa41
https://www.illumos.org/issues/7900

MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2017-04-14 16:30:37 +00:00
Andriy Gapon
5ad79d9b20 dtrace: fix normalization of stddev aggregation
To be upstreamed.

Discussed with:	Bryan Cantrill <bryancantrill@gmail.com>
MFC after:	2 weeks
Sponsored by:	Panzura
2017-04-14 15:31:04 +00:00
Pedro F. Giffuni
9e98194646 MFV r316693:
8046 Let calloc() do the multiplication in libzfs_fru_refresh

5697e03e6e

https://www.illumos.org/issues/8046

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:	Pedro Giffuni <pfg@freebsd.org>

MFC after:	3 days
2017-04-10 22:56:38 +00:00
Alexander Motin
3aef5b286a MFV r315290, r315291: 7303 dynamic metaslab selection
illumos/illumos-gate@8363e80ae7
https://github.com/illumos/illumos-gate/commit/8363e80ae72609660f6090766ca8c2c18

https://www.illumos.org/issues/7303

  This change introduces a new weighting algorithm to improve metaslab selection.
  The new weighting algorithm relies on the SPACEMAP_HISTOGRAM feature. As a result,
  the metaslab weight now encodes the type of weighting algorithm used
  (size-based vs segment-based).

  This also introduce a new allocation tracing facility and two new dcmds to help
  debug allocation problems. Each zio now contains a zio_alloc_list_t structure
  that is populated as the zio goes through the allocations stage. Here's an
  example of how to use the tracing facility:

> c5ec000::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
  MSID    DVA    ASIZE      WEIGHT             RESULT               VDEV
     -      0      400           0    NOT_ALLOCATABLE           ztest.0a
     -      0      400           0    NOT_ALLOCATABLE           ztest.0a
     -      0      400           0             ENOSPC           ztest.0a
     -      0      200           0    NOT_ALLOCATABLE           ztest.0a
     -      0      200           0    NOT_ALLOCATABLE           ztest.0a
     -      0      200           0             ENOSPC           ztest.0a
     1      0      400      1 x 8M            17b1a00           ztest.0a

> 1ff2400::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
  MSID    DVA    ASIZE      WEIGHT             RESULT               VDEV
     -      0      200           0    NOT_ALLOCATABLE           mirror-2
     -      0      200           0    NOT_ALLOCATABLE           mirror-0
     1      0      200      1 x 4M            112ae00           mirror-1
     -      1      200           0    NOT_ALLOCATABLE           mirror-2
     -      1      200           0    NOT_ALLOCATABLE           mirror-0
     1      1      200      1 x 4M            112b000           mirror-1
     -      2      200           0    NOT_ALLOCATABLE           mirror-2

  If the metaslab is using segment-based weighting then the WEIGHT column will
  display the number of segments available in the bucket where the allocation
  attempt was made.

Author: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2017-03-24 09:37:00 +00:00
Enji Cooper
acc37ca1c1 cddl: normalize paths using SRCTOP-relative paths or :H when possible
This simplifies make logic/output

While here, remove bogus CFLAGS which look for headers in cddl/lib/libumem.
There aren't any source files there (just Makefiles)

MFC after:	1 month
Sponsored by:	Dell EMC Isilon
2017-03-04 11:30:04 +00:00
Mark Johnston
d935f34b8f Fix memory leaks in error cases in libdtrace.
Submitted by:	Tom Rix <trix@juniper.net>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9705
2017-02-23 17:56:24 +00:00