172 Commits

Author SHA1 Message Date
Alexander Motin
903252254b 9523 Large alloc in zdb can cause trouble
16MB alloc in zdb_embedded_block() can cause cores in certain situations
(clang, gcc55).

OsX commit: ced236a5da
FreeBSD commit: https://svnweb.freebsd.org/base?view=revision&revision=326150
illumos/illumos-gate@03a4c2f4bf

Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     Jorgen Lundman <lundman@lundman.net>
2018-08-02 20:37:53 +00:00
Alexander Motin
cf8fc527c9 9512 zfs remap poolname@snapname coredumps
Only filesystems and volumes are valid "zfs remap" parameters: when passed
a snapshot name zfs_remap_indirects() does not handle the EINVAL returned
from libzfs_core, which results in failing an assertion and consequently
crashing.

illumos/illumos-gate@0b2e825398

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author:     loli10K <ezomori.nozomu@gmail.com>
2018-08-02 19:09:13 +00:00
Alexander Motin
2395a7f52d 8115 parallel zfs mount
Mounting of filesystems in "filesystem/local" is done using `zfs mount -a`,
which mounts each filesystems serially. The bottleneck for each mount is
the I/O done to load metadata for each filesystem. As such, mounting
filesystems using a parallel algorithm should be a big win, and bring down
the runtime of "filesystem/local"'s start method.

illumos/illumos-gate@591e0e133f

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author:     Sebastien Roy <seb@delphix.com>
2018-08-01 18:28:17 +00:00
Alexander Motin
b3e8109985 9421 zdb should detect and print out the number of "leaked" objects
9422 zfs diff and zdb should explicitly mark objects that are on the deleted queue

illumos/illumos-gate@20b5dafb42

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author:     Paul Dagnelie <pcd@delphix.com>
2018-07-31 21:31:24 +00:00
Alexander Motin
59ef839e67 9102 zfs should be able to initialize storage devices
The first access to a disk block can incur a performance penalty on some
platforms (e.g. AWS's EBS, VMware VMDKs). Therefore it is recommended that
volumes be "thick provisioned", where supported by the platform (VMware).
Thick provisioning is time consuming and often is ignored. If the thick
provision step is omitted, customers will see suboptimal performance until
we have written to all parts of the LUN. ZFS should be able to initialize
any unused storage to remove any first-write penalty that exists.

illumos/illumos-gate@094e47e980

Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2018-07-31 18:49:07 +00:00
Alexander Motin
4e77060898 9236 nuke spa_dbgmsg
We should use zfs_dbgmsg instead of spa_dbgmsg.  Or at least,
metaslab_condense() should call zfs_dbgmsg because it's important and rare
enough to always log. It's possible that the message in zio_dva_allocate()
would be too high-frequency for zfs_dbgmsg.

illumos/illumos-gate@21f7c81cc1

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:42:31 +00:00
Alexander Motin
d7479e295b 9290 device removal reduces redundancy of mirrors
Mirrors are supposed to provide redundancy in the face of whole-disk failure
and silent damage (e.g. some data on disk is not right, but ZFS hasn't
detected the whole device as being broken). However, the current device
removal implementation bypasses some of the mirror's redundancy.

illumos/illumos-gate@3a4b1be953

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:13:04 +00:00
Alexander Motin
af8040b47a 9238 ZFS Spacemap Encoding V2
The current space map encoding has the following disadvantages:
[1] Assuming 512 sector size each entry can represent at most 16MB for a segment.
This makes the encoding very inefficient for large regions of space.
[2] As vdev-wide space maps have started to be used by new features (i.e.
device removal, zpool checkpoint) we've started imposing limits in the
vdevs that can be used with them based on the maximum addressable offset
(currently 64PB for a top-level vdev).

The new remains backwards compatible with the old one. The introduced
two-word entry format, besides extending the limits imposed by the single-entry
layout, also includes a vdev field and some extra padding after its prefix.

The extra padding after the prefix should is reserved for future usage (e.g.
new prefixes for future encodings or new fields for flags). The new vdev field
not only makes the space maps more self-descriptive, but also opens the doors
for pool-wide space maps.

One final important note is that the number of bits used for vdevs is reduced
to 24 bits for blkptrs. That was decided as we don't know of any setups that
use more than 16M vdevs for the time being and
we wanted to fit the vdev field in the space map. In addition that gives us
some extra bits in dva_t.

illumos/illumos-gate@17f11284b4

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-07-30 22:56:24 +00:00
Alexander Motin
a622bac50f 9280 Assertion failure while running removal_with_ganging test with 4K devices
illumos/illumos-gate@243952c7ee

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matt Ahrens <Matt.Ahrens@delphix.com>
2018-03-28 23:12:03 +00:00
Alexander Motin
d32e3273d2 9235 rename zpool_rewind_policy_t to zpool_load_policy_t
illumos/illumos-gate@5dafeea3eb

We want to be able to pass various settings during import/open of a pool,
which are not only related to rewind. Instead of adding a new policy and
duplicate a bunch of code, we should just rename rewind_policy to a more
generic term like load_policy.

For instance, we'd like to set spa->spa_import_flags from the nvlist,
rather from a flags parameter passed to spa_import as in some cases we want
those flags not only for the import case, but also for the open case. One
such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would
allow zfs to open a pool when logs are missing.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-03-28 22:16:51 +00:00
Alexander Motin
b8a528e092 9166 zfs storage pool checkpoint
illumos/illumos-gate@8671400134

The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with
exactly that.  It can be thought of as a “pool-wide snapshot” (or a
variation of extreme rewind that doesn’t corrupt your data).  It remembers
the entire state of the pool at the point that it was taken and the user
can revert back to it later or discard it.  Its generic use case is an
administrator that is about to perform a set of destructive actions to ZFS
as part of a critical procedure.  She takes a checkpoint of the pool before
performing the actions, then rewinds back to it if one of them fails or puts
the pool into an unexpected state.  Otherwise, she discards it.  With the
assumption that no one else is making modifications to ZFS, she basically
wraps all these actions into a “high-level transaction”.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
2018-03-28 18:12:06 +00:00
Alexander Motin
6027cbd474 9213 zfs: sytem typo
illumos/illumos-gate@edc8ef7d92

Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Toomas Soome <tsoome@me.com>
2018-03-23 02:28:37 +00:00
Alexander Motin
2e4bc6ee5c 9075 Improve ZFS pool import/load process and corrupted pool recovery
illumos/illumos-gate@6f7938128a

Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:

https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's

To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.

This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 02:21:03 +00:00
Alexander Motin
c59ff16a1e 8941 zpool add: assertion failed in get_replication() with nested interior VDEVs
illumos/illumos-gate@ac0215f4d6

When replacing a faulted device which was previously handled by a spare
multiple levels of nested interior VDEVs will be present in the pool
configuration: get_replication() needs to handle this situation gracefully
to let zpool add new devices to the pool

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 01:17:32 +00:00
Alexander Motin
38fc65e027 8962 zdb should work on non-idle pools
illumos/illumos-gate@e144c4e6c9

Currently `zdb` consistently fails to examine non-idle pools as it fails
during the `spa_load()` process. The main problem seems to be that
`spa_load_verify()` fails as can be seen below:

$ sudo zdb -d -G dcenter
    zdb: can't open 'dcenter': I/O error

ZFS_DBGMSG(zdb):
    spa_open_common: opening dcenter
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c4t11d0s0': best uberblock found for spa dcenter. txg 40824950
    spa_load(dcenter): using uberblock with txg=40824950
    spa_load(dcenter): UNLOADING
    spa_load(dcenter): RELOADING
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c3t10d0s0': best uberblock found for spa dcenter. txg 40824952
    spa_load(dcenter): using uberblock with txg=40824952
    spa_load(dcenter): FAILED: spa_load_verify failed [error=5]
    spa_load(dcenter): UNLOADING

This change makes `spa_load_verify()` a dryrun when ran from `zdb`. This is
done by creating a global flag in zfs and then setting it in `zdb`.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 00:09:15 +00:00
Alexander Motin
81ef5e369c 8809 libzpool should leverage work done in libfakekernel
illumos/illumos-gate@f06dce2c1f

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andrew Stormont <astormont@racktopsystems.com>
2018-02-21 21:04:46 +00:00
Alexander Motin
79a23a6944 7614 zfs device evacuation/removal
illumos/illumos-gate@5cabbc6b49

https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.

The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.

Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
2018-02-18 01:21:52 +00:00
Alexander Motin
d0fd40e7c9 8972 zfs holds: In scripted mode, do not pad columns with spaces
illumos/illumos-gate@e9b7d6e7f7

https://www.illumos.org/issues/8972:
'zfs holds -H' does not properly output content in scripted mode. It uses a
tab instead of two spaces, but it still pads column widths with spaces when
it should not.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Allan Jude <allanjude@freebsd.org>
2018-01-22 05:59:48 +00:00
Alexander Motin
1e2bad5ea0 8652 Tautological comparisons with ZPROP_INVAL
illumos/illumos-gate@4ae5f5f06c

https://www.illumos.org/issues/8652:
Clang and GCC prefer to use unsigned ints to store enums. With Clang, that
causes tautological comparison warnings when comparing a zfs_prop_t or
zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error
handling code is being silently removed as a result.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 04:48:14 +00:00
Alexander Motin
619fd3c317 8677 Open-Context Channel Programs
illumos/illumos-gate@a3b2868063

https://www.illumos.org/issues/8677
  We want to be able to run channel programs outside of synching context.
  This would greatly improve performance of channel program that just gather
  information, as we won't have to wait for synching context anymore.

  This feature should introduce the following:
  - A new command line flag in "zfs program" to specify our intention to
  run in open context.
  - A new flag/option within the channel program ioctl which selects the
  context.
  - Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
  - Documentation for the new feature in the manual pages.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:	Serapheim Dimitropoulos <serapheim@delphix.com>
2018-01-21 19:26:38 +00:00
Andriy Gapon
fcf62a2809 640 number_to_scaled_string is duplicated in several commands
illumos/illumos-gate@0a0551200e
0a0551200e

https://www.illumos.org/issues/640
  du(1), df(1m), ls(1), and swap(1m) all include a copy (it appears literally
  copied) of the 'number_to_scaled_string' function in their source. This should
  be moved to a shared library and all 4 commands should use this instead.

Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Jason King <jason.brian.king@gmail.com>
2017-10-26 16:20:47 +00:00
Andriy Gapon
e0cb71a82f 8081 Compiler warnings in zdb
illumos/illumos-gate@3f7978d02b
3f7978d02b

https://www.illumos.org/issues/8081
  zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
  which uses -WError, the only way to build it is to disable all compiler
  warnings. This makes it much harder to detect newly introduced bugs. We should
  cleanup all the warnings.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
2017-10-02 11:55:11 +00:00
Andriy Gapon
035e679e27 8567 Inconsistent return value in zpool_read_label
illumos/illumos-gate@c861bfbd77
c861bfbd77

https://www.illumos.org/issues/8567
  If fstat64 fails, pread64 fails, or the label is unintelligible,
  zpool_read_label will return 0. But if malloc fails, it will return -1. For
  consistency, it should always return -1 on failure or 0 on success.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>
2017-09-20 07:18:09 +00:00
Andriy Gapon
f3dbcb8c81 8585 improve batching done in zil_commit()
illumos/illumos-gate@1271e4b10d
1271e4b10d

https://www.illumos.org/issues/8585
  The current implementation of zil_commit() can introduce significant
  latency, beyond what is inherent due to the latency of the underlying
  storage. The additional latency comes from two main problems:
  1. When there's outstanding ZIL blocks being written (i.e. there's
      already a "writer thread" in progress), then any new calls to
      zil_commit() will block waiting for the currently oustanding ZIL
      blocks to complete. The blocks written for each "writer thread" is
      coined a "batch", and there can only ever be a single "batch" being
      written at a time. When a batch is being written, any new ZIL
      transactions will have to wait for the next batch to be written,
      which won't occur until the current batch finishes.
  As a result, the underlying storage may not be used as efficiently
      as possible. While "new" threads enter zil_commit() and are blocked
      waiting for the next batch, it's possible that the underlying
      storage isn't fully utilized by the current batch of ZIL blocks. In
      that case, it'd be better to allow these new threads to generate
      (and issue) a new ZIL block, such that it could be serviced by the
      underlying storage concurrently with the other ZIL blocks that are
      being serviced.
  2. Any call to zil_commit() must wait for all ZIL blocks in its "batch"
      to complete, prior to zil_commit() returning. The size of any given
      batch is proportional to the number of ZIL transaction in the queue
      at the time that the batch starts processing the queue; which
      doesn't occur until the previous batch completes. Thus, if there's a
      lot of transactions in the queue, the batch could be composed of
      many ZIL blocks, and each call to zil_commit() will have to wait for
      all of these writes to complete (even if the thread calling
      zil_commit() only cared about one of the transactions in the batch).

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2017-09-13 10:59:36 +00:00
Andriy Gapon
bdc44f62ff 7431 ZFS Channel Programs
illumos/illumos-gate@dfc115332c
dfc115332c

https://www.illumos.org/issues/7431
  ZFS channel programs (ZCP) adds support for performing compound ZFS
  administrative actions via Lua scripts in a sandboxed environment (with time
  and memory limits).
  This initial commit includes both base support for running ZCP scripts, and a
  small initial library of API calls which support getting properties and
  listing, destroying, and promoting datasets.
  Testing: in addition to the included unit tests, channel programs have been in
  use at Delphix for several months for batch destroying filesystems. The
  dsl_destroy_snaps_nvl() call has also been replaced with

  For reference, the new zfs-program manpage is included below.
  ZFS-PROGRAM(1M)                       1M                       ZFS-PROGRAM(1M)

  NAME
       zfs program – executes ZFS channel programs

  SYNOPSIS
       zfs program [-t timeout] [-m memory-limit] pool script

  DESCRIPTION
       The ZFS channel program interface allows ZFS administrative operations to
       be run programmatically as a Lua script. The entire script is executed
       atomically, with no other administrative operations taking effect
       concurrently. A library of ZFS calls is made available to channel program
       scripts. Channel programs may only be run with root privileges.

       A modified version of the Lua 5.2 interpreter is used to run channel
       program scripts. The Lua 5.2 manual can be found at:

             http://www.lua.org/manual/5.2/
  ...

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <chris.williamson@delphix.com>
2017-09-13 10:45:49 +00:00
Andriy Gapon
efd3d79ea3 8414 Implemented zpool scrub pause/resume
illumos/illumos-gate@1702cce751
1702cce751

https://www.illumos.org/issues/8414
  This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167
  Currently, there is no way to pause a scrub. Pausing may be useful when
 the pool is busy with other I/O to preserve bandwidth.

  Description

  This patch adds the ability to pause and resume scrubbing.  This is achieved
  by maintaining a persistent on-disk scrub state.  While the state is 'paused'
  we do not scrub any more blocks.  We do however perform regular scan
  housekeeping such as freeing async destroyed and deadlist blocks while paused.

  If you're testing this change, you probably want to include the patch from #6164

  Motivation and Context

  Scrub pausing can be an I/O intensive operation and people have been asking
  for the ability to pause a scrub for a while. This allows one to preserve scrub
  progress while freeing up bandwidth for other I/O.

  How Has This Been Tested?

  Unit testing and zfs-tests.  to the pool.  This patch will also include the
  patch from https://github.com/zfsonlinux/zfs/ pull/6164 In certain cases
  (dsl_scan_sync() is one), we may end up calling

Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>
2017-09-01 17:43:08 +00:00
Andriy Gapon
33f8e79f6a 8067 zdb should be able to dump literal embedded block pointer
illumos/illumos-gate@4923c69fdd
4923c69fdd

https://www.illumos.org/issues/8067
  Add an option to zdb to print a literal embedded block pointer supplied on the
  command line:
  zdb -E [-A] word0:word1:...:word15

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-08-08 11:10:37 +00:00
Andriy Gapon
28bfd86184 8378 crash due to bp in-memory modification of nopwrite block
illumos/illumos-gate@b7edcb9408
b7edcb9408

https://www.illumos.org/issues/8378
  The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which
  we then nopwrite against.
  zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr
  to zgd_bp, dbuf_write_ready()
  could change db_blkptr, and dbuf_write_done() could remove the dirty record.
  dmu_sync() then sees the stale
  BP and that the dbuf it not dirty, so it is eligible for nop-writing.
  The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the
  db_mtx. We could still see a stale
  db_blkptr, but if it is stale then the dirty record will still exist and thus
  we won't attempt to nopwrite.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-08-08 10:44:48 +00:00
Andriy Gapon
acb89578f1 fix up r319744, add new files
8269 dtrace stddev aggregation is normalized incorrectly

illumos/illumos-gate@79809f9cf4
79809f9cf4

https://www.illumos.org/issues/8269
  It seems that currently normalization of stddev aggregation is done
  incorrectly.
  We divide both the sum of values and the sum of their squares by the
  normalization factor. But we should divide the sum of squares by the
  normalization factor squared to scale the original values properly.

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2017-06-09 15:06:50 +00:00
Andriy Gapon
27ba1b79ca 8108 zdb -l fails to read labels 2 and 3
illumos/illumos-gate@22c8b9583d
22c8b9583d

https://www.illumos.org/issues/8108

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2017-06-09 15:03:07 +00:00
Andriy Gapon
3899d91a09 7578 Fix/improve some aspects of ZIL writing.
illumos/illumos-gate@c5ee46810f
c5ee46810f

https://www.illumos.org/issues/7578
  After some ZIL changes 6 years ago zil_slog_limit got partially broken
  due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
  Actually because of other changes about that time zl_itx_list_sz is not
  really required to implement the functionality, so this patch removes
  some unneeded broken code and variables.
  Original idea of zil_slog_limit was to reduce chance of SLOG abuse by
  single heavy logger, that increased latency for other (more latency critical)
  loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
  huge latency increase for heavy writers, this implementation caused double
  write of all data, since the log records were explicitly prepared for SLOG.
  Since we now have I/O scheduler, I've found it can be much more efficient
  to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
  to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.
  Existing ZIL implementation had problem with space efficiency when it
  has to write large chunks of data into log blocks of limited size. In some
  cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
  spinning rust, that also reduced log write speed in half, since head had to
  uselessly fly over allocated but not written areas. This change improves
  the situation by offloading problematic operations from z*_log_write() to
  zil_lwb_commit(), which knows real situation of log blocks allocation and
  can split large requests into pieces much more efficiently. Also as side
  effect it removes one of two data copy operations done by ZIL code WR_COPIED
  case.
  While there, untangle and unify code of z*_log_write() functions.
  Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
  block boundary, that may also improve efficiency if ZPL is made to do that.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alexander Motin <mav@FreeBSD.org>
2017-05-26 12:13:58 +00:00
Andriy Gapon
d7f3871103 8021 ARC buf data scatter-ization
8100 8021 seems to cause random BAD TRAP: type=d (#gp General protection)

illumos/illumos-gate@770499e185
770499e185

https://www.illumos.org/issues/8021
  The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
  community) changes the way the ARC allocates `b_pdata` memory from using linear
  `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
  improves ZFS's performance by helping to defragment the address space occupied
  by the ARC, in particular for cases where compressed ARC is enabled. It could
  also ease future work to allocate pages directly from `segkpm` for minimal-
  overhead memory allocations, bypassing the `kmem` subsystem.
  This is essentially the same change as the one which recently landed in ZFS on
  Linux, although they made some platform-specific changes while adapting this
  work to their codebase:
  1. Implemented the equivalent of the `segkpm` suggestion for future work
  mentioned above to bypass issues that they've had with the Linux kernel memory
  allocator.
  2. Changed the internal representation of the ABD's scatter/gather list so it
  could be used to pass I/O directly into Linux block device drivers. (This
  feature is not available in the illumos block device interface yet.)

https://www.illumos.org/issues/8100
  My supermicro system is getting random BAD TRAP: type=d (#gp General
  protection) at about the stage where ZFS filesystems are mounted - usually
  console login prompt is already present but the services are still starting.
  After backing out 8021, the boot is completed and no panics do occur.
  Machine does dump, however savecore fails:
  savecore: bad magic number baddcafe
  I can get more data out with boot -k, if needed.
  # psrinfo -vp
  The physical processor has 4 cores and 8 virtual processors (0-7)
    The core has 2 virtual processors (0 4)
    The core has 2 virtual processors (1 5)
    The core has 2 virtual processors (2 6)
    The core has 2 virtual processors (3 7)
      x86 (GenuineIntel 306C3 family 6 model 60 step 3 clock 3500 MHz)
        Intel(r) Xeon(r) CPU E3-1246 v3 @ 3.50GHz

  # prtconf -m
  32657

  $ zpool status
    pool: rpool
   state: ONLINE
    scan: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          rpool       ONLINE       0     0     0
            raidz1-0  ONLINE       0     0     0
              c3t0d0  ONLINE       0     0     0
              c3t1d0  ONLINE       0     0     0

Reviewed by: Matthew Ahrens mahrens@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Reviewed by: Paul Dagnelie pcd@delphix.com
Reviewed by: John Kennedy john.kennedy@delphix.com
Reviewed by: Prakash Surya prakash.surya@delphix.com
Reviewed by: Prashanth Sreenivasa pks@delphix.com
Reviewed by: Pavel Zakharov pavel.zakharov@delphix.com
Reviewed by: Chris Williamson chris.williamson@delphix.com
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>
2017-05-26 12:13:27 +00:00
Andriy Gapon
9df2d6f729 7446 zpool create should support efi system partition
illumos/illumos-gate@7855d95b30
7855d95b30

https://www.illumos.org/issues/7446
  Since we support whole-disk configuration for boot pool, we also will need
  whole disk support with UEFI boot and for this, zpool create should create efi-
  system partition.
  I have borrowed the idea from oracle solaris, and introducing zpool create -
  B switch to provide an way to specify that boot partition should be created.
  However, there is still an question, how big should the system partition be.
  For time being, I have set default size 256MB (thats minimum size for FAT32
  with 4k blocks). To support custom size, the set on creation "bootsize"
  property is created and so the custom size can be set as: zpool create B -
  o bootsize=34MB rpool c0t0d0
  After pool is created, the "bootsize" property is read only. When -B switch is
  not used, the bootsize defaults to 0 and is shown in zpool get output with
  value ''. Older zfs/zpool implementations are ignoring this property.
  https://www.illumos.org/rb/r/219/

Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>
2017-05-26 12:02:14 +00:00
Andriy Gapon
847b286425 7955 libshare needs to initialize only those datasets being modified by the consumer
illumos/illumos-gate@8a981c3356
8a981c3356

https://www.illumos.org/issues/7955
  Libshare currently initializes all available filesystems when doing any
  libshare operation. This requires iterating through all the filesystem
  multiple times, which is a huge performance problem for sharing and
  unsharing operations.

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Daniel Hoffman <dj.hoffman@delphix.com>
2017-04-14 18:34:03 +00:00
Andriy Gapon
b3264caf7b 7252 7628 compressed zfs send / receive
illumos/illumos-gate@5602294fda
5602294fda

https://www.illumos.org/issues/7252
  This feature includes code to allow a system with compressed ARC enabled to
  send data in its compressed form straight out of the ARC, and receive data in
  its compressed form directly into the ARC.

https://www.illumos.org/issues/7628
  We should have longer, more readable versions of the ZFS send / recv options.

7628 create long versions of ZFS send / receive options

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: David Quigley <dpquigl@davequigley.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
2017-04-14 18:07:43 +00:00
Andriy Gapon
cfaa3c3d8f 7386 zfs get does not work properly with bookmarks
illumos/illumos-gate@edb901aab9
edb901aab9

https://www.illumos.org/issues/7386
  The zfs get command does not work with the bookmark parameter while it works
  properly with both filesystem and snapshot:
  # zfs get -t all -r creation rpool/test
  NAME               PROPERTY  VALUE                  SOURCE
  rpool/test         creation  Fri Sep 16 15:00 2016  -
  rpool/test@snap    creation  Fri Sep 16 15:00 2016  -
  rpool/test#bkmark  creation  Fri Sep 16 15:00 2016  -
  # zfs get -t all -r creation rpool/test@snap
  NAME             PROPERTY  VALUE                  SOURCE
  rpool/test@snap  creation  Fri Sep 16 15:00 2016  -
  # zfs get -t all -r creation rpool/test#bkmark
  cannot open 'rpool/test#bkmark': invalid dataset name
  #
  The zfs get command should be modified to work properly with bookmarks too.

Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
2017-04-14 18:01:43 +00:00
Andriy Gapon
8ba67dd37f 7502 ztest should run zdb with -G (debug mode)
illumos/illumos-gate@c3c65d17f7
c3c65d17f7

https://www.illumos.org/issues/7502
  Right now ztest executes zdb without -G, so when it has errors, the messages
  are often not very helpful:
  Executing zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest
  zdb: can't open 'ztest': Operation not supported
  ztest: '/usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest' exit
  code 1
  With -G, we'd have:
  /usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache -G ztest
  zdb: can't open 'ztest': Operation not supported

  ZFS_DBGMSG(zdb):
  spa_open_common: opening ztest
  spa_load(ztest): LOADING
  spa_load(ztest): FAILED: unable to parse config [error=48]
  spa_load(ztest): UNLOADING
  Which indicates where the error came from

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2017-04-14 17:20:26 +00:00
Andriy Gapon
9bad09e9a8 6392 zdb: introduce -V for verbatim import
illumos/illumos-gate@dfd5965f7e
dfd5965f7e

https://www.illumos.org/issues/6392
  When given a pool name via -e, zdb would attempt an import. If it
  failed, then it would attempt a verbatim import. This behavior is
  not always desirable so a -V switch is added to zdb to control the
  behavior. When specified, a verbatim import is done. Otherwise,
  the behavior is as it was previously, except no verbatim import
  is done on failure.
  a5778ea242
  Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>
2017-04-14 17:01:00 +00:00
Andriy Gapon
e8fe3151e2 3871 fix issues introduced by 3604
illumos/illumos-gate@de05b58863
de05b58863

https://www.illumos.org/issues/3871
  GCC 4.5.3 on Gentoo Linux did not like a few of the changes made in the issue
  3604 patch. It printed an error and a couple of warnings:
  ../../cmd/zdb/zdb.c: In function 'dump_bpobj':
  ../../cmd/zdb/zdb.c:1257:3: error: 'for' loop initial declarations are only
  allowed in C99 mode
  ../../cmd/zdb/zdb.c:1257:3: note: use option -std=c99 or -std=gnu99 to compile
  your code
  ../../cmd/zdb/zdb.c: In function 'dump_deadlist':
  ../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format
  ../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>
2017-04-14 16:59:53 +00:00
Andriy Gapon
e5cf3d39f9 6410 teach zdb to perform object lookups by path
illumos/illumos-gate@ed61ec1da9
ed61ec1da9

https://www.illumos.org/issues/6410
  This is primarily intended to ease debugging & testing ZFS when one is only
  interested in things like the on-disk location of a specific object's blocks,
  but doesn't know their object id. This allows doing things like the following
  (FreeBSD-based example):
          # zpool create -f foo da0
          # dd if=/dev/zero of=/foo/1 bs=1M count=4 >/dev/null 2>&1
          # zpool export foo
          # zdb -vvvvv -o "ZFS plain file" foo /1
          Object  lvl   iblk   dblk  dsize  lsize   %full  type
               8    2    16K   128K  3.99M     4M  100.00  ZFS plain file (K=inherit) (Z=inherit)
                                              168   bonus  System attributes
          dnode flags: USED_BYTES USERUSED_ACCOUNTED
          dnode maxblkid: 31
          path    /1
          uid     0
          gid     0
          atime   Thu Apr 23 22:45:32 2015
          mtime   Thu Apr 23 22:45:32 2015
          ctime   Thu Apr 23 22:45:32 2015
          crtime  Thu Apr 23 22:45:32 2015
          gen     7
          mode    100644
          size    4194304
          parent  4
          links   1
          pflags  40800000004
          Indirect blocks:
                0 L1  DVA[0]=<0:c19200:600> DVA[1]=<0:10800019200:600> [L1 ZFS
  plain file] fletcher4 lz4 LE contiguous unique double size=4000L/200P birth=7L/

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2017-04-14 16:58:41 +00:00
Andriy Gapon
b91038ef7a 6866 zdb -l should return non-zero if it fails to find any label
illumos/illumos-gate@64723e3611
64723e3611

https://www.illumos.org/issues/6866
  Need this for #6865.
  To be generally more scripting-friendly, overload this issue with adding '-q'
  option which should skip printing any label information.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2017-04-14 16:57:06 +00:00
Andriy Gapon
60d1053c22 7545 zdb should disable reference tracking
illumos/illumos-gate@4dd77f9e38
4dd77f9e38

https://www.illumos.org/issues/7545
  When evicting from the ARC, we manipulate some refcount_t's, e.g. arcs_size.
  When using zdb to examine a large amount of data (e.g. zdb -bb on a large pool
  with small blocks), the ARC may have a large number of entries. If reference
  tracking is enabled, there will be ~1 reference for each block in the ARC. When
  evicting, we decrement the refcount and have to search all the references to
  find the one that we are removing, which is very slow.
  Since zdb is typically used to find problems with the on-disk format, and not
  with the code it is running, we should disable reference tracking in zdb.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-04-14 16:55:15 +00:00
Andriy Gapon
be0769e638 7280 Allow changing global libzpool variables in zdb and ztest through command line
illumos/illumos-gate@0e60744c98
0e60744c98

https://www.illumos.org/issues/7280
  zdb is very handy for diagnosing problems with a pool in a safe and
  quick way. When a pool is in a bad shape, we often want to disable some
  fail-safes, or adjust some tunables in order to open them. In the
  kernel, this is done by changing public variables in mdb. The goal of
  this feature is to add the same capability to zdb and ztest, so that
  they can change libzpool tuneables from the command line.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2017-04-14 16:53:34 +00:00
Alan Somers
1d381d5c2c 7900 zdb shouldn't print the path of a znode at verbosity < 5
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Alan Somers <asomers@freebsd.org>

illumos/illumos-gate@e548d2fa41
https://www.illumos.org/issues/7900
2017-04-14 16:18:53 +00:00
Josh Paetzel
137146f48c 7303 dynamic metaslab selection
illumos/illumos-gate@8363e80ae7
https://github.com/illumos/illumos-gate/commit/8363e80ae72609660f6090766ca8c2c18aa53f0

https://www.illumos.org/issues/7303

  This change introduces a new weighting algorithm to improve metaslab selection.
  The new weighting algorithm relies on the SPACEMAP_HISTOGRAM feature. As a result,
  the metaslab weight now encodes the type of weighting algorithm used
  (size-based vs segment-based).

  This also introduce a new allocation tracing facility and two new dcmds to help
  debug allocation problems. Each zio now contains a zio_alloc_list_t structure
  that is populated as the zio goes through the allocations stage. Here's an
  example of how to use the tracing facility:

> c5ec000::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
  MSID    DVA    ASIZE      WEIGHT             RESULT               VDEV
     -      0      400           0    NOT_ALLOCATABLE           ztest.0a
     -      0      400           0    NOT_ALLOCATABLE           ztest.0a
     -      0      400           0             ENOSPC           ztest.0a
     -      0      200           0    NOT_ALLOCATABLE           ztest.0a
     -      0      200           0    NOT_ALLOCATABLE           ztest.0a
     -      0      200           0             ENOSPC           ztest.0a
     1      0      400      1 x 8M            17b1a00           ztest.0a

> 1ff2400::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
  MSID    DVA    ASIZE      WEIGHT             RESULT               VDEV
     -      0      200           0    NOT_ALLOCATABLE           mirror-2
     -      0      200           0    NOT_ALLOCATABLE           mirror-0
     1      0      200      1 x 4M            112ae00           mirror-1
     -      1      200           0    NOT_ALLOCATABLE           mirror-2
     -      1      200           0    NOT_ALLOCATABLE           mirror-0
     1      1      200      1 x 4M            112b000           mirror-1
     -      2      200           0    NOT_ALLOCATABLE           mirror-2

  If the metaslab is using segment-based weighting then the WEIGHT column will
  display the number of segments available in the bucket where the allocation
  attempt was made.

Author: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2017-03-15 04:16:08 +00:00
Mark Johnston
2e27b48e02 7297 clear() on llquantize aggregation causes dtrace to exit
7298 printa() of multiple aggregations can fail for llquantize()

illumos/illumos-gate@0ddc0ebb74

Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Adam Leventhal <adam.leventhal@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bryan Cantrill <bryan@joyent.com>
2016-08-26 20:51:09 +00:00
Andriy Gapon
9d4cd9e2d9 7277 zdb should be able to print zfs_dbgmsg's
illumos/illumos-gate@29bdd2f916
29bdd2f916

https://www.illumos.org/issues/7277
  ztest always prints the debug messages (zfs_dbgmsg()) by calling
  zfs_dbgmsg_print(). We should add a flag to zdb to make it do this as well
  before exiting.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2016-08-15 14:24:47 +00:00
Mark Johnston
8a411dff07 7085 add support for "if" and "else" statements in dtrace
illumos/illumos-gate@c3bd3abd88

Add syntactic sugar to dtrace: "if" and "else" statements. The sugar is
baked down to standard dtrace features by adding additional clauses with
the appropriate predicates.

Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2016-08-13 19:57:36 +00:00
Andriy Gapon
053b1b4d59 7164 zdb should be able to open the root dataset
illumos/illumos-gate@b702644a6e
b702644a6e

https://www.illumos.org/issues/7164
  If the pool/dataset command-line argument is specified with a trailing
  slash, for example, "tank/", we should interpret it as the topmost
  dataset (rather than the whole pool)

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Tim Chase <tim@chase2k.com>
2016-07-20 09:58:10 +00:00
Andriy Gapon
0442be9aa8 6391 Override default SPA config location via environment
illumos/illumos-gate@ae24175b2b
ae24175b2b

https://www.illumos.org/issues/6391
  When using zdb with non-default SPA config file it is not convenient
  to add -U <non-default-config-file-path> all the time. This commit
  introduces support for setting/overriding SPA config location via
  environment variable 'SPA_CONFIG_PATH'.
  If -U flag is specified in the command line it will override any other
  value as usual.

64d7b6cf75

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Cyril Plisko <cyril.plisko@mountall.com>
2016-07-20 09:57:16 +00:00