32 Commits

Author SHA1 Message Date
Alexander Motin
2e4bc6ee5c 9075 Improve ZFS pool import/load process and corrupted pool recovery
illumos/illumos-gate@6f7938128a

Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:

https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's

To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.

This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 02:21:03 +00:00
Alexander Motin
c59ff16a1e 8941 zpool add: assertion failed in get_replication() with nested interior VDEVs
illumos/illumos-gate@ac0215f4d6

When replacing a faulted device which was previously handled by a spare
multiple levels of nested interior VDEVs will be present in the pool
configuration: get_replication() needs to handle this situation gracefully
to let zpool add new devices to the pool

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 01:17:32 +00:00
Alexander Motin
79a23a6944 7614 zfs device evacuation/removal
illumos/illumos-gate@5cabbc6b49

https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.

The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.

Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
2018-02-18 01:21:52 +00:00
Alexander Motin
1e2bad5ea0 8652 Tautological comparisons with ZPROP_INVAL
illumos/illumos-gate@4ae5f5f06c

https://www.illumos.org/issues/8652:
Clang and GCC prefer to use unsigned ints to store enums. With Clang, that
causes tautological comparison warnings when comparing a zfs_prop_t or
zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error
handling code is being silently removed as a result.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 04:48:14 +00:00
Andriy Gapon
035e679e27 8567 Inconsistent return value in zpool_read_label
illumos/illumos-gate@c861bfbd77
c861bfbd77

https://www.illumos.org/issues/8567
  If fstat64 fails, pread64 fails, or the label is unintelligible,
  zpool_read_label will return 0. But if malloc fails, it will return -1. For
  consistency, it should always return -1 on failure or 0 on success.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>
2017-09-20 07:18:09 +00:00
Andriy Gapon
bdc44f62ff 7431 ZFS Channel Programs
illumos/illumos-gate@dfc115332c
dfc115332c

https://www.illumos.org/issues/7431
  ZFS channel programs (ZCP) adds support for performing compound ZFS
  administrative actions via Lua scripts in a sandboxed environment (with time
  and memory limits).
  This initial commit includes both base support for running ZCP scripts, and a
  small initial library of API calls which support getting properties and
  listing, destroying, and promoting datasets.
  Testing: in addition to the included unit tests, channel programs have been in
  use at Delphix for several months for batch destroying filesystems. The
  dsl_destroy_snaps_nvl() call has also been replaced with

  For reference, the new zfs-program manpage is included below.
  ZFS-PROGRAM(1M)                       1M                       ZFS-PROGRAM(1M)

  NAME
       zfs program – executes ZFS channel programs

  SYNOPSIS
       zfs program [-t timeout] [-m memory-limit] pool script

  DESCRIPTION
       The ZFS channel program interface allows ZFS administrative operations to
       be run programmatically as a Lua script. The entire script is executed
       atomically, with no other administrative operations taking effect
       concurrently. A library of ZFS calls is made available to channel program
       scripts. Channel programs may only be run with root privileges.

       A modified version of the Lua 5.2 interpreter is used to run channel
       program scripts. The Lua 5.2 manual can be found at:

             http://www.lua.org/manual/5.2/
  ...

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <chris.williamson@delphix.com>
2017-09-13 10:45:49 +00:00
Andriy Gapon
efd3d79ea3 8414 Implemented zpool scrub pause/resume
illumos/illumos-gate@1702cce751
1702cce751

https://www.illumos.org/issues/8414
  This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167
  Currently, there is no way to pause a scrub. Pausing may be useful when
 the pool is busy with other I/O to preserve bandwidth.

  Description

  This patch adds the ability to pause and resume scrubbing.  This is achieved
  by maintaining a persistent on-disk scrub state.  While the state is 'paused'
  we do not scrub any more blocks.  We do however perform regular scan
  housekeeping such as freeing async destroyed and deadlist blocks while paused.

  If you're testing this change, you probably want to include the patch from #6164

  Motivation and Context

  Scrub pausing can be an I/O intensive operation and people have been asking
  for the ability to pause a scrub for a while. This allows one to preserve scrub
  progress while freeing up bandwidth for other I/O.

  How Has This Been Tested?

  Unit testing and zfs-tests.  to the pool.  This patch will also include the
  patch from https://github.com/zfsonlinux/zfs/ pull/6164 In certain cases
  (dsl_scan_sync() is one), we may end up calling

Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>
2017-09-01 17:43:08 +00:00
Andriy Gapon
9df2d6f729 7446 zpool create should support efi system partition
illumos/illumos-gate@7855d95b30
7855d95b30

https://www.illumos.org/issues/7446
  Since we support whole-disk configuration for boot pool, we also will need
  whole disk support with UEFI boot and for this, zpool create should create efi-
  system partition.
  I have borrowed the idea from oracle solaris, and introducing zpool create -
  B switch to provide an way to specify that boot partition should be created.
  However, there is still an question, how big should the system partition be.
  For time being, I have set default size 256MB (thats minimum size for FAT32
  with 4k blocks). To support custom size, the set on creation "bootsize"
  property is created and so the custom size can be set as: zpool create B -
  o bootsize=34MB rpool c0t0d0
  After pool is created, the "bootsize" property is read only. When -B switch is
  not used, the bootsize defaults to 0 and is shown in zpool get output with
  value ''. Older zfs/zpool implementations are ignoring this property.
  https://www.illumos.org/rb/r/219/

Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>
2017-05-26 12:02:14 +00:00
Andriy Gapon
febe58b078 6872 zfs libraries should not allow uninitialized variables
illumos/illumos-gate@f83b46baf9
f83b46baf9

https://www.illumos.org/issues/6872
  We compile the zfs libraries with -Wno-uninitialized. We should remove
  this. Change makefiles, fix new warnings, fix pbchk errors.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
2016-07-12 11:59:25 +00:00
Alexander Motin
41e0d6d109 6418 zpool should have a label clearing command
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Author: Will Andrews <will@firepipe.net>

Closes #83
Closes #32
2016-04-09 19:49:40 +00:00
Alexander Motin
cd18d83a2b 6551 cmd/zpool: cleanup gcc warnings
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Robert Mustacchi <rm@joyent.com>

illumos/illumos-gate@b327cd3f3b
2016-03-08 18:37:34 +00:00
Alexander Motin
117b5ba78e 6659 nvlist_free(NULL) is a no-op
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Marcel Telka <marcel@telka.sk>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>

illumos/illumos-gate@aab83bb83b
2016-03-08 18:08:33 +00:00
Alexander Motin
fabafcb14e 6328 Fix cstyle errors in zfs codebase
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>

illumos/illumos-gate@9a686fbc18
2015-10-19 08:16:46 +00:00
Alexander Motin
aff51553d2 5767 fix several problems with zfs test suite
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: John Wren Kennedy <john.kennedy@delphix.com>

illumos/illumos-gate@52244c0958
2015-10-18 18:56:48 +00:00
Alexander Motin
ad688e7c0e 5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@ca0cc3918a

A ZFS feature flags (large blocks) tracks its refcounts as the number of
datasets that have ever used the feature. Several features of this type
are planned to be added (new checksum functions). This code should be made
common infrastructure rather than duplicating the code for each feature.
2015-08-12 23:38:58 +00:00
Andriy Gapon
a7459e5770 5669 altroot not set in zpool create when specified with -o
Author: Xin Li <delphij@freebsd.org>
Reviewed by: George Wilson <george@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>

illumos/illumos-gate@c423721f9b
2015-06-05 17:05:36 +00:00
Xin LI
8b8b61b2cc 5147 zpool list -v should show individual disk capacity
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Matthew Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author:	George Wilson <george.wilson@delphix.com>

illumos/illumos-gate@7a09f97bc0
2014-10-04 07:24:35 +00:00
Xin LI
0a0bd9b9b8 5118 When verifying or creating a storage pool, error messages only show one device
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <boris.protopopov@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Basil Crow <basil.crow@delphix.com>

illumos/illumos-gate@75fbdf9b9f
2014-09-07 12:18:12 +00:00
Xin LI
d6fb141e08 4976 zfs should only avoid writing to a failing non-redundant top-level vdev
4977 mdb error in ::spa_space from space_cb() if a metaslab's ms_sm is NULL
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature is enabled
4983 need to collect metaslab information via mdb
4984 device selection should use fragmentation metric
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>

illumos/illumos-gate@2e4c998613
2014-07-23 08:00:34 +00:00
Xin LI
4f9c9765a0 4970 need controls on i/o issued by zpool import -XF
4971 zpool import -T should accept hex values
4972 zpool import -T implies extreme rewind, and thus a scrub
4973 spa_load_retry retries the same txg
4974 spa_load_verify() reads all data twice
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>

illumos/illumos-gate@e42d205944
2014-07-15 20:35:56 +00:00
Xin LI
cec4501421 4966 zpool list iterator does not update output
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>

illumos/illumos-gate@cd67d23d32
2014-07-09 08:20:08 +00:00
Xin LI
31c94b8d0b 3993 zpool(1M) and zfs(1M) should support -p for "list" and "get"
4700 "zpool get" doesn't support -H or -o options

illumos/illumos-gate@c58b352673
2014-03-28 22:46:55 +00:00
Andriy Gapon
a1d7a45c43 4171 clean up spa_feature_*() interfaces
4172 implement extensible_dataset feature for use by other zpool
features

illumos/illumos-gate@2acef22db7
2013-11-20 10:54:06 +00:00
Xin LI
d264a033cd Update vendor/illumos/dist to 14160:734110b9882f:
Illumos ZFS issues:
  1765 assert triggered in libzfs_import.c trying to import pool
       name beginning with a number
2013-08-23 23:47:59 +00:00
Xin LI
df8f2946a7 Update vendor/illumos/dist to illumos-gate 14049:4a7f6353bcf0
Illumos ZFS issues:
  3745 zpool create should treat -O mountpoint and -m the same
2013-06-11 18:39:38 +00:00
Martin Matuska
5a69c66648 Update vendor/illumos/dist to illumos-gate 13983:fe80600e1f8e
Illumos ZFS issues:
  3604 zdb should print bpobjs more verbosely (fix zdb hang)
  3606 zpool status -x shouldn't warn about old on-disk format
2013-03-14 08:25:10 +00:00
Martin Matuska
fb6c5b06d9 Update vendor/illumos/dist and vendor/illumos-sys/dist
to illumos-gate 13871:a9c12c2c1647
(zfs changes, illumos issues #3306, #3321)
2012-11-08 01:38:30 +00:00
Martin Matuska
7e4318b7e2 Update vendor/illumos/dist to illumos-gate 13811:4dadf1a8e003
(zfs, illumos issue #3064)
2012-09-11 08:49:22 +00:00
Martin Matuska
7f7eef1f2e Update vendor/illumos to illumos-gate 13753:2aba784c276b
Obtained from:	ssh://anonhg@hg.illumos.org/illumos-gate
2012-07-23 20:20:00 +00:00
Martin Matuska
5b5bfb7422 Update vendor-sys/illumos/dist to illumos-gate 13752:9f5f6c52ba19
(zfs part)

Obtained from:	ssh://anonhg@hg.illumos.org/illumos-gate
2012-07-18 10:58:07 +00:00
Martin Matuska
51ab6c0afd Update vendor/illumos/dist to pre libzfs_core state (zfs part)
illumos-gate revision 13742:b6bbdd77139c

Obtained from:	ssh://anonhg@hg.illumos.org/illumos-gate
2012-07-18 10:19:51 +00:00
Martin Matuska
93a00b0821 Update vendor/opensolaris to last OpenSolaris state (13149:b23a4dab3d50)
Add ZFS bits to vendor/opensolaris

Obtained from:	https://hg.openindiana.org/upstream/oracle/onnv-gate
2012-07-18 07:48:04 +00:00