illumos/illumos-gate@9a551dd645https://www.illumos.org/issues/8897:
# zpool online -e test mirror-1
Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online
Abort (core dumped)
Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'.
Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
illumos/illumos-gate@a3b2868063https://www.illumos.org/issues/8677
We want to be able to run channel programs outside of synching context.
This would greatly improve performance of channel program that just gather
information, as we won't have to wait for synching context anymore.
This feature should introduce the following:
- A new command line flag in "zfs program" to specify our intention to
run in open context.
- A new flag/option within the channel program ioctl which selects the
context.
- Appropriate error handling whenever we try a channel program in
open-context that contains zfs.sync* expressions.
- Documentation for the new feature in the manual pages.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
illumos/illumos-gate@0a0551200e0a0551200ehttps://www.illumos.org/issues/640
du(1), df(1m), ls(1), and swap(1m) all include a copy (it appears literally
copied) of the 'number_to_scaled_string' function in their source. This should
be moved to a shared library and all 4 commands should use this instead.
Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Jason King <jason.brian.king@gmail.com>
illumos/illumos-gate@3f7978d02b3f7978d02bhttps://www.illumos.org/issues/8081
zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
which uses -WError, the only way to build it is to disable all compiler
warnings. This makes it much harder to detect newly introduced bugs. We should
cleanup all the warnings.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
illumos/illumos-gate@2840dce1a02840dce1a0https://www.illumos.org/issues/8600
ZFS channel programs should be able to create snapshots.
In addition to the base snapshot functionality, this will likely entail adding
extra logic to handle edge cases which were formerly not possible, such as
creating then destroying a snapshot in the same transaction sync.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
illumos/illumos-gate@1c18e8fbd81c18e8fbd8https://www.illumos.org/issues/8502
The code in lib/libzfs/common/libzfs_mount.c already basically handles
the case when libshare is not installed. We just need to not fail in
zfs_init_libshare_impl. I tested this in lx and things work as
expected. I also tested there trying to set sharenfs and sharesmb on
the delegated dataset. Neither is allowed from within a zone. The
spew of msgs from a native zone is not ZFS specific. I see the same
spew simply running the share command.
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
illumos/illumos-gate@c861bfbd77c861bfbd77https://www.illumos.org/issues/8567
If fstat64 fails, pread64 fails, or the label is unintelligible,
zpool_read_label will return 0. But if malloc fails, it will return -1. For
consistency, it should always return -1 on failure or 0 on success.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>
illumos/illumos-gate@dfc115332cdfc115332chttps://www.illumos.org/issues/7431
ZFS channel programs (ZCP) adds support for performing compound ZFS
administrative actions via Lua scripts in a sandboxed environment (with time
and memory limits).
This initial commit includes both base support for running ZCP scripts, and a
small initial library of API calls which support getting properties and
listing, destroying, and promoting datasets.
Testing: in addition to the included unit tests, channel programs have been in
use at Delphix for several months for batch destroying filesystems. The
dsl_destroy_snaps_nvl() call has also been replaced with
For reference, the new zfs-program manpage is included below.
ZFS-PROGRAM(1M) 1M ZFS-PROGRAM(1M)
NAME
zfs program – executes ZFS channel programs
SYNOPSIS
zfs program [-t timeout] [-m memory-limit] pool script
DESCRIPTION
The ZFS channel program interface allows ZFS administrative operations to
be run programmatically as a Lua script. The entire script is executed
atomically, with no other administrative operations taking effect
concurrently. A library of ZFS calls is made available to channel program
scripts. Channel programs may only be run with root privileges.
A modified version of the Lua 5.2 interpreter is used to run channel
program scripts. The Lua 5.2 manual can be found at:
http://www.lua.org/manual/5.2/
...
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <chris.williamson@delphix.com>
illumos/illumos-gate@fae6347731fae6347731https://www.illumos.org/issues/5815
When panic() is called from within ztest, the mdb ::status command isn't as
useful as it could be since the global panicstr variable isn't updated. We
should modify the function to make sure panicstr is set, so ::status can
present the error message just like it does on a failed assertion.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Rich Lowe <richlowe@richlowe.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@4f4378cc544f4378cc54https://www.illumos.org/issues/8331
zfs_unshare returns EZFS_UNSHARENFSFAILED on error for all share types.
Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
illumos/illumos-gate@1702cce7511702cce751https://www.illumos.org/issues/8414
This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167
Currently, there is no way to pause a scrub. Pausing may be useful when
the pool is busy with other I/O to preserve bandwidth.
Description
This patch adds the ability to pause and resume scrubbing. This is achieved
by maintaining a persistent on-disk scrub state. While the state is 'paused'
we do not scrub any more blocks. We do however perform regular scan
housekeeping such as freeing async destroyed and deadlist blocks while paused.
If you're testing this change, you probably want to include the patch from #6164
Motivation and Context
Scrub pausing can be an I/O intensive operation and people have been asking
for the ability to pause a scrub for a while. This allows one to preserve scrub
progress while freeing up bandwidth for other I/O.
How Has This Been Tested?
Unit testing and zfs-tests. to the pool. This patch will also include the
patch from https://github.com/zfsonlinux/zfs/ pull/6164 In certain cases
(dsl_scan_sync() is one), we may end up calling
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>
illumos/illumos-gate@77b171372e77b171372ehttps://www.illumos.org/issues/7600
At present, the kernel side code seems to blindly rollback to whatever happens
to be the latest snapshot at the time when the rollback task is processed.
The expected target's name should be passed to the kernel driver and the sync
task should validate that the target exists and that it is the latest snapshot
indeed.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
illumos/illumos-gate@e09ba01dcde09ba01dcdhttps://www.illumos.org/issues/8418
The following line in zfs_validate_name() is just a no-op and it should be
removed:
108 (void) zfs_prop_get_table();
Reviewed by: Vitaliy Gusev <gusev.vitaliy@icloud.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
illumos/illumos-gate@a4b8c9aa65a4b8c9aa65https://www.illumos.org/issues/8264
Oddly there is a lzc_clone function, but no lzc_promote function.
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
illumos/illumos-gate@79809f9cf479809f9cf4https://www.illumos.org/issues/8269
It seems that currently normalization of stddev aggregation is done
incorrectly.
We divide both the sum of values and the sum of their squares by the
normalization factor. But we should divide the sum of squares by the
normalization factor squared to scale the original values properly.
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
illumos/illumos-gate@690031d326690031d326https://www.illumos.org/issues/8168
If we manage to export the pool on which we are creating a dataset (filesystem
or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for
which we never check the return value) we end up dereferencing a NULL pointer
in libzfs`zpool_close().
This was discovered on ZFS on Linux. The same issue can be reproduced on
Illumos running in parallel:
while :; do zpool import -d /tmp testpool ; zpool export testpool ; done
while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done
Eventually this will result in several core dumps like this one:
[root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244
Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
libnvpair.so.1 ld.so.1 ]
> ::stack
libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450)
libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8)
zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3)
main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70)
_start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b)
>
Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
8100 8021 seems to cause random BAD TRAP: type=d (#gp General protection)
illumos/illumos-gate@770499e185770499e185https://www.illumos.org/issues/8021
The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
community) changes the way the ARC allocates `b_pdata` memory from using linear
`void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
improves ZFS's performance by helping to defragment the address space occupied
by the ARC, in particular for cases where compressed ARC is enabled. It could
also ease future work to allocate pages directly from `segkpm` for minimal-
overhead memory allocations, bypassing the `kmem` subsystem.
This is essentially the same change as the one which recently landed in ZFS on
Linux, although they made some platform-specific changes while adapting this
work to their codebase:
1. Implemented the equivalent of the `segkpm` suggestion for future work
mentioned above to bypass issues that they've had with the Linux kernel memory
allocator.
2. Changed the internal representation of the ABD's scatter/gather list so it
could be used to pass I/O directly into Linux block device drivers. (This
feature is not available in the illumos block device interface yet.)
https://www.illumos.org/issues/8100
My supermicro system is getting random BAD TRAP: type=d (#gp General
protection) at about the stage where ZFS filesystems are mounted - usually
console login prompt is already present but the services are still starting.
After backing out 8021, the boot is completed and no panics do occur.
Machine does dump, however savecore fails:
savecore: bad magic number baddcafe
I can get more data out with boot -k, if needed.
# psrinfo -vp
The physical processor has 4 cores and 8 virtual processors (0-7)
The core has 2 virtual processors (0 4)
The core has 2 virtual processors (1 5)
The core has 2 virtual processors (2 6)
The core has 2 virtual processors (3 7)
x86 (GenuineIntel 306C3 family 6 model 60 step 3 clock 3500 MHz)
Intel(r) Xeon(r) CPU E3-1246 v3 @ 3.50GHz
# prtconf -m
32657
$ zpool status
pool: rpool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c3t0d0 ONLINE 0 0 0
c3t1d0 ONLINE 0 0 0
Reviewed by: Matthew Ahrens mahrens@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Reviewed by: Paul Dagnelie pcd@delphix.com
Reviewed by: John Kennedy john.kennedy@delphix.com
Reviewed by: Prakash Surya prakash.surya@delphix.com
Reviewed by: Prashanth Sreenivasa pks@delphix.com
Reviewed by: Pavel Zakharov pavel.zakharov@delphix.com
Reviewed by: Chris Williamson chris.williamson@delphix.com
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>
illumos/illumos-gate@7855d95b307855d95b30https://www.illumos.org/issues/7446
Since we support whole-disk configuration for boot pool, we also will need
whole disk support with UEFI boot and for this, zpool create should create efi-
system partition.
I have borrowed the idea from oracle solaris, and introducing zpool create -
B switch to provide an way to specify that boot partition should be created.
However, there is still an question, how big should the system partition be.
For time being, I have set default size 256MB (thats minimum size for FAT32
with 4k blocks). To support custom size, the set on creation "bootsize"
property is created and so the custom size can be set as: zpool create B -
o bootsize=34MB rpool c0t0d0
After pool is created, the "bootsize" property is read only. When -B switch is
not used, the bootsize defaults to 0 and is shown in zpool get output with
value ''. Older zfs/zpool implementations are ignoring this property.
https://www.illumos.org/rb/r/219/
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>
illumos/illumos-gate@bde3d612a7bde3d612a7https://www.illumos.org/issues/5704
libzfs uses fopen(), at least in libzfs_init(). If there are more than 255
filedescriptors open, fopen() will fail unless you give 'F' as the last mode
character. The fix would be to give 'rF' instead of 'r' as mode to fopen().
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Simon Klinkert <simon.klinkert@gmail.com>
illumos/illumos-gate@ed4e7a6a5ced4e7a6a5chttps://www.illumos.org/issues/7340
When -o origin=<snapshot> is specified as part of a ZFS receive, that origin
should override the automatic detection in libzfs.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@d5f26ad812d5f26ad812https://www.illumos.org/issues/5142
the current libzfs only allows simple disk and mirror setup for boot pool, as
loader does support booting from raidz, this feature will remove raidz
restriction from boot pool setup.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Albert Lee <trisk@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Toomas Soome <tsoome@me.com>
illumos/illumos-gate@d1672efb6fd1672efb6fhttps://www.illumos.org/issues/6280
The unshare_one() in libzfs could fail with EZFS_SHARENFSFAILED at line 834
here:
831 /* make sure libshare initialized */
832 if ((err = zfs_init_libshare(hdl, SA_INIT_SHARE_API)) != SA_OK) {
833 free(mntpt); /* don't need the copy anymore */
834 return (zfs_error_fmt(hdl, EZFS_SHARENFSFAILED,
835 dgettext(TEXT_DOMAIN, "cannot unshare '%s': %s"),
836 name, _sa_errorstr(err)));
837 }
The correct error should be EZFS_UNSHARENFSFAILED instead.
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Marcel Telka <marcel.telka@nexenta.com>
illumos/illumos-gate@aab04418a7aab04418a7https://www.illumos.org/issues/6268
The zfs diff command presents a description of the changes that have occurred
to files within a filesystem between two snapshots. If a file is renamed, the
tool is capable of reporting this, e.g.:
cd /some/zfs/dataset/subdir
mv file0 file1
Will result in a diff record like:
R /some/zfs/dataset/subdir/file0 -> /some/zfs/dataset/subdir/file1
Unfortunately, it seems that rename detection only uses the base filename to
determine if a file has been renamed or simply modified. This leads to
misreporting only the original filename, omitting the more relevant destination
filename entirely. For example:
cd /some/zfs/dataset/subdir
mv file0 ../otherdir/file0
Will result in a diff entry:
M /some/zfs/dataset/subdir/file0
But it should really emit:
R /some/zfs/dataset/subdir/file0 -> /some/zfs/dataset/otherdir/file0
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Joshua M. Clulow <josh@sysmgr.org>
illumos/illumos-gate@8a981c33568a981c3356https://www.illumos.org/issues/7955
Libshare currently initializes all available filesystems when doing any
libshare operation. This requires iterating through all the filesystem
multiple times, which is a huge performance problem for sharing and
unsharing operations.
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Daniel Hoffman <dj.hoffman@delphix.com>
illumos/illumos-gate@471a88e499471a88e499https://www.illumos.org/issues/5380
A stream created with zfs send -p -I contains properties of all snapshots of a
given dataset as opposed to only properties of snapshots in a given range.
Not only this is suboptimal but the receive code also does not filter
properties by the range. So, properties of earlier snapshots would be updated
even though the snapshots themselves are not in the stream (just their
properties).
Given that modifying the snapshot properties requires a TXG sync and that the
snapshots are updated one by one the described behavior may lead to a sever
performance penalty.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <avg@FreeBSD.org>
illumos/illumos-gate@d8584ba6fbd8584ba6fbhttps://www.illumos.org/issues/7990
The snapspec_cb() callback function in libzfs does not need to call zfs_strdup().
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
illumos/illumos-gate@46d46cd4fa46d46cd4fahttps://www.illumos.org/issues/7803
Make get_devid() from libzfs a public function in libdevid, as its pretty
usable in other places and duplicating all the logic required to get string
encoded devid from path seems counter-productive.
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
illumos/illumos-gate@8b65a70b768b65a70b76https://www.illumos.org/issues/7541
When calling zpool import, zpool does a few ioctls to ZFS.
zpool allocates a buffer in userland and passes it to the kernel so that ZFS
can copy info into it. ZFS will use it to put the nvlist that describes the
pool configuration.
If the allocated buffer is too small, ZFS will return ENOMEM and the call will
have to be redone. This wastes CPU time and slows down the import process. This
happens very often for the ZFS_IOC_POOL_TRYIMPORT call.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
illumos/illumos-gate@7c13517fff7c13517fffhttps://www.illumos.org/issues/7745
The problem is that consumers of `libZFS_Core` that forget to call
`libzfs_core_init()` before calling any other function of the library
are having a hard time realizing their mistake. The library's internal
file descriptor is declared as global static, which is ok, but it is not
initialized explicitly; therefore, it defaults to 0, which is a valid
file descriptor. If `libzfs_core_init()`, which explicitly initializes
the correct fd, is skipped, the ioctl functions return errors that do
not have anything to do with `libZFS_Core`, where the problem is
actually located.
Even though assertions for that existed within `libZFS_Core` for debug
builds, they were never enabled because the `-DDEBUG` flag was missing
from the compiler flags.
This patch applies the following changes:
1. It adds `-DDEBUG` for debug builds of `libZFS_Core` and `libzfs`,
to enable their assertions on debug builds.
2. It corrects an assertion within `libzfs`, where a function had
been spelled incorrectly (`zpool_prop_unsupported()`) and nobody
knew because the `-DDEBUG` flag was missing, and the preprocessor
was taking that part of the code away.
3. The library's internal fd is initialized to `-1` and `VERIFY`
assertions have been placed to check that the fd is not equal to
`-1` before issuing any ioctl. It is important here to note, that
the `VERIFY` assertions exist in both debug and non-debug builds.
4. In `libzfs_core_fini` we make sure to never increment the
refcount of our fd below 0, and also reset the fd to `-1` when no
one refers to it. The reason for this, is for the rare case that
the consumer closes all references but then calls one of the
library's functions without using `libzfs_core_init()` first, and
in the mean time, a previous call to `open()` decided to reuse
our previous fd. This scenario would have passed our assertion in
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
illumos/illumos-gate@5602294fda5602294fdahttps://www.illumos.org/issues/7252
This feature includes code to allow a system with compressed ARC enabled to
send data in its compressed form straight out of the ARC, and receive data in
its compressed form directly into the ARC.
https://www.illumos.org/issues/7628
We should have longer, more readable versions of the ZFS send / recv options.
7628 create long versions of ZFS send / receive options
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: David Quigley <dpquigl@davequigley.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
illumos/illumos-gate@4d86c0eab24d86c0eab2https://www.illumos.org/issues/7604
If a zvol has the default setting for the "volblocksize" property, it is
8KB. However, it is displayed as "-" (not present), rather than "8K".
The problem was introduced by:
commit 25228e830e86924a41243343b1de9daf2d7dd43a
Author: Matthew Ahrens <mahrens@delphix.com>
Date: Thu Nov 17 14:37:24 2016 -0800
7571 non-present readonly numeric ZFS props do not have default value
which changed changed get_numeric_property() to indicate that readonly
default properties are not present. However, zfs_prop_readonly() returns
TRUE for both readonly and set-once properties (e.g. volblocksize).
Amusingly, that commit essentially reverted:
6900484 default volblocksize is no longer being reported correctly
from November 2009. However, that change was not correct either; the
correct solution is to only do this check for "truly readonly" (i.e. not
setonce) properties.
$ zfs list -t volume -o name,volblocksize
NAME
VOLBLOCK
domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
archive -
domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
datafile -
domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
external -
rpool/dump
128K
rpool/swap
4K
rpool/swap1
===============================================================================
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@edb901aab9edb901aab9https://www.illumos.org/issues/7386
The zfs get command does not work with the bookmark parameter while it works
properly with both filesystem and snapshot:
# zfs get -t all -r creation rpool/test
NAME PROPERTY VALUE SOURCE
rpool/test creation Fri Sep 16 15:00 2016 -
rpool/test@snap creation Fri Sep 16 15:00 2016 -
rpool/test#bkmark creation Fri Sep 16 15:00 2016 -
# zfs get -t all -r creation rpool/test@snap
NAME PROPERTY VALUE SOURCE
rpool/test@snap creation Fri Sep 16 15:00 2016 -
# zfs get -t all -r creation rpool/test#bkmark
cannot open 'rpool/test#bkmark': invalid dataset name
#
The zfs get command should be modified to work properly with bookmarks too.
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
illumos/illumos-gate@ad2760acbdad2760acbdhttps://www.illumos.org/issues/7571
ZFS displays the default value for non-present readonly numeric (and index)
properties. However, these properties default values are not meaningful.
Instead, we should display a "-", indicating that they are not present. For
example, on a version-12 pool, the usedby* properties are not available, but
they show up as the incorrect value "0":
1. zfs get all test12
...
test12 usedbysnapshots 0 -
test12 usedbydataset 0 -
test12 usedbychildren 0 -
test12 usedbyrefreservation 0 -
We will be introducing more sometimes-present numeric readonly properties, so
it would be nice to fix this.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@09c9e6dc9b09c9e6dc9bhttps://www.illumos.org/issues/7542
libshare keeps a cached copy of the sharetab listing in memory, which can
become out of date if shares are destroyed or created while leaving a libzfs
handle open. This results in a spurious unmounting failure when an NFS share
exists but isn't in the stale libshare cache.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Amdur <matt.amdur@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
illumos/illumos-gate@873c4903a5873c4903a5https://www.illumos.org/issues/7336
We can run into a problem where we call into zfs_mount, which in turn calls
is_dir_empty, which opens the directory to try and make sure it's empty. The
issue with the current approach is that it holds the directory open while it
traverses it with readdir, which, due to subtle interaction with the Java JVM,
vfork, and exec can cause a tricky race condition resulting in zfs_mount
failures.
The approach to resolving the issue in this patch is to drop the usage of
readdir altogether, and instead rely on the fact that ZFS stores the number of
entries contained in a directory using the st_size field of the stat structure.
Thus, if the directory in question is a ZFS directory, we can check to see if
it's empty by calling stat() and inspecting the st_size field of structure
returned.
===============================================================================
The root cause appears to be an interesting race between vfork, exec, and
zfs_mount's usage of O_CLOEXEC when calling openat. Here's what is going on:
1. We call zfs_mount, and this in turn calls openat to check if the directory
is empty, which results in opening the directory we're trying to mount onto,
and increment v_count.
2. As we're in the middle of reading the directory, vfork is called by the JVM
and proceeds to exec the jspawnhelper utility. As a result of the vfork, we
take an additional hold on the directory, which increments v_count a second
time. The semantics of vfork mean the parent process will wait for the child
process to exit or exec before the parent can continue; at this point the
parent is in the middle of zfs_mount, reading the directory to determine if
it's empty or not.
3. The child process exec-ing jspawnhelper gets to the relvm call within
exec_args (which is called by exec_common). relvm is the function that releases
the parent process, allowing the parent to proceed. The problem is, at this
point of calling relvm, the child hasn't yet called close_exec which is
responsible for closing the file descriptors inherited from the parent process
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@d420209d9cd420209d9chttps://www.illumos.org/issues/7233
This fixes a race where one thread is executing zfs_mount() while another
thread forks and execs. If the fork occurs while the directory is open, the
child process will inherit (but not necessarily close immediately) the open fd
for the directory, preventing the mount.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alex Reece <alex@delphix.com>
illumos/illumos-gate@0e60744c980e60744c98https://www.illumos.org/issues/7280
zdb is very handy for diagnosing problems with a pool in a safe and
quick way. When a pool is in a bad shape, we often want to disable some
fail-safes, or adjust some tunables in order to open them. In the
kernel, this is done by changing public variables in mdb. The goal of
this feature is to add the same capability to zdb and ztest, so that
they can change libzpool tuneables from the command line.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
5697e03e6ehttps://www.illumos.org/issues/8046
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pedro Giffuni <pfg@freebsd.org>
illumos/illumos-gate@8363e80ae7https://github.com/illumos/illumos-gate/commit/8363e80ae72609660f6090766ca8c2c18aa53f0https://www.illumos.org/issues/7303
This change introduces a new weighting algorithm to improve metaslab selection.
The new weighting algorithm relies on the SPACEMAP_HISTOGRAM feature. As a result,
the metaslab weight now encodes the type of weighting algorithm used
(size-based vs segment-based).
This also introduce a new allocation tracing facility and two new dcmds to help
debug allocation problems. Each zio now contains a zio_alloc_list_t structure
that is populated as the zio goes through the allocations stage. Here's an
example of how to use the tracing facility:
> c5ec000::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
MSID DVA ASIZE WEIGHT RESULT VDEV
- 0 400 0 NOT_ALLOCATABLE ztest.0a
- 0 400 0 NOT_ALLOCATABLE ztest.0a
- 0 400 0 ENOSPC ztest.0a
- 0 200 0 NOT_ALLOCATABLE ztest.0a
- 0 200 0 NOT_ALLOCATABLE ztest.0a
- 0 200 0 ENOSPC ztest.0a
1 0 400 1 x 8M 17b1a00 ztest.0a
> 1ff2400::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
MSID DVA ASIZE WEIGHT RESULT VDEV
- 0 200 0 NOT_ALLOCATABLE mirror-2
- 0 200 0 NOT_ALLOCATABLE mirror-0
1 0 200 1 x 4M 112ae00 mirror-1
- 1 200 0 NOT_ALLOCATABLE mirror-2
- 1 200 0 NOT_ALLOCATABLE mirror-0
1 1 200 1 x 4M 112b000 mirror-1
- 2 200 0 NOT_ALLOCATABLE mirror-2
If the metaslab is using segment-based weighting then the WEIGHT column will
display the number of segments available in the bucket where the allocation
attempt was made.
Author: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
illumos/illumos-gate@c079fa4d20c079fa4d20https://www.illumos.org/issues/6428
Scenario:
$ zfs create rpool/p
$ zfs set canmount=noauto rpool/p
$ zfs umount rpool/p
$ zfs create rpool/p/c
$ zfs get -r mounted,canmount rpool/p
NAME PROPERTY VALUE SOURCE
rpool/p mounted no -
rpool/p canmount noauto local
rpool/p/c mounted yes -
rpool/p/c canmount on default
In another shell ensure that rpool/p/c is in use, for example:
$ cd /rpool/p/c
Then:
$ zfs set canmount=off rpool/p
cannot unmount '/rpool/p/c': Device busy
But there is no reason to try to unmount rpool/p/c in this scenario.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Andriy Gapon <andriy.gapon@clusterhq.com>
illumos/illumos-gate@9185393f299185393f29https://www.illumos.org/issues/6412
It seems that zfs receive -F -u would mount a received filesystem after
receiving a full stream if a destination filesystem already existed (and, thus,
got destroyed and re-created) and was mounted.
How to reproduce:
$ zfs create rpool/sandbox
$ zfs create rpool/sandbox/from
$ zfs create rpool/sandbox/to
$ zfs snap rpool/sandbox/from@snap
$ zfs send rpool/sandbox/from@snap | zfs recv -v -F -u rpool/sandbox/to
receiving full stream of rpool/sandbox/from@snap into rpool/sandbox/to@snap
received 41.7KB stream in 1 seconds (41.7KB/sec)
$ zfs get mounted rpool/sandbox/to
NAME PROPERTY VALUE SOURCE
rpool/tmp/sandbox/to mounted yes -
This behavior can be problematic if the mountpoint property changes either
because it had a non-inherited value or the stream contains properties because
it has been generated with either -R or -p.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <andriy.gapon@clusterhq.com>
illumos/illumos-gate@ee3c499ad1ee3c499ad1https://www.illumos.org/issues/5752
dump_nvlist() is not aware of the boolean array value type:
bad config type 24 for "foobar"
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <andriy.gapon@clusterhq.com>
illumos/illumos-gate@620f322510620f322510https://www.illumos.org/issues/6051
Currently lzc_receive() requires that its snapname argument is a snapshot name
(contains '').
@zfs receive allows to specify just a dataset name and would try to deduce the
snapshot name from the stream.
I propose to allow lzc_receive() to do the same.
That seems to be quite easy to implement, it requires only a small amount of
logic, it does not require any additional system calls or any additional data
from the stream.
The benefit is that the new behavior would allow to keep the snapshot names the
same between the sender and receiver at zero cost, without a need to pass the
names out of band.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@icyb.net.ua>
7298 printa() of multiple aggregations can fail for llquantize()
illumos/illumos-gate@0ddc0ebb74
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Adam Leventhal <adam.leventhal@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bryan Cantrill <bryan@joyent.com>
illumos/illumos-gate@c3bd3abd88
Add syntactic sugar to dtrace: "if" and "else" statements. The sugar is
baked down to standard dtrace features by adding additional clauses with
the appropriate predicates.
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>