r323791 changed the return value of zpool_read_label. Error paths that
previously returned 0 began to return -1 instead. However, not all error
paths initialized errno. When adding vdevs to a very large pool, errno could
be prepopulated with ENOMEM, causing the operation to fail. Fix the bug by
setting errno=ENOENT in the case that no ZFS label is found.
PR: 226096
Submitted by: Nikita Kozlov
Reviewed by: avg
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D13088
+ Add missing signals SIGTHR (32) and SIGLIBRT (33)
+ Add inline for converting SIG* int to string
+ Add inline for converting CLD_* int to string
Reviewed by: markj
Sponsored by: Smule, Inc.
Differential Revision: https://reviews.freebsd.org/D14497
It gets called by dmu_buf_init_user, which is inline but not static. So it
needs global linkage itself.
Reported by: GCC-6
MFC after: 17 days
X-MFC-With: 329722
+ Add dev_type do devinfo_t
+ Add b_cmd to bufinfo_t
+ Add constants for BIO_* and DEVSTAT_TYPE_*
+ Add inline for converting BIO_* int to string
+ Add inline for converting DEVSTAT_TYPE_* int to string
+ Add mask for dev_type & DEVSTAT_TYPE_MASK to string
+ Add mask for dev_type & DEVSTAT_TYPE_IF_MASK to string
Reviewed by: markj
Sponsored by: Smule, Inc.
Differential Revision: https://reviews.freebsd.org/D14396
8940 Sending an intra-pool resumable send stream may result in EXDEV
illumos/illumos-gate@544132fce3
"zfs send -t <token>" for an incremental send should be able to resume
successfully when sending to the same pool: a subtle issue in
zfs_iter_children() doesn't currently allow this.
Because resuming from a token requires "guid" -> "dataset" mapping
(guid_to_name()), we have to walk the whole hierarchy to find the right
snapshots to send.
When resuming an incremental send both source and destination live in the
same pool and have the same guid: this is where zfs_iter_children() gets
confused and picks up the wrong snapshot, so we end up trying to send an
incremental "destination@snap1 -> source@snap2" stream instead of
"source@snap1 -> source@snap2": this fails with an "Invalid cross-device
link" (EXDEV) error.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: loli10K <ezomori.nozomu@gmail.com>
9075 Improve ZFS pool import/load process and corrupted pool recovery
illumos/illumos-gate@6f7938128a
Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:
https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's
To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.
The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.
The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.
When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.
This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.
With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
illumos/illumos-gate@add927f8c8
Reported on the ZFSonLinux https://github.com/zfsonlinux/zfs/issues/4843,
fixed by https://github.com/zfsonlinux/zfs/pull/6339:
If we are in the middle of an incremental zfs receive, the child .../%recv
will exist. If you concurrently run zfs promote .../%recv, it will "work",
but then zfs gets confused. For example, there's no obvious way to destroy
the containing filesystem (because it is now a clone of its invisible child).
Attempting to do this promote should be an error. We could fix this by
having zfs_ioc_promote() check if zc_name contains a %, similar to
zfs_ioc_rename().
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
illumos/illumos-gate@e144c4e6c9
Currently `zdb` consistently fails to examine non-idle pools as it fails
during the `spa_load()` process. The main problem seems to be that
`spa_load_verify()` fails as can be seen below:
$ sudo zdb -d -G dcenter
zdb: can't open 'dcenter': I/O error
ZFS_DBGMSG(zdb):
spa_open_common: opening dcenter
spa_load(dcenter): LOADING
disk vdev '/dev/dsk/c4t11d0s0': best uberblock found for spa dcenter. txg 40824950
spa_load(dcenter): using uberblock with txg=40824950
spa_load(dcenter): UNLOADING
spa_load(dcenter): RELOADING
spa_load(dcenter): LOADING
disk vdev '/dev/dsk/c3t10d0s0': best uberblock found for spa dcenter. txg 40824952
spa_load(dcenter): using uberblock with txg=40824952
spa_load(dcenter): FAILED: spa_load_verify failed [error=5]
spa_load(dcenter): UNLOADING
This change makes `spa_load_verify()` a dryrun when ran from `zdb`. This is
done by creating a global flag in zfs and then setting it in `zdb`.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
illumos/illumos-gate@36a64e6284
To prevent kmem_cache reaping from blocking other system resources, turn
kmem_cache_reap_now() (which blocks) into kmem_cache_reap_soon(). Callers
to kmem_cache_reap_soon() should use kmem_cache_reap_active(), which
exploits #9017's new taskq_empty().
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Author: Tim Kordas <tim.kordas@joyent.com>
FreeBSD does not use taskqueue for kmem caches reaping, so this change
is less dramatic then it is on Illumos, just limiting reaping to 1 time
per second. It may possibly be improved later, if needed.
illumos/illumos-gate@5cabbc6b49https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
illumos/illumos-gate@95643f75d295643f75d2https://www.illumos.org/issues/8520
lzc_rollback_to() should support rolling back to a clone's origin.
The current checks in zfs_ioc_rollback() would not allow that because the
origin snapshot belongs to a different filesystem.
The overly restrictive check was introduced in 7600, but it was not a
regression as none of the existing tools provided a way to rollback to the
origin.
https://www.illumos.org/issues/7198
EINVAL is returned when a dataset does not have any snapshots, so there is
nothing to roll back to.
Although the code in zfs_do_rollback checks for that condition in advance, it's
still possible that the snapshot(s) gets removed after the check and before the
rollback sync task is executed.
At the moment zfs command would crash when that happens.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 2 weeks
illumos/illumos-gate@7855d95b307855d95b30https://www.illumos.org/issues/7446
Since we support whole-disk configuration for boot pool, we also will need
whole disk support with UEFI boot and for this, zpool create should create efi-
system partition.
I have borrowed the idea from oracle solaris, and introducing zpool create -
B switch to provide an way to specify that boot partition should be created.
However, there is still an question, how big should the system partition be.
For time being, I have set default size 256MB (thats minimum size for FAT32
with 4k blocks). To support custom size, the set on creation "bootsize"
property is created and so the custom size can be set as: zpool create B -
o bootsize=34MB rpool c0t0d0
After pool is created, the "bootsize" property is read only. When -B switch is
not used, the bootsize defaults to 0 and is shown in zpool get output with
value ''. Older zfs/zpool implementations are ignoring this property.
https://www.illumos.org/rb/r/219/
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>
This commit makes no sense for FreeBSD, that is why I blocked the option,
but it should be good to stay closer to upstream.
illumos/illumos-gate@d8584ba6fbd8584ba6fbhttps://www.illumos.org/issues/7990
The snapspec_cb() callback function in libzfs does not need to call zfs_strdup().
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
illumos/illumos-gate@7c13517fff7c13517fffhttps://www.illumos.org/issues/7745
The problem is that consumers of `libZFS_Core` that forget to call
`libzfs_core_init()` before calling any other function of the library
are having a hard time realizing their mistake. The library's internal
file descriptor is declared as global static, which is ok, but it is not
initialized explicitly; therefore, it defaults to 0, which is a valid
file descriptor. If `libzfs_core_init()`, which explicitly initializes
the correct fd, is skipped, the ioctl functions return errors that do
not have anything to do with `libZFS_Core`, where the problem is
actually located.
Even though assertions for that existed within `libZFS_Core` for debug
builds, they were never enabled because the `-DDEBUG` flag was missing
from the compiler flags.
This patch applies the following changes:
1. It adds `-DDEBUG` for debug builds of `libZFS_Core` and `libzfs`,
to enable their assertions on debug builds.
2. It corrects an assertion within `libzfs`, where a function had
been spelled incorrectly (`zpool_prop_unsupported()`) and nobody
knew because the `-DDEBUG` flag was missing, and the preprocessor
was taking that part of the code away.
3. The library's internal fd is initialized to `-1` and `VERIFY`
assertions have been placed to check that the fd is not equal to
`-1` before issuing any ioctl. It is important here to note, that
the `VERIFY` assertions exist in both debug and non-debug builds.
4. In `libzfs_core_fini` we make sure to never increment the
refcount of our fd below 0, and also reset the fd to `-1` when no
one refers to it. The reason for this, is for the rare case that
the consumer closes all references but then calls one of the
library's functions without using `libzfs_core_init()` first, and
in the mean time, a previous call to `open()` decided to reuse
our previous fd. This scenario would have passed our assertion in
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
7604 if volblocksize property is the default, it displays as "-" rather than 8K
illumos/illumos-gate@4d86c0eab24d86c0eab2https://www.illumos.org/issues/7604
If a zvol has the default setting for the "volblocksize" property, it is
8KB. However, it is displayed as "-" (not present), rather than "8K".
The problem was introduced by:
commit 25228e830e86924a41243343b1de9daf2d7dd43a
Author: Matthew Ahrens <mahrens@delphix.com>
Date: Thu Nov 17 14:37:24 2016 -0800
7571 non-present readonly numeric ZFS props do not have default value
which changed changed get_numeric_property() to indicate that readonly
default properties are not present. However, zfs_prop_readonly() returns
TRUE for both readonly and set-once properties (e.g. volblocksize).
Amusingly, that commit essentially reverted:
6900484 default volblocksize is no longer being reported correctly
from November 2009. However, that change was not correct either; the
correct solution is to only do this check for "truly readonly" (i.e. not
setonce) properties.
$ zfs list -t volume -o name,volblocksize
NAME
VOLBLOCK
domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
archive -
domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
datafile -
domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
external -
rpool/dump
128K
rpool/swap
4K
rpool/swap1
===============================================================================
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@09c9e6dc9b09c9e6dc9bhttps://www.illumos.org/issues/7542
libshare keeps a cached copy of the sharetab listing in memory, which can
become out of date if shares are destroyed or created while leaving a libzfs
handle open. This results in a spurious unmounting failure when an NFS share
exists but isn't in the stale libshare cache.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Amdur <matt.amdur@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
illumos/illumos-gate@873c4903a5873c4903a5https://www.illumos.org/issues/7336
We can run into a problem where we call into zfs_mount, which in turn calls
is_dir_empty, which opens the directory to try and make sure it's empty. The
issue with the current approach is that it holds the directory open while it
traverses it with readdir, which, due to subtle interaction with the Java JVM,
vfork, and exec can cause a tricky race condition resulting in zfs_mount
failures.
The approach to resolving the issue in this patch is to drop the usage of
readdir altogether, and instead rely on the fact that ZFS stores the number of
entries contained in a directory using the st_size field of the stat structure.
Thus, if the directory in question is a ZFS directory, we can check to see if
it's empty by calling stat() and inspecting the st_size field of structure
returned.
===============================================================================
The root cause appears to be an interesting race between vfork, exec, and
zfs_mount's usage of O_CLOEXEC when calling openat. Here's what is going on:
1. We call zfs_mount, and this in turn calls openat to check if the directory
is empty, which results in opening the directory we're trying to mount onto,
and increment v_count.
2. As we're in the middle of reading the directory, vfork is called by the JVM
and proceeds to exec the jspawnhelper utility. As a result of the vfork, we
take an additional hold on the directory, which increments v_count a second
time. The semantics of vfork mean the parent process will wait for the child
process to exit or exec before the parent can continue; at this point the
parent is in the middle of zfs_mount, reading the directory to determine if
it's empty or not.
3. The child process exec-ing jspawnhelper gets to the relvm call within
exec_args (which is called by exec_common). relvm is the function that releases
the parent process, allowing the parent to proceed. The problem is, at this
point of calling relvm, the child hasn't yet called close_exec which is
responsible for closing the file descriptors inherited from the parent process
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@d420209d9cd420209d9chttps://www.illumos.org/issues/7233
This fixes a race where one thread is executing zfs_mount() while another
thread forks and execs. If the fork occurs while the directory is open, the
child process will inherit (but not necessarily close immediately) the open fd
for the directory, preventing the mount.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alex Reece <alex@delphix.com>
illumos/illumos-gate@c3c65d17f7c3c65d17f7https://www.illumos.org/issues/7502
Right now ztest executes zdb without -G, so when it has errors, the messages
are often not very helpful:
Executing zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest
zdb: can't open 'ztest': Operation not supported
ztest: '/usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest' exit
code 1
With -G, we'd have:
/usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache -G ztest
zdb: can't open 'ztest': Operation not supported
ZFS_DBGMSG(zdb):
spa_open_common: opening ztest
spa_load(ztest): LOADING
spa_load(ztest): FAILED: unable to parse config [error=48]
spa_load(ztest): UNLOADING
Which indicates where the error came from
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
illumos/illumos-gate@3f7978d02b3f7978d02bhttps://www.illumos.org/issues/8081
zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
which uses -WError, the only way to build it is to disable all compiler
warnings. This makes it much harder to detect newly introduced bugs. We should
cleanup all the warnings.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
8502 illumos#7955 broke delegated datasets when libshare is not present
illumos/illumos-gate@1c18e8fbd81c18e8fbd8https://www.illumos.org/issues/8502
The code in lib/libzfs/common/libzfs_mount.c already basically handles
the case when libshare is not installed. We just need to not fail in
zfs_init_libshare_impl. I tested this in lx and things work as
expected. I also tested there trying to set sharenfs and sharesmb on
the delegated dataset. Neither is allowed from within a zone. The
spew of msgs from a native zone is not ZFS specific. I see the same
spew simply running the share command.
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
If there are no damaged pools, then ignore all GEOM events. We only use
them to fix damaged pools. However, still pay attention to ZFS events.
MFC after: 20 days
X-MFC-With: 329284
Sponsored by: Spectra Logic Corp
cddl/usr.sbin/zfsd/zfsd_event.cc
Remove the check for da and ada devices. This way zfsd can work on md,
geli, glabel, gstripe, etc devices. geli in particular is useful
combined with ZFS. gnop is also useful for simulating drive pulls in
the ZFSD test suite.
Also, eliminate the DevfsEvent class entirely. Move its
responsibilities into GeomEvent. We can get everything we need to know
just from listening to GEOM events.
lib/libdevdctl/event.cc
Fix GeomEvent::DevName for CREATE events. Oddly, the relevant field is
named "cdev" for CREATE events but "devname" for disk events.
MFC after: 3 weeks
Relnotes: Yes (probably worth mentioning the geli part)
Sponsored by: Spectra Logic Corp
Fix an assertion in zpool that causes a crash when running any "zpool add"
command on a spare that contains a replacing vdev with a spare child.
This likely affects Illumos, too.
PR: 225546
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D14138
If a zfs pool contains a replacing vdev (either created manually by "zpool
replace" or by zfsd(8) via autoreplace by physical path) and then new spares
get added to the pool, zfsd shouldn't use one to replace the drive that is
already being replaced. That's a waste of resources that just slows down
the rebuild.
PR: 225547
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
illumos/illumos-gate@e9b7d6e7f7https://www.illumos.org/issues/8972:
'zfs holds -H' does not properly output content in scripted mode. It uses a
tab instead of two spaces, but it still pads column widths with spaces when
it should not.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Allan Jude <allanjude@freebsd.org>
illumos/illumos-gate@4ae5f5f06chttps://www.illumos.org/issues/8652:
Clang and GCC prefer to use unsigned ints to store enums. With Clang, that
causes tautological comparison warnings when comparing a zfs_prop_t or
zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error
handling code is being silently removed as a result.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
8641 "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs
illumos/illumos-gate@2ba5f978a4https://www.illumos.org/issues/8641:
"zpool clear" and "zinject -d" can both operate on specific vdevs, either
leaf or interior. However, due to an oversight, neither works on a "spare"
or "replacing" vdev. For example:
sudo zpool create foo raidz1 c1t5000CCA000081D61d0 c1t5000CCA000186235d0 spare c
1t5000CCA000094115d0
sudo zpool replace foo c1t5000CCA000186235d0 c1t5000CCA000094115d0
$ zpool status foo pool: foo
state: ONLINE
scan: resilvered 81.5K in 0h0m with 0 errors on Fri Sep 8 10:53:03 2017
config:
NAME STATE READ WRITE CKSUM
foo ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c1t5000CCA000081D61d0 ONLINE 0 0 0
spare-1 ONLINE 0 0 0
c1t5000CCA000186235d0 ONLINE 0 0 0
c1t5000CCA000094115d0 ONLINE 0 0 0
spares
c1t5000CCA000094115d0 INUSE currently in use
$ sudo zinject -d spare-1 -A degrade foo
cannot find device 'spare-1' in pool 'foo'
$ sudo zpool clear foo spare-1
cannot clear errors for spare-1: no such device in pool
Even though there was nothing to clear, those commands shouldn't have
reported an error. by contrast, trying to clear "raidz1-0" works just fine:
$ sudo zpool clear foo raidz1-0
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
8898 creating fs with checksum=skein on the boot pools fails ungracefully
illumos/illumos-gate@9fa2266d9ahttps://www.illumos.org/issues/8898:
# zfs create -o checksum=skein rpool/test
internal error: Result too large
Abort (core dumped)
Not a big deal per se, but should be handled correctly.
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
PR: 222199
illumos/illumos-gate@9a551dd645https://www.illumos.org/issues/8897:
# zpool online -e test mirror-1
Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online
Abort (core dumped)
Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'.
Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
illumos/illumos-gate@a3b2868063https://www.illumos.org/issues/8677
We want to be able to run channel programs outside of synching context.
This would greatly improve performance of channel program that just gather
information, as we won't have to wait for synching context anymore.
This feature should introduce the following:
- A new command line flag in "zfs program" to specify our intention to
run in open context.
- A new flag/option within the channel program ioctl which selects the
context.
- Appropriate error handling whenever we try a channel program in
open-context that contains zfs.sync* expressions.
- Documentation for the new feature in the manual pages.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
These return the jail ID and jail name for the traced process,
respectively, and are analogous to "zonename" on Solaris/illumos.
"zonename" is now aliased to "jailname".
Also add some stress tests for the new variables.
Submitted by: Domagoj Stolfa <domagoj.stolfa@gmail.com>
Reviewed by: dteske (previous version)
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D13877
The bug would cause incorrect behaviour when attempting to override
an already set environment variable with -x setenv, as long as the
variable is not the last one in the array.
Reported by: Samuel Lepetit <slepetit@apple.com>
MFC after: 2 weeks
This allows one to override the environment for processes created with
dtrace -c. By default, the environment is inherited.
This support was originally merged from illumos in r249367 but was lost
when the commit was later reverted and then brought back piecemeal.
Reported by: Samuel Lepetit <slepetit@apple.com>
MFC after: 2 weeks
Add basic command line parsing test coverage for these utilities. The tests
were automatically generated based on their man pages. These tests can be
expanded by hand for more thorough coverage. The aim is to generate very
basic amount of test coverage for all the utilities in the base system.
Tests generated via: https://github.com/shivansh/smoketestsuite/
Submitted by: shivansh
Reviewed by: asomers
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D12424
We can't link an executable using -m32 until the lib32 phase of a
buildworld, though the build works fine when executing make from
cddl/usr.sbin/dtrace/tests. Some other solution will need to be found.
SPA_MAXBLOCKSIZE is 16 MB and having such a large object on the stack is
not nice in general and it could cause some confusing failures in the
single-user mode where the default stack size of 8 MB is used.
I expect that the upstream would make the same change.
MFC after: 1 week
The test creates a D library with a "depends_on library" pragma
referencing a non-existent library, and expects compilation to fail.
However, as far as I can tell, libdtrace is supposed simply abort
compilation of the library in this case, and continue. This behaviour
is desirable when adding libraries which depend on optional KLDs, for
example.
MFC after: 1 week
When we encounter a USDT probe in a weak symbol, we emit an alias for
the probe function symbol. Such aliases are named differently from the
aliases we emit for probes in local functions, so make sure to take that
difference into account when resizing the output object file's string
table. Otherwise, we underrun the string table buffer.
PR: 223680
Previously some unimplemented libdtrace routines printed the function,
file and line number, followed by "DOODAD." That is not particularly
informative, so replace it with a message reporting the actual issue.
Sponsored by: The FreeBSD Foundation
illumos/illumos-gate@0a0551200e0a0551200ehttps://www.illumos.org/issues/640
du(1), df(1m), ls(1), and swap(1m) all include a copy (it appears literally
copied) of the 'number_to_scaled_string' function in their source. This should
be moved to a shared library and all 4 commands should use this instead.
FreeBSD note: of all libcmdutils functionality ZFS (and other illumos
contrib code) currently uses only nicenum() function (which is similar
to humanize_number but has some formatting differences). For this
reason I decided to not port the whole library. As a result, nicenum.c
from libcmdutils is compiled into libzfs and libzpool. This is a bit
ugly, but works. If one day we are forced to create libillumos, then
the file should be moved to that library.
Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Jason King <jason.brian.king@gmail.com>
MFC after: 2 weeks
Previously, this didn't work because L2ARC devices' labels don't contain
pool GUIDs. Modify zfsd so that the pool GUID won't be required:
lib/libdevdctl/guid.h
Change INVALID_GUID from a uint64_t constant to a function that
returns an invalid Guid object. Remove the void constructor.
Nothing uses it, and it violates RAII.
cddl/usr.sbin/zfsd/case_file.h
cddl/usr.sbin/zfsd/case_file.cc
Allow CaseFile::Find to match a CaseFile based on Vdev GUID alone.
In CaseFile::ReEvaluate, attempt to online devices even if the newly
arrived device has no pool GUID.
cddl/usr.sbin/zfsd/vdev_iterator.cc
Iterate through a pool's cache devices as well as its regular
devices.
Reported by: avg
Reviewed by: avg
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D12791
Previously, zpool_read_all_labels was trying to do 256KB reads, which are
greater than the default MAXPHYS and therefore must go through the slow,
unsafe AIO path. Shrink these reads to 112KB so they can use the safe, fast
AIO path instead.
MFC after: 1 week
X-MFC-With: 324568
Sponsored by: Spectra Logic Corp
- Fix a number of typos.
- Replace some illumos-specific references.
- Note that a type definition of kind CTF_K_FUNCTION may be followed by
a null type identifier in order to provide 4-byte alignment for the
next type definition.
MFC after: 2 weeks
illumos/illumos-gate@72d3dbb9ab72d3dbb9abhttps://www.illumos.org/issues/8300
Prior to integrating the mdocml update to 1.14.1, fix issues found by
new version, especially the "new sentence, new line" style rule.
FreeBSD note: this revision merges only the changes to the CTF manual
page. The changes to the ZFS pages cannot be applied directly.
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
Discussed with: avg
MFC after: 2 weeks
The upstream has converted their manual page to the same format as we
have, so we can use the upstream version with minimal modifications.
The current modifications are:
- different zpool.cache path
- different manual sections for zdb, zfs, zpool commands
igor reports a few minor issues, it would be nice to fix them both in
FreeBSD and in the upstream.
MFC after: 3 weeks
illumos/illumos-gate@4923c69fdd4923c69fdd
FreeBSD note: the manual page is to be updated separately.
https://www.illumos.org/issues/8067
Add an option to zdb to print a literal embedded block pointer supplied on the
command line:
zdb -E [-A] word0:word1:...:word15
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 3 weeks
illumos/illumos-gate@ed4e7a6a5ced4e7a6a5chttps://www.illumos.org/issues/7340
When -o origin=<snapshot> is specified as part of a ZFS receive, that origin
should override the automatic detection in libzfs.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
MFC after: 3 weeks
illumos/illumos-gate@d5f26ad812d5f26ad812https://www.illumos.org/issues/5142
the current libzfs only allows simple disk and mirror setup for boot pool, as
loader does support booting from raidz, this feature will remove raidz
restriction from boot pool setup.
FreeBSD note: we have long supported this feature, this commit only
removes a small difference in libzfs.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Albert Lee <trisk@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Toomas Soome <tsoome@me.com>
MFC after: 3 weeks
illumos/illumos-gate@aab04418a7aab04418a7https://www.illumos.org/issues/6268
The zfs diff command presents a description of the changes that have occurred
to files within a filesystem between two snapshots. If a file is renamed, the
tool is capable of reporting this, e.g.:
cd /some/zfs/dataset/subdir
mv file0 file1
Will result in a diff record like:
R /some/zfs/dataset/subdir/file0 -> /some/zfs/dataset/subdir/file1
Unfortunately, it seems that rename detection only uses the base filename to
determine if a file has been renamed or simply modified. This leads to
misreporting only the original filename, omitting the more relevant destination
filename entirely. For example:
cd /some/zfs/dataset/subdir
mv file0 ../otherdir/file0
Will result in a diff entry:
M /some/zfs/dataset/subdir/file0
But it should really emit:
R /some/zfs/dataset/subdir/file0 -> /some/zfs/dataset/otherdir/file0
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Joshua M. Clulow <josh@sysmgr.org>
MFC after: 3 weeks
illumos/illumos-gate@ad2760acbdad2760acbdhttps://www.illumos.org/issues/7571
ZFS displays the default value for non-present readonly numeric (and index)
properties. However, these properties default values are not meaningful.
Instead, we should display a "-", indicating that they are not present. For
example, on a version-12 pool, the usedby* properties are not available, but
they show up as the incorrect value "0":
1. zfs get all test12
...
test12 usedbysnapshots 0 -
test12 usedbydataset 0 -
test12 usedbychildren 0 -
test12 usedbyrefreservation 0 -
We will be introducing more sometimes-present numeric readonly properties, so
it would be nice to fix this.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 3 weeks
illumos/illumos-gate@dfd5965f7edfd5965f7e
FreeBSD note: the manual page is to be updated separately.
https://www.illumos.org/issues/6392
When given a pool name via -e, zdb would attempt an import. If it
failed, then it would attempt a verbatim import. This behavior is
not always desirable so a -V switch is added to zdb to control the
behavior. When specified, a verbatim import is done. Otherwise,
the behavior is as it was previously, except no verbatim import
is done on failure.
a5778ea242
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>
MFC after: 3 weeks
illumos/illumos-gate@ed61ec1da9ed61ec1da9
FreeBSD note: this commit does not update the manual page.
The original change includes conversion of the manual page from *roff format
to mandoc format. So, it is hard to extract the content change from
that. I am going to replace our zdb manual page, which is an earlier
independent conversion, with a slighly modified version of the upstream page.
https://www.illumos.org/issues/6410
This is primarily intended to ease debugging & testing ZFS when one is only
interested in things like the on-disk location of a specific object's blocks,
but doesn't know their object id. This allows doing things like the following
(FreeBSD-based example):
# zpool create -f foo da0
# dd if=/dev/zero of=/foo/1 bs=1M count=4 >/dev/null 2>&1
# zpool export foo
# zdb -vvvvv -o "ZFS plain file" foo /1
Object lvl iblk dblk dsize lsize %full type
8 2 16K 128K 3.99M 4M 100.00 ZFS plain file (K=inherit) (Z=inherit)
168 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 31
path /1
uid 0
gid 0
atime Thu Apr 23 22:45:32 2015
mtime Thu Apr 23 22:45:32 2015
ctime Thu Apr 23 22:45:32 2015
crtime Thu Apr 23 22:45:32 2015
gen 7
mode 100644
size 4194304
parent 4
links 1
pflags 40800000004
Indirect blocks:
0 L1 DVA[0]=<0:c19200:600> DVA[1]=<0:10800019200:600> [L1 ZFS
plain file] fletcher4 lz4 LE contiguous unique double size=4000L/200P birth=7L/
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
MFC after: 3 weeks
illumos/illumos-gate@de05b58863de05b58863https://www.illumos.org/issues/3871
GCC 4.5.3 on Gentoo Linux did not like a few of the changes made in the issue
3604 patch. It printed an error and a couple of warnings:
../../cmd/zdb/zdb.c: In function 'dump_bpobj':
../../cmd/zdb/zdb.c:1257:3: error: 'for' loop initial declarations are only
allowed in C99 mode
../../cmd/zdb/zdb.c:1257:3: note: use option -std=c99 or -std=gnu99 to compile
your code
../../cmd/zdb/zdb.c: In function 'dump_deadlist':
../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format
../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>
MFC after: 3 weeks
illumos/illumos-gate@64723e361164723e3611https://www.illumos.org/issues/6866
Need this for #6865.
To be generally more scripting-friendly, overload this issue with adding '-q'
option which should skip printing any label information.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
MFC after: 3 weeks
7280 Allow changing global libzpool variables in zdb and ztest through command line
illumos/illumos-gate@0e60744c980e60744c98https://www.illumos.org/issues/7280
zdb is very handy for diagnosing problems with a pool in a safe and
quick way. When a pool is in a bad shape, we often want to disable some
fail-safes, or adjust some tunables in order to open them. In the
kernel, this is done by changing public variables in mdb. The goal of
this feature is to add the same capability to zdb and ztest, so that
they can change libzpool tuneables from the command line.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
MFC after: 3 weeks
illumos/illumos-gate@2840dce1a02840dce1a0https://www.illumos.org/issues/8600
ZFS channel programs should be able to create snapshots.
In addition to the base snapshot functionality, this will likely entail adding
extra logic to handle edge cases which were formerly not possible, such as
creating then destroying a snapshot in the same transaction sync.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
MFC after: 5 weeks
X-MFC after: r324163
illumos/illumos-gate@000cce6b6f000cce6b6fhttps://www.illumos.org/issues/8592
ZFS channel programs should be able to perform a rollback. This logic will
probably look pretty similar to zfs.sync.destroy().
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brad Lewis <brad.lewis@delphix.com>
MFC after: 5 weeks
X-MFC after: r324163
illumos/illumos-gate@5f39f884e25f39f884e2https://www.illumos.org/issues/8605
zfs.exists() in channel programs doesn't return any result, and should have a
man page entry.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
MFC after: 5 weeks
X-MFC after: r324163
7431 ZFS Channel Programs
illumos/illumos-gate@dfc115332cdfc115332chttps://www.illumos.org/issues/7431
ZFS channel programs (ZCP) adds support for performing compound ZFS
administrative actions via Lua scripts in a sandboxed environment (with time
and memory limits).
This initial commit includes both base support for running ZCP scripts, and a
small initial library of API calls which support getting properties and
listing, destroying, and promoting datasets.
Testing: in addition to the included unit tests, channel programs have been in
use at Delphix for several months for batch destroying filesystems. The
dsl_destroy_snaps_nvl() call has also been replaced with
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <chris.williamson@delphix.com>
8552 ZFS LUA code uses floating point math
illumos/illumos-gate@916c8d8811916c8d8811https://www.illumos.org/issues/8552
In the LUA interpreter used by "zfs program", the lua format() function
accidentally includes support for '%f' and friends, which can cause compilation
problems when building on platforms that don't support floating-point math in
the kernel (e.g. sparc). Support for '%f' friends (%f %e %E %g %G) should be
removed, since there's no way to supply a floating-point value anyway (all
numbers in ZFS LUA are int64_t's).
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
8590 memory leak in dsl_destroy_snapshots_nvl()
illumos/illumos-gate@e6ab4525d1e6ab4525d1https://www.illumos.org/issues/8590
In dsl_destroy_snapshots_nvl(), "snaps_normalized" is not freed after it is
added to "arg".
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
FreeBSD notes:
- zfs-program.8 manual page is taken almost as is from the vendor repository,
no FreeBSD-ification done
- fixed multiple instances of NULL being used where an integer is expected
- replaced ETIME and ECHRNG with ETIMEDOUT and EDOM respectively
This commit adds a modified version of Lua 5.2.4 under
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua, mirroring the
upstream. See README.zfs in that directory for the description of Lua
customizations.
See zfs-program.8 on how to use the new feature.
MFC after: 5 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D12528
FreeBSD notes:
- this MFV reverts FreeBSD commit r314549 to make the merge easier
- at present our emulation of cv_timedwait_hires is rather poor,
so I elected to use cv_timedwait_sbt directly
Please see the differential revision for details.
Unfortunately, I did not get any positive reviews, so there could be
bugs in the FreeBSD-specific piece of the merge.
Hence, the long MFC timeout.
illumos/illumos-gate@1271e4b10d1271e4b10dhttps://www.illumos.org/issues/8585
The current implementation of zil_commit() can introduce significant
latency, beyond what is inherent due to the latency of the underlying
storage. The additional latency comes from two main problems:
1. When there's outstanding ZIL blocks being written (i.e. there's
already a "writer thread" in progress), then any new calls to
zil_commit() will block waiting for the currently oustanding ZIL
blocks to complete. The blocks written for each "writer thread" is
coined a "batch", and there can only ever be a single "batch" being
written at a time. When a batch is being written, any new ZIL
transactions will have to wait for the next batch to be written,
which won't occur until the current batch finishes.
As a result, the underlying storage may not be used as efficiently
as possible. While "new" threads enter zil_commit() and are blocked
waiting for the next batch, it's possible that the underlying
storage isn't fully utilized by the current batch of ZIL blocks. In
that case, it'd be better to allow these new threads to generate
(and issue) a new ZIL block, such that it could be serviced by the
underlying storage concurrently with the other ZIL blocks that are
being serviced.
2. Any call to zil_commit() must wait for all ZIL blocks in its "batch"
to complete, prior to zil_commit() returning. The size of any given
batch is proportional to the number of ZIL transaction in the queue
at the time that the batch starts processing the queue; which
doesn't occur until the previous batch completes. Thus, if there's a
lot of transactions in the queue, the batch could be composed of
many ZIL blocks, and each call to zil_commit() will have to wait for
all of these writes to complete (even if the thread calling
zil_commit() only cared about one of the transactions in the batch).
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12355
illumos/illumos-gate@c861bfbd77c861bfbd77https://www.illumos.org/issues/8567
If fstat64 fails, pread64 fails, or the label is unintelligible,
zpool_read_label will return 0. But if malloc fails, it will return -1. For
consistency, it should always return -1 on failure or 0 on success.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>
MFC after: 2 weeks
- The exit probe was not appropriately filtered to only the known pid so it
was firing on any random process that would exit rather the only the one
we cared about.
- The dtest script executes the tst.raise*.exe in the background from
POSIX sh without jobs control. POSIX mandates that SIGINT be set to
SIG_IGN in this case. The test executable never actually tested that
SIGINT could be caught despite trying to block and delay the signal.
So the SIGINT sent from raise() is never actually received since it
is ignored. This could be fixed by calling 'trap - INT' from dtest
before running the executable but I've opted to just use SIGUSR1
instead in these specific tests rather than adding more logic to
test that SIGINT is not ignored at startup.
These 2 issues meant that the tests would randomly work but only if a process
coincidentally exited during the test.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
illumos/illumos-gate@fae6347731fae6347731https://www.illumos.org/issues/5815
When panic() is called from within ztest, the mdb ::status command isn't as
useful as it could be since the global panicstr variable isn't updated. We
should modify the function to make sure panicstr is set, so ::status can
present the error message just like it does on a failed assertion.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Rich Lowe <richlowe@richlowe.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
MFC after: 4 weeks
illumos/illumos-gate@4f4378cc544f4378cc54https://www.illumos.org/issues/8331
zfs_unshare returns EZFS_UNSHARENFSFAILED on error for all share types.
Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
MFC after: 4 weeks
illumos/illumos-gate@d1672efb6fd1672efb6fhttps://www.illumos.org/issues/6280
The unshare_one() in libzfs could fail with EZFS_SHARENFSFAILED at line 834
here:
831 /* make sure libshare initialized */
832 if ((err = zfs_init_libshare(hdl, SA_INIT_SHARE_API)) != SA_OK) {
833 free(mntpt); /* don't need the copy anymore */
834 return (zfs_error_fmt(hdl, EZFS_SHARENFSFAILED,
835 dgettext(TEXT_DOMAIN, "cannot unshare '%s': %s"),
836 name, _sa_errorstr(err)));
837 }
The correct error should be EZFS_UNSHARENFSFAILED instead.
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Marcel Telka <marcel.telka@nexenta.com>
MFC after: 4 weeks
We have D definitions for the named values in socket.h after r323253. Remove
them in test script to prevent compiling failure.
Reviewed by: markj, gnn
Differential Revision: https://reviews.freebsd.org/D12334
illumos/illumos-gate@1702cce7511702cce751
FreeBSD note: rather than merging the zpool.8 update I copied the zpool
scrub section from the illumos zpool.1m to FreeBSD zpool.8 almost
verbatim. Now that the illumos page uses the mdoc format, it was an
easier option. Perhaps the change is not in perfect compliance with the
FreeBSD style, but I think that it is acceptible.
https://www.illumos.org/issues/8414
This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167
Currently, there is no way to pause a scrub. Pausing may be useful when
the pool is busy with other I/O to preserve bandwidth.
Description
This patch adds the ability to pause and resume scrubbing. This is achieved
by maintaining a persistent on-disk scrub state. While the state is 'paused'
we do not scrub any more blocks. We do however perform regular scan
housekeeping such as freeing async destroyed and deadlist blocks while paused.
Motivation and Context
Scrub pausing can be an I/O intensive operation and people have been asking
for the ability to pause a scrub for a while. This allows one to preserve scrub
progress while freeing up bandwidth for other I/O.
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>
MFC after: 2 weeks
The existing code in zmount incorrectly parses the comma-delimited option
string. The result is that nmount only honors the last option. AFAICT the
parsing has been broken ever since ZFS's initial import in change 168404.
PR: 222078
Reviewed by: avg
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D12232
When inserting a partitioned disk, devfs and geom will announce the whole
disk before they announce the partition. If the partition containing ZFS
extends to one of the disk's extents, then zfsd will see a ZFS label on the
whole disk and attempt to online it. ZFS is smart enough to activate the
partition instead of the whole disk, but only if GEOM has already created
the partition's provider.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
Add a zpool_read_all_labels method. It's similar to
zpool_read_label, but it will return the number of labels found.
cddl/usr.sbin/zfsd/zfsd_event.cc
When processing a DevFS CREATE event, only online a VDEV if we can
read all four ZFS labels.
Reviewed by: mav
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D11920
The aggregation created by llquantize() partitions values into buckets; the
lower bound of the bucket containing the largest values is b^{m+1}, where
b and m are the second and fourth parameters to the action, respectively.
Bucket bounds are stored in a 64-bit integer, and so the llquantize()
validation checks need to verify that b^{m+1} fits in 64 bits. However, it
was only verifying that b^{m-1} fits in 64 bits, so certain parameter
combinations could trigger assertion failures in libdtrace.
PR: 219451
MFC after: 1 week
illumos/illumos-gate@77b171372e77b171372ehttps://www.illumos.org/issues/7600
At present, the kernel side code seems to blindly rollback to whatever happens
to be the latest snapshot at the time when the rollback task is processed.
The expected target's name should be passed to the kernel driver and the sync
task should validate that the target exists and that it is the latest snapshot
indeed.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 3 weeks
illumos/illumos-gate@b7edcb9408b7edcb9408https://www.illumos.org/issues/8378
The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which
we then nopwrite against.
zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr
to zgd_bp, dbuf_write_ready()
could change db_blkptr, and dbuf_write_done() could remove the dirty record.
dmu_sync() then sees the stale
BP and that the dbuf it not dirty, so it is eligible for nop-writing.
The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the
db_mtx. We could still see a stale
db_blkptr, but if it is stale then the dirty record will still exist and thus
we won't attempt to nopwrite.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 2 weeks
illumos/illumos-gate@e09ba01dcde09ba01dcdhttps://www.illumos.org/issues/8418
The following line in zfs_validate_name() is just a no-op and it
should be removed:
108 (void) zfs_prop_get_table();
Reviewed by: Vitaliy Gusev <gusev.vitaliy@icloud.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
MFC after: 2 weeks
o Replace __riscv64 with (__riscv && __riscv_xlen == 64)
This is required to support new GCC 7.1 compiler.
This is compatible with current GCC 6.1 compiler.
RISC-V is extensible ISA and the idea here is to have built-in define
per each extension, so together with __riscv we will have some subset
of these as well (depending on -march string passed to compiler):
__riscv_compressed
__riscv_atomic
__riscv_mul
__riscv_div
__riscv_muldiv
__riscv_fdiv
__riscv_fsqrt
__riscv_float_abi_soft
__riscv_float_abi_single
__riscv_float_abi_double
__riscv_cmodel_medlow
__riscv_cmodel_medany
__riscv_cmodel_pic
__riscv_xlen
Reviewed by: ngie
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11901
- Store the symbol table contents in an anonymous swap-backed object. Have
mmap(/dev/ksyms) map that object, and stop mapping the symbol table into
the calling process in ksyms_open(). Previously we would cache a pointer
to the pmap of the opening process, and mmap(/dev/ksyms) would create a
mapping using the physical address found by a pmap lookup at the initial
mapping address. However, this assumes that the cached pmap is valid,
which may not be the case. [1]
- Remove the ksyms ioctl interface. It appears to have been added to work
around a limitation in libelf that no longer exists; see r321842.
Moreover, the interface is difficult to support and isn't present in
illumos. Since ksyms was added specifically to support lockstat(1), it
is expected that this removal won't have any real impact.
- Simplify ksyms_read() to avoid unnecessary copying.
- Don't call the device handle destructor if we fail to capture a snapshot
of the kernel's symbol table. devfs will do that for us.
Reported by: Ilja van Sprundel <ivansprundel@ioactive.com> [1]
Reviewed by: kib (previous revision)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11789
directories to SUBDIR.${MK_TESTS} idiom
This is being done to pave the way for future work (and homogenity) in
^/projects/make-check-sandbox .
No functional change intended.
MFC after: 1 weeks
This test is also timeout on a quiet system because there is nobody triggering
read probefunc while test execution.
Reviewed by: gnn, markj, ngie
Differential Revision: https://reviews.freebsd.org/D11731
verifying script which needs being run to complete the test.
While here, add missing shebang.
Reviewed by: gnn, markj, ngie
Differential Revision: https://reviews.freebsd.org/D11716
In Solaris, basename(1) and basename(3) both return "." while being given an
empty string (""), while in BSD (and Linux) basename(1) returns "" and
basename(3) returns "."
While here, also change #!/usr/bin/ksh to #!/usr/bin/env ksh to find ksh in
$PATH
Reviewed by: gnn, markj (earlier version), ngie (earlier version)
Differential Revision: https://reviews.freebsd.org/D11707
We added too many variable assignments in BEGIN block, which will run out of
default auto-configured variable buffer space. The test VM has 4G RAM which
should be enough for most cases so it's reasonable to increase limitation to
these case.
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D11676
We added too many variable assignments in BEGIN block, which will run out of
default auto-configured variable buffer space. The test VM has 4G RAM which
should be enough for most cases so it's reasonable to increase limitation to
these case.
Reviewed by: gnn, markj, ngie
Differential Revision: https://reviews.freebsd.org/D11674
testing.
This test times-out on a quiet system because there is nobody triggers
syscall::open:entry or syscall::: probe while test execution.
Reviewed by: gnn, markj (earlier version)
Differential Revision: https://reviews.freebsd.org/D11671
This test timeout on a quiet system because there is nobody triggers
'syscall::*wait*:entry' probe while test execution.
Reviewed by: gnn, markj, ngie
Differential Revision: https://reviews.freebsd.org/D11668
illumos/illumos-gate@770499e185770499e185https://www.illumos.org/issues/8021
The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
community) changes the way the ARC allocates `b_pdata` memory from using linear
`void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
improves ZFS's performance by helping to defragment the address space occupied
by the ARC, in particular for cases where compressed ARC is enabled. It could
also ease future work to allocate pages directly from `segkpm` for minimal-
overhead memory allocations, bypassing the `kmem` subsystem.
This is essentially the same change as the one which recently landed in ZFS on
Linux, although they made some platform-specific changes while adapting this
work to their codebase:
1. Implemented the equivalent of the `segkpm` suggestion for future work
mentioned above to bypass issues that they've had with the Linux kernel memory
allocator.
2. Changed the internal representation of the ABD's scatter/gather list so it
could be used to pass I/O directly into Linux block device drivers. (This
feature is not available in the illumos block device interface yet.)
FreeBSD notes:
- the actual (default) chunk size is 4KB (despite the text above saying 1KB)
- we can try to reimplement ABDs, so that they are not permanently
mapped into the KVA unless explicitly requested, especially on
platforms with scarce KVA
- we can try to use unmapped I/O and avoid intermediate allocation of a
linear, virtual memory mapped buffer
- we can try to avoid extra data copying by referring to chunks / pages
in the original ABD
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>
MFC after: 3 weeks
Since buildenv exports SYSROOT all of these uses will now look in
WORLDTMP by default.
sys/boot/efi/loader/Makefile
A LIBSTAND hack is no longer required for buildenv.
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
illumos/illumos-gate@a4b8c9aa65a4b8c9aa65https://www.illumos.org/issues/8264
Oddly there is a lzc_clone function, but no lzc_promote function.
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
MFC after: 1 week
illumos/illumos-gate@690031d326690031d326https://www.illumos.org/issues/8168
If we manage to export the pool on which we are creating a dataset (filesystem
or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for
which we never check the return value) we end up dereferencing a NULL pointer
in libzfs`zpool_close().
This was discovered on ZFS on Linux. The same issue can be reproduced on
Illumos running in parallel:
while :; do zpool import -d /tmp testpool ; zpool export testpool ; done
while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done
Eventually this will result in several core dumps like this one:
[root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244
Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
libnvpair.so.1 ld.so.1 ]
> ::stack
libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450)
libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8)
zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3)
main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70)
_start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b)
>
Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
MFC after: 2 weeks
illumos/illumos-gate@79809f9cf479809f9cf4https://www.illumos.org/issues/8269
It seems that currently normalization of stddev aggregation is done
incorrectly.
We divide both the sum of values and the sum of their squares by the
normalization factor. But we should divide the sum of squares by the
normalization factor squared to scale the original values properly.
FreeBSD note: the actual change was committed in r316853, this commit
adds the test files and record merge information.
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 1 week
Sponsored by: Panzura
illumos/illumos-gate@471a88e499471a88e499https://www.illumos.org/issues/5380
A stream created with zfs send -p -I contains properties of all snapshots of a
given dataset as opposed to only properties of snapshots in a given range.
Not only this is suboptimal but the receive code also does not filter
properties by the range. So, properties of earlier snapshots would be updated
even though the snapshots themselves are not in the stream (just their
properties).
Given that modifying the snapshot properties requires a TXG sync and that the
snapshots are updated one by one the described behavior may lead to a sever
performance penalty.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 3 weeks
illumos/illumos-gate@8b65a70b768b65a70b76https://www.illumos.org/issues/7541
When calling zpool import, zpool does a few ioctls to ZFS.
zpool allocates a buffer in userland and passes it to the kernel so that ZFS
can copy info into it. ZFS will use it to put the nvlist that describes the
pool configuration.
If the allocated buffer is too small, ZFS will return ENOMEM and the call will
have to be redone. This wastes CPU time and slows down the import process. This
happens very often for the ZFS_IOC_POOL_TRYIMPORT call.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
MFC after: 2 weeks
illumos/illumos-gate@4dd77f9e384dd77f9e38https://www.illumos.org/issues/7545
When evicting from the ARC, we manipulate some refcount_t's, e.g. arcs_size.
When using zdb to examine a large amount of data (e.g. zdb -bb on a large pool
with small blocks), the ARC may have a large number of entries. If reference
tracking is enabled, there will be ~1 reference for each block in the ARC. When
evicting, we decrement the refcount and have to search all the references to
find the one that we are removing, which is very slow.
Since zdb is typically used to find problems with the on-disk format, and not
with the code it is running, we should disable reference tracking in zdb.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 2 weeks
Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.
ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.
Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.
Struct xvnode changed layout, no compat shims are provided.
For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.
Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.
Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).
Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439
pthread_cond_timedwait() receives absolute time, not relative. Passing
wrong time there caused two threads of zdb to spin in a tight loop.
MFC after: 1 week
7252 7628 compressed zfs send / receive
illumos/illumos-gate@5602294fda5602294fdahttps://www.illumos.org/issues/7252
This feature includes code to allow a system with compressed ARC enabled to
send data in its compressed form straight out of the ARC, and receive data in
its compressed form directly into the ARC.
https://www.illumos.org/issues/7628
We should have longer, more readable versions of the ZFS send / recv options.
7628 create long versions of ZFS send / receive options
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: David Quigley <dpquigl@davequigley.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
7386 zfs get does not work properly with bookmarks
illumos/illumos-gate@edb901aab9edb901aab9https://www.illumos.org/issues/7386
The zfs get command does not work with the bookmark parameter while it works
properly with both filesystem and snapshot:
# zfs get -t all -r creation rpool/test
NAME PROPERTY VALUE SOURCE
rpool/test creation Fri Sep 16 15:00 2016 -
rpool/test@snap creation Fri Sep 16 15:00 2016 -
rpool/test#bkmark creation Fri Sep 16 15:00 2016 -
# zfs get -t all -r creation rpool/test@snap
NAME PROPERTY VALUE SOURCE
rpool/test@snap creation Fri Sep 16 15:00 2016 -
# zfs get -t all -r creation rpool/test#bkmark
cannot open 'rpool/test#bkmark': invalid dataset name
#
The zfs get command should be modified to work properly with bookmarks too.
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
8046 Let calloc() do the multiplication in libzfs_fru_refresh
5697e03e6ehttps://www.illumos.org/issues/8046
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pedro Giffuni <pfg@freebsd.org>
MFC after: 3 days
illumos/illumos-gate@8363e80ae7https://github.com/illumos/illumos-gate/commit/8363e80ae72609660f6090766ca8c2c18https://www.illumos.org/issues/7303
This change introduces a new weighting algorithm to improve metaslab selection.
The new weighting algorithm relies on the SPACEMAP_HISTOGRAM feature. As a result,
the metaslab weight now encodes the type of weighting algorithm used
(size-based vs segment-based).
This also introduce a new allocation tracing facility and two new dcmds to help
debug allocation problems. Each zio now contains a zio_alloc_list_t structure
that is populated as the zio goes through the allocations stage. Here's an
example of how to use the tracing facility:
> c5ec000::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
MSID DVA ASIZE WEIGHT RESULT VDEV
- 0 400 0 NOT_ALLOCATABLE ztest.0a
- 0 400 0 NOT_ALLOCATABLE ztest.0a
- 0 400 0 ENOSPC ztest.0a
- 0 200 0 NOT_ALLOCATABLE ztest.0a
- 0 200 0 NOT_ALLOCATABLE ztest.0a
- 0 200 0 ENOSPC ztest.0a
1 0 400 1 x 8M 17b1a00 ztest.0a
> 1ff2400::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
MSID DVA ASIZE WEIGHT RESULT VDEV
- 0 200 0 NOT_ALLOCATABLE mirror-2
- 0 200 0 NOT_ALLOCATABLE mirror-0
1 0 200 1 x 4M 112ae00 mirror-1
- 1 200 0 NOT_ALLOCATABLE mirror-2
- 1 200 0 NOT_ALLOCATABLE mirror-0
1 1 200 1 x 4M 112b000 mirror-1
- 2 200 0 NOT_ALLOCATABLE mirror-2
If the metaslab is using segment-based weighting then the WEIGHT column will
display the number of segments available in the bucket where the allocation
attempt was made.
Author: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
This simplifies make logic/output
While here, remove bogus CFLAGS which look for headers in cddl/lib/libumem.
There aren't any source files there (just Makefiles)
MFC after: 1 month
Sponsored by: Dell EMC Isilon
Aliases are normally given names that include a key that's unique for each
input object file. This, for example, ensures that aliases for identically
named local symbols in different object files don't conflict. However, in
some cases the static linker will leave an undefined alias after merging
identical weak symbols, resulting in a link error. A non-unique name allows
the aliases to be merged as well.
PR: 216871
X-MFC With: r313262
libdtrace needs to append to the input object files' string and symbol
tables. Currently it does so by allocating a larger buffer, copying the
existing sections into them, and swapping pointers in the libelf data
descriptors. However, it also frees those buffers when its processing is
complete, which leads to a double free since the elftoolchain libelf
owns them and also frees them in elf_end(3). Instead, free the buffers
originally allocated by libelf.
MFC after: 2 weeks
When recording probe site addresses in the output DOF file, dtrace -G
needs to emit relocations for the .SUNW_dof section in order to obtain
the addresses of functions containing probe sites. DTrace expects the
addresses to be relative to the base address of the final ELF file,
and the amd64 USDT implementation was relying on some unspecified and
incorrect behaviour in the base system GNU ld to achieve this.
This change reimplements the probe site relocation handling to allow
USDT to be used with lld and newer GNU binutils. Specifically, it
makes use of R_X86_64_PC64/R_386_PC32 relocations to obtain the
probe site address relative to the DOF file address, and adds and uses a
new DOF relocation type which computes the final probe site address using
these relative offsets.
Reported by and discussed with: Rafael Espíndola
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D9374
It's pretty unlikely to actually hit this, but good to check it anyway
Reported by: Coverity
CID: 1362018
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Also save only high bits in the ipv4_flags, because it is defined
as uint8_t. So now it will show DF and MF flags as 0x40 and 0x20.
Reviewed by: markj@
MFC after: 1 week
dtrace converts pairs of consecutive underscores in a probe name to dashes.
When dtrace -G processes relocations corresponding to USDT probe sites, it
performs this conversion on the corresponding symbol names prior to looking
up the resulting probe names in the USDT provider definition. However, in
so doing it would actually modify the input object's string table, which
breaks the string suffix merging done by recent binutils. Because we don't
care about the symbol name once the probe site is recorded, just perform the
probe lookup using a temporary copy.
Reported by: hrs, swills
MFC after: 3 weeks
Enumeration of return probes involves disassembling subroutines in the
target process, and ptrace(2) is currently used to read from the target
process. libproc could read from the backing file instead to avoid this
problem, but in the common case libdtrace will have a writeable handle
on the process anyway. In particular, a writeable handle is needed to list
USDT probes, and libdtrace will cache such a handle for processes that it
controls via dtrace -c and -p.
This change adds some handling for the equivalent of Solaris' PGRAB_*
flags. In particular, support for PGRAB_RDONLY is needed to avoid a
nasty deadlock: dtrace(1) may otherwise stop the master process for its
pseudo-terminal and end up blocking while writing to standard output.
When looking up an object by name, allow prefix matches if no direct match
is found. This allows one to, for example, match libc entry probes with:
# dtrace -n 'pid$target:libc.so::entry' -c ./foo
instead of requiring "libc.so.7" or a glob.
Also remove proc_obj2map() as it currently just duplicates the
functionality of proc_name2map(). It's supposed to take a Solaris
link-map ID as a paramter, but support for this isn't implemented and
isn't required to support DTrace's pid provider.
Fix a segfault in ctfmerge(1) due to a bug in GCC.
The change was correct and the bug real, but upstream didn't adopt it
and we want to remain in sync. When/if upstream does something about it
we can bring their version.
The bug in question was fixed in GCC 4.9 which is now the default in
FreeBSD's ports. Our native gcc-4.2, which is still in use in some Tier-2
platforms also has a workaround so no end-user should be harmed by the
revert.
due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
Actually because of other changes about that time zl_itx_list_sz is not
really required to implement the functionality, so this patch removes
some unneeded broken code and variables.
Original idea of zil_slog_limit was to reduce chance of SLOG abuse by
single heavy logger, that increased latency for other (more latency critical)
loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
huge latency increase for heavy writers, this implementation caused double
write of all data, since the log records were explicitly prepared for SLOG.
Since we now have I/O scheduler, I've found it can be much more efficient
to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.
Existing ZIL implementation had problem with space efficiency when it
has to write large chunks of data into log blocks of limited size. In some
cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
spinning rust, that also reduced log write speed in half, since head had to
uselessly fly over allocated but not written areas. This change improves
the situation by offloading problematic operations from z*_log_write() to
zil_lwb_commit(), which knows real situation of log blocks allocation and
can split large requests into pieces much more efficiently. Also as side
effect it removes one of two data copy operations done by ZIL code WR_COPIED
case.
While there, untangle and unify code of z*_log_write() functions.
Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
block boundary, that may also improve efficiency if ZPL is made to do that.
Sponsored by: iXsystems, Inc.
These are FreeBSD-specific and were added in r178576 to provide the ability
to pretty-print instances of compound types. However, the print action has
long since been augmented to provide this functionality with a simpler
interface.
Discussed with: gnn
Differential Revision: https://reviews.freebsd.org/D8478
illumos/illumos-gate@620f322510620f322510https://www.illumos.org/issues/6051
Currently lzc_receive() requires that its snapname argument is a snapshot name
(contains '@').
zfs receive allows to specify just a dataset name and would try to deduce the
snapshot name from the stream.
I propose to allow lzc_receive() to do the same.
That seems to be quite easy to implement, it requires only a small amount of
logic, it does not require any additional system calls or any additional data
from the stream.
The benefit is that the new behavior would allow to keep the snapshot names the
same between the sender and receiver at zero cost, without a need to pass the
names out of band.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@icyb.net.ua>
MFC after: 2 weeks
(gpt)zfsboot will read one-time boot directives from a special ZFS pool
area. The area was previously described as "Boot Block Header", but
currently it is know as Pad2, marked as reserved and is zeroed out on
pool creation. The new code interprets data in this area, if any, using
the same format as boot.config. The area is immediately wiped out.
Failure to parse the directives results in a reboot right after the
cleanup. Otherwise the boot sequence proceeds as usual.
zfsbootcfg writes zfsboot arguments specified on its command line to the
Pad2 area of a disk identified by vfs.zfs.boot.primary_pool and
vfs.zfs.boot.primary_vdev kenv variables that are set by loader during
boot. Please see the manual page for more.
Thanks to all who reviewed, contributed and made suggestions! There are
many potential improvements to the feature, please see the review for
details.
Reviewed by: wblock (docs)
Discussed with: jhb, tsoome
MFC after: 3 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D7612
The untyped probe arguments have a width larger than int on such platforms,
so printing their value without a cast can give unexpected results.
MFC after: 1 week
illumos/illumos-gate@f9eb9fdf19https://github.com/illumos/illumos-gate/commit/f9eb9fdf196b6ed476e4ffc69cecd8b0d
a3cb7e7
https://www.illumos.org/issues/6451
Sometimes ztest fails because zdb detects checksum errors. e.g.:
Traversing all blocks to verify checksums and verify nothing leaked ...
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 8000160> DVA0=<0:1cc2000:
180000> [L0 other uint64[]] sha256 uncompressed LE contiguou
s unique single size=100000L/100000P birth=271L/271P fill=1
cksum=c5a3e27d1ed0f894:843bca3a5473c4bf:f76a19b6830a2e4:91292591613a12bf --
skipping
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 800000180> DVA0=<0:ce16800:
180000> [L0 other uint64[]] sha256 uncompressed LE contigu
ous unique single size=100000L/100000P birth=840L/840P fill=1
cksum=5d018f3d061e17f3:6d1584784587bf63:2805a74a0ce37369:ba68a214806c7e75
-- skipping
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 1000000360> DVA0=<0:10d37400:
180000> [L0 other uint64[]] sha256 uncompressed LE conti
guous unique single size=100000L/100000P birth=904L/904P fill=1
cksum=fa1e11d4138bd14b:86c9488c444473e3:f31e43c72e72e46b:e3446472d1174d
ba -- skipping
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 400000002c0> DVA0=<0:127ef400:
180000> [L0 other uint64[]] sha256 uncompressed LE cont
iguous dedup single size=100000L/100000P birth=549L/549P fill=1
cksum=30e14955ebf13522:66dc2ff8067e6810:4607e750abb9d3b3:6582b8af909fcb
58 -- skipping
zdb_blkptr_cb: Got error 50 reading <657, 5, 0, 1c0> DVA0=<0:1a180400:180000>
[L0 other uint64[]] fletcher4 uncompressed LE contiguou
s unique single size=100000L/100000P birth=1091L/1091P fill=1 cksum=a6cf1e50:
29b3bd01c57e5:36779b914035db9a:db61cdcf6bec56f0 -- skippin
g
The problem is that ztest_fault_inject() can inject multiple faults into the
same block. It is designed such that it can inject errors on all leafs of a
RAID-Z or mirror, but for a given range of offsets, it will only inject errors
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
7147 ztest: ztest_ddt_repair fails with ztest_pattern_match assertion
illumos/illumos-gate@aab8072633https://github.com/illumos/illumos-gate/commit/aab80726335c76a7cae32c7300890248d
73a51e3
https://www.illumos.org/issues/7147
Here's the dbuf we're currently reading:
966f200::dbuf
addr object lvl blkid holds os
966f200 4 0 0 1 ztest/ds_3
966f200::print dmu_buf_t db_data
db_data = 0x9ae0400
0x9ae0400/10J
0x9ae0400: c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d
c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d
c1c7ced932020d c1c7ced932020d
The pattern we're expecting is actually this: a34ae10b5f2db2. If we attempt to
read the block on disk we find that it has matches what ztest_ddt_repair()
would have written:
~c1c7ced932020d=J
ff3e383126cdfdf2
966f200::print dmu_buf_impl_t db_blkptr | ::blkptr
DVA0=<0:71d3c00:800>
[L0 UINT64_OTHER] SHA256 OFF LE contiguous dedup single
size=400L/400P birth=55L/55P fill=1
cksum=18486450d3ce8c6d:75a72f4bbf117b0f:2d3a226314eb5650:2eb0fd68648b1af0
1. zdb -U /rpool/tmp/zpool.cache -R ztest 0:71d3c00:800 | head
Found vdev type: mirror
0:71d3c00:800
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
000000: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000010: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000020: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000030: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000040: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000050: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@dcbf3bd6a1dcbf3bd6a1https://www.illumos.org/issues/6950
When reading compressed data from disk, the ARC should keep the compressed
block cached and only decompress it when consumers access the block. The
uncompressed data should be short-lived allowing the ARC to cache a much larger
amount of data. The DMU would also maintain a smaller cache of uncompressed
blocks to minimize the impact of decompressing frequently accessed blocks.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@759e89be35https://github.com/illumos/illumos-gate/commit/759e89be359f2af635e4122d147df56bc
e948773
https://www.illumos.org/issues/6447
I got a patch from someone who uses nvpair code outside of illumos. It fixes a
couple of gcc warnings/bugs for him.
1. silence uninitialized use warnings
2. add parentheses around assignment used as truth value
3. fix printf format specifier (ll is for integers only)
4. strstr, strspn, strcspn, and strcmp are declared in string.h, not
strings.h.
5. avoid scanning integer into boolean variable
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Steve Dougherty <sdougherty@barracuda.com>
illumos/illumos-gate@9adfa60d48https://github.com/illumos/illumos-gate/commit/9adfa60d484ce2435f5af77cc99dcd4e6
92b6660
https://www.illumos.org/issues/6314
Callers of dsl_dataset_name pass a buffer of size ZFS_MAXNAMELEN, but
dsl_dataset_name copies the datasets' name PLUS the snapshot name to it,
resulting in a max of 2 * ZFS_MAXNAMELEN + '@'.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@8808ac5daehttps://github.com/illumos/illumos-gate/commit/8808ac5dae118369991f158b6ab736cb2
691ecde
https://www.illumos.org/issues/4521
zfstest is trying to execute evil "zfs unmount -a", which fails (fortunately,
as it would otherwise leave me with my ~ missing):
03:44:11.86 cannot unmount '/export/home/yuri': Device busy cannot unmount '/
export/home': Device busy
03:44:11.86 ERROR: /usr/sbin/zfs unmount -a exited 1
This affects, at least, zfs_mount_009_neg and zfs_mount_all_001_pos, both
failing on that step. The pool containing the /export/home hierarchy is
included in KEEP variable, but it doesn't seem to affect anything here.
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
6879 incorrect endianness swap for drr_spill.drr_length in libzfs_sendrecv.c
illumos/illumos-gate@20fea7a474https://github.com/illumos/illumos-gate/commit/20fea7a47472aceb64d3ed48cc2a3ea26
8bc4795
https://www.illumos.org/issues/6879
In libzfs_sendrecv, there's a typo:
case DRR_SPILL:
if (byteswap) {
drr->drr_u.drr_write.drr_length =
BSWAP_64(drr->drr_u.drr_spill.drr_length);
}
Instead of drr_write.drr_length, we should be assigning the result of the
byteswap to drr_spill.drr_length.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
illumos/illumos-gate@4a20c933b1https://github.com/illumos/illumos-gate/commit/4a20c933b148de8a1c1d3538391c64284
e636653
https://www.illumos.org/issues/6111
If you create a zfs child folder, zfs send returns an error when a recursive
incremental send is done between two snapshots made prior to the folder
creation.
The problem can be reproduced with the following steps.
root@zfs:/# zfs create pool/test
root@zfs:/# zfs snapshot pool/test@snap1
root@zfs:/# zfs snapshot pool/test@snap2
root@zfs:/# zfs create pool/test/child
root@zfs:/# zfs send -R -I pool/test@snap1 pool/test@snap2 > /dev/null
WARNING: could not send pool/test/child@snap2: does not exist
WARNING: could not send pool/test/child@snap2: does not exist
root@zfs:/# echo $?
1
root@zfs:/# zfs snapshot -r pool/test@snap3
root@zfs:/# zfs send -R -I pool/test@snap1 pool/test@snap3 > /dev/null
root@zfs:/# echo $?
0
root@zfs:/# zfs send -R -I pool/test@snap2 pool/test@snap3 > /dev/null
root@zfs:/# echo $?
0
Since pool/test/child was created after snap2, zfs send should not expect snap2
to be in pool/test/child when doing a recursive send. It should examine the
compare the creation time of the snapshot and each child folder to decide if
the folder will be sent. The next incremental send between snap2 and snap3
would properly create the child folder and snap3 which first appears in the
child folder.
The problem is identical if '-i' is used instead of '-I'.
Reviewed by: Alex Aizman alex.aizman@nexenta.com
Reviewed by: Alek Pinchuk alek.pinchuk@nexenta.com
Reviewed by: Roman Strashkin roman.strashkin@nexenta.com
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Alex Deiter <alex.deiter@nexenta.com>
6902 speed up listing of snapshots if requesting name only and sorting by name
This was our change from the beginning, so just reduce the upstream diff.
6876 Stack corruption after importing a pool with a too-long name
illumos/illumos-gate@c971037baac971037baahttps://www.illumos.org/issues/6876
Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking for
trouble. We should check every dataset on import, using a 1024 byte buffer and
checking each time to see if the dataset's new name is longer than 256 bytes.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Paul Dagnelie <pcd@delphix.com>
Previously, ctf_member_info() would ignore members belonging to an
anonymous struct or union. This made it impossible to, for example, trace
the m_next field of an mbuf using DTrace.
Reported and tested by: gallatin
MFC after: 2 weeks
- TESTSBASE and LOCALBASE are already defined in bsd.tests.mk
- TESTSDIR is automatically divined as ${TESTSBASE}${RELDIR:H} after
r289158.
- Replace SRCDIR with SRCTOP
MFC after: 1 week
X-MFC with: r305019
Sponsored by: EMC / Isilon Storage Division
* TESTSDIR is supposed to be under cddl/usr.sbin, not cddl/sbin
* DevdCtl::EventBuffer no longer exists, so remove its forward
declaration
MFC after: 3 days
X-MFC-With: r305013
There are two cases where changing canmount should result in an action:
- canmount is set to off for a mounted filesystem
- canmount is set to on for an unmounted filesystem
Before r297521 we could unmount and re-mount a filesystem when that was
not necessary, but after r297521 we only handled the first of the above
cases.
MFC after: 5 days
Have it print the contents of aggregations, if any. Otherwise, one needs to
kill the running script to view the collected data, or add code to
periodically print it.
Discussed with: gnn
MFC after: 1 month
7085 add support for "if" and "else" statements in dtrace
illumos/illumos-gate@c3bd3abd88
Add syntactic sugar to dtrace: "if" and "else" statements. The sugar is
baked down to standard dtrace features by adding additional clauses with
the appropriate predicates.
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 2 weeks
Relnotes: yes
Both test suites require more memory than my amd64 VM using
GENERIC-NODEBUG can provide and reliably panic it with OOM issues in
dtrace(4).
Some of the testcases fail, but this at least bypasses the panic behavior
on platforms that don't have enough resources
MFC after: 2 weeks
Discussed with: markj
Sponsored by: EMC / Isilon Storage Division
This resolves a -Wformat issue when the value is used as a format width
precision specifier, i.e. %*s
MFC after: 1 month
Reported by: clang
Sponsored by: EMC / Isilon Storage Division
Note: conversion of the manual page change from roff to mdoc is mine.
illumos/illumos-gate@b702644a6eb702644a6ehttps://www.illumos.org/issues/7164
If the pool/dataset command-line argument is specified with a trailing
slash, for example, "tank/", we should interpret it as the topmost
dataset (rather than the whole pool)
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Tim Chase <tim@chase2k.com>
PR: 204661
MFC after: 1 week
Relnotes: yes
illumos/illumos-gate@ae24175b2bae24175b2bhttps://www.illumos.org/issues/6391
When using zdb with non-default SPA config file it is not convenient
to add -U <non-default-config-file-path> all the time. This commit
introduces support for setting/overriding SPA config location via
environment variable 'SPA_CONFIG_PATH'.
If -U flag is specified in the command line it will override any other
value as usual.
64d7b6cf75
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Cyril Plisko <cyril.plisko@mountall.com>
MFC after: 1 week
setting a 32 bit value on each socket. This can be used by applications
and DTrace as a rendezvous point so that an applicaton's data can
more easily be captured at run time. Expose the user cookie via
DTrace by updating the translator in tcp.d and add a quick test
program, a TCP server, that sets the cookie on each connection
accepted.
Reviewed by: hiren
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D7152
cddl/lib/libavl/Makefile
cddl/lib/libctf/Makefile
cddl/lib/libnvpair/Makefile
cddl/lib/libumem/Makefile
cddl/lib/libuutil/Makefile
Increase WARNS to the highest working level for each of these
libraries
Approved by: re (gjb, hrs)
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
lib/libdevdctl/consumer.cc
In Consumer::DisconnectFromDevd, don't close the socket if it's
already closed.
cddl/usr.sbin/zfsd/case_file.cc
lib/libdevdctl/consumer.h
Delete dead code leftover from before devd(8) gained SOCK_SEQPACKET
support
Reported by: Coverity
CID: 1356155, 1356169
Sponsored by: Spectra Logic Corp
Add zfsd, which deals with hard drive faults in ZFS pools. It manages
hotspares and replements in drive slots that publish physical paths.
cddl/usr.sbin/zfsd
Add zfsd(8) and its unit tests
cddl/usr.sbin/Makefile
Add zfsd to the build
lib/libdevdctl
A C++ library that helps devd clients process events
lib/Makefile
share/mk/bsd.libnames.mk
share/mk/src.libnames.mk
Add libdevdctl to the build. It's a private library, unusable by
out-of-tree software.
etc/defaults/rc.conf
By default, set zfsd_enable to NO
etc/mtree/BSD.include.dist
Add a directory for libdevdctl's include files
etc/mtree/BSD.tests.dist
Add a directory for zfsd's unit tests
etc/mtree/BSD.var.dist
Add /var/db/zfsd/cases, where zfsd stores case files while it's shut
down.
etc/rc.d/Makefile
etc/rc.d/zfsd
Add zfsd's rc script
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
Fix the resource.fs.zfs.statechange message. It had a number of
problems:
It was only being emitted on a transition to the HEALTHY state.
That made it impossible for zfsd to take actions based on drives
getting sicker.
It compared the new state to vdev_prevstate, which is the state that
the vdev had the last time it was opened. That doesn't make sense,
because a vdev can change state multiple times without being
reopened.
vdev_set_state contains logic that will change the device's new
state based on various conditions. However, the statechange event
was being posted _before_ that logic took effect. Now it's being
posted after.
Submitted by: gibbs, asomers, mav, allanjude
Reviewed by: mav, delphij
Relnotes: yes
Sponsored by: Spectra Logic Corp, iX Systems
Differential Revision: https://reviews.freebsd.org/D6564
The DTraceToolkit is part of the Open DTrace effort and is supported
on FreeBSD as a port (sysutils/DTraceToolkit) which has been updated
to properly track toolkit development upstream.
Sponsored by: DARPA, AFRL
When getline(3) in 2009 was added a _WITH_GETLINE guard has also been added.
This rename is made in preparation for the removal of this guard
Obtained from: NetBSD
after r298107
Summary of changes:
- Replace all instances of FILES/TESTS with ${PACKAGE}FILES. This ensures that
namespacing is kept with FILES appropriately, and that this shouldn't need
to be repeated if the namespace changes -- only the definition of PACKAGE
needs to be changed
- Allow PACKAGE to be overridden by callers instead of forcing it to always be
`tests`. In the event we get to the point where things can be split up
enough in the base system, it would make more sense to group the tests
with the blocks they're a part of, e.g. byacc with byacc-tests, etc
- Remove PACKAGE definitions where possible, i.e. where FILES wasn't used
previously.
- Remove unnecessary TESTSPACKAGE definitions; this has been elided into
bsd.tests.mk
- Remove unnecessary BINDIRs used previously with ${PACKAGE}FILES;
${PACKAGE}FILESDIR is now automatically defined in bsd.test.mk.
- Fix installation of files under data/ subdirectories in lib/libc/tests/hash
and lib/libc/tests/net/getaddrinfo
- Remove unnecessary .include <bsd.own.mk>s (some opportunistic cleanup)
Document the proposed changes in share/examples/tests/tests/... via examples
so it's clear that ${PACKAGES}FILES is the suggested way forward in terms of
replacing FILES. share/mk/bsd.README didn't seem like the appropriate method
of communicating that info.
MFC after: never probably
X-MFC with: r298107
PR: 209114
Relnotes: yes
Tested with: buildworld, installworld, checkworld; buildworld, packageworld
Sponsored by: EMC / Isilon Storage Division
Fix a related typo while here.
Note, this change results in the Kyuafile inclusion in the runtime
package, which needs to be fixed, however addresses the PR as far
as I can tell in my tests.
PR: 209114
Submitted by: ngie
Sponsored by: The FreeBSD Foundation
illumos/illumos-gate@26455f9efc26455f9efchttps://www.illumos.org/issues/6052
At the moment type parameter of lzc_create() is of dmu_objset_type_t type.
That exposes an implementation detail and requires sys/fs/zfs.h to be included
in libzfs_core.h creating unnecessary coupling between libzfs_core interface
and ZFS internals.
I think that dmu_objset_type_t should be replaced with a libzfs_core
enumeration of supported dataset types.
For ABI reasons the new enumeration could be bit-compatible with
dmu_objset_type_t.
For example:
typedef enum {
LZC_DST_ZFS = 2,
LZC_DST_ZVOL
} lzc_dataset_type_t;
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <andriy.gapon@clusterhq.com>
MFC after: 2 weeks
Sponsored by: ClusterHQ
This allows one to enable DTrace probes relatively early during boot,
during SI_SUB_DTRACE_ANON, before dtrace(1) can invoked. The desired
enabling is created using dtrace -A, which writes a /boot/dtrace.dof
file and uses nextboot(8) to ensure that DTrace kernel modules are loaded
and that the DOF file describing the enabling is loaded by loader(8)
during the subsequent boot. The trace output can then be fetched with
dtrace -a.
With this commit, boot-time DTrace is only functional on i386 and amd64: on
other architectures, the high-resolution timer frequency is initialized
during SI_SUB_CLOCKS and is thus not available when the anonymous
tracing state is initialized. On x86, the TSC is used and is thus available
earlier.
MFC after: 1 month
Relnotes: yes
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Author: Will Andrews <will@firepipe.net>
Closes#83Closes#32openzfs/openzfs@9663688425
FreeBSD already had `zpool labelclear` functionality, so this is mostly
just a diff reduction.
MFC after: 1 month
Previously this operation tried to unmount and remount children.
Also see https://www.illumos.org/issues/6428.
MFC after: 2 weeks
X-Needs-Upstreaming: illumos
When force-receiving a filesystem that was already mounted the re-created
filesystem is mounted despite -u flag.
Also see https://www.illumos.org/issues/6412.
PR: 204705
Tested by: Vladimir Krstulja <vlad-fbsd@acheronmedia.com>
MFC after: 2 weeks
X-Needs-Upstreaming: illumos
6739 userland version of cv_timedwait_hires() always assumes absolute time
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@41c6413cb5
calloc(3) is faster and occasionally safer than malloc(3) + bzero(3).
In one case, pointed out by Mark[1], this also cleans up a calculation.
Reviewed by: markj [1]
MFC after: 1 week