The profile for send(2)/recv(2) observation has been refactored to
eliminate alloca() in favor of translations available in HEAD.
Sponsored by: Smule, Inc.
Knowing the value of execname during these probes is of some value even
if it is commonly the interrupt kernel thread (intr[12]) -- quite often
it is not, but that depends on the probe.
Sponsored by: Smule, Inc.
A module that wishes to post-process the output needs to know which
arguments were passed in order to re-execute a child in a pipe-chain.
Further, the expansion of ARGV needs to be such that items are escaped
properly.
Sponsored by: Smule, Inc.
If you were to pass an invalid option after `-B num' or `-K num' you
would see that the usage statement would show the value you passed
instead of the actual default.
Moving the default values to separate variables that are unaffected
by the options-parsing allows the usage statement to correctly show
the hard-coded default values if no flags are used.
Sponsored by: Smule, Inc.
A return statement should have been an exit in list_profiles().
If the user passed `-Q' to list profiles and asked for one-line
per profile (`-1'), list_profiles() would not exit as should.
Sponsored by: Smule, Inc.
https://github.com/illumos/illumos-gate/edit/master/usr/src/lib/libzfs/common/libzfs_sendrecv.c
Patch our version of it to quiesce warnings until someone decides to sync
up our code:
libzfs_sendrecv.c:2555:30: warning: format specifies type 'unsigned long'
but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
sprintf(guidname, "%lu", thisguid);
~~~ ^~~~~~~~
%llu
libzfs_sendrecv.c:2612:29: warning: format specifies type 'unsigned long'
but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
sprintf(guidname, "%lu", parent_fromsnap_guid);
~~~ ^~~~~~~~~~~~~~~~~~~~
%llu
libzfs_sendrecv.c:2645:29: warning: format specifies type 'unsigned long'
but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
sprintf(guidname, "%lu", parent_fromsnap_guid);
~~~ ^~~~~~~~~~~~~~~~~~~~
%llu
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D15325
The default provides output in the following form:
```
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP
HEALTH ALTROOT
```
this corrects the man page.
Also submitted upstream as
https://github.com/openzfs/openzfs/pull/632/files (with slightly
different changes needed)
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.
Most of the code was copied from a similar patch for DragonflyBSD.
However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.
Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.
Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).
Submitted by: Johannes Lundberg <johanlun0@gmail.com>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003
In recently added sendrecv profile, there was a line purposefully
added to introduce a compilation error in which `-dev' is used to
debug the entry. Removing the entry.
Sponsored by: Smule, Inc.
The options `-d' (debug), `-e' (exit after compile), and `-v' (verbose)
when combined in any order (though best remembered as `-dev') will run
the conflated script through dtrace(1), test for error conditions, and
show the line that dtrace(1) failed at (with context).
If no errors are found, the output is the same as `-e[v]'.
When writing a new profile for dwatch(1), you can quickly test to
make sure it compiles by running `dwatch -devX profile_name' where
profiles live in /usr/libexec/dwatch or /usr/local/libexec/dwatch
(the latter being where profiles installed via ports should go).
Sponsored by: Smule, Inc.
The change adds -t <name> option to zpool create and -t option to zpool
import in its form with an old name and a new name. This allows to
import (or create) a pool under a name that's different from its real,
permanent name without affecting that name. This is useful when working
with VM images or images of other physical systems if they happen to
have a ZFS pool with the same name as the host system.
The changes come from ZoL with some small tweaks.
The porting has been done by julian.
The change is being submitted to OpenZFS:
https://github.com/openzfs/openzfs/pull/600
Submitted by: julian
Reviewed by: smh
MFC after: 2 weeks
Sponsored by: Panzura (porting)
Differential Revision: https://reviews.freebsd.org/D14972
9280 Assertion failure while running removal_with_ganging test with 4K devices
illumos/illumos-gate@243952c7ee
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matt Ahrens <Matt.Ahrens@delphix.com>
9235 rename zpool_rewind_policy_t to zpool_load_policy_t
illumos/illumos-gate@5dafeea3eb
We want to be able to pass various settings during import/open of a pool,
which are not only related to rewind. Instead of adding a new policy and
duplicate a bunch of code, we should just rename rewind_policy to a more
generic term like load_policy.
For instance, we'd like to set spa->spa_import_flags from the nvlist,
rather from a flags parameter passed to spa_import as in some cases we want
those flags not only for the import case, but also for the open case. One
such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would
allow zfs to open a pool when logs are missing.
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
illumos/illumos-gate@8671400134
The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with
exactly that. It can be thought of as a “pool-wide snapshot” (or a
variation of extreme rewind that doesn’t corrupt your data). It remembers
the entire state of the pool at the point that it was taken and the user
can revert back to it later or discard it. Its generic use case is an
administrator that is about to perform a set of destructive actions to ZFS
as part of a critical procedure. She takes a checkpoint of the pool before
performing the actions, then rewinds back to it if one of them fails or puts
the pool into an unexpected state. Otherwise, she discards it. With the
assumption that no one else is making modifications to ZFS, she basically
wraps all these actions into a “high-level transaction”.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
8809 libzpool should leverage work done in libfakekernel
illumos/illumos-gate@f06dce2c1f
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andrew Stormont <astormont@racktopsystems.com>
We do not have libfakekernel, but need to reduce code divergence.
Rather than zeroing and reading into the a smaller union member the full
union size, just zero and read directly into the union.
No functional change intended.
Reported by: Coverity
Sponsored by: Dell EMC Isilon
illumos/illumos-gate@edc8ef7d92
Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Toomas Soome <tsoome@me.com>
In pursuit of improving performance on multi-core systems, we should
implements fanned out counters and use them to improve the performance of
some of the arc statistics. These stats are updated extremely frequently,
and can consume a significant amount of CPU time.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
Like r331073, eliminate a UB by fully initializing the struct with a designated
initializer. Note that the similar src_dtt is not fully used, so a similar
treatment was not absolutely required. I chose to leave it alone. It
wouldn't hurt to do the same thing, though.
Reported by: Coverity
Sponsored by: Dell EMC Isilon
r323791 changed the return value of zpool_read_label. Error paths that
previously returned 0 began to return -1 instead. However, not all error
paths initialized errno. When adding vdevs to a very large pool, errno could
be prepopulated with ENOMEM, causing the operation to fail. Fix the bug by
setting errno=ENOENT in the case that no ZFS label is found.
PR: 226096
Submitted by: Nikita Kozlov
Reviewed by: avg
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D13088
+ Add missing signals SIGTHR (32) and SIGLIBRT (33)
+ Add inline for converting SIG* int to string
+ Add inline for converting CLD_* int to string
Reviewed by: markj
Sponsored by: Smule, Inc.
Differential Revision: https://reviews.freebsd.org/D14497
It gets called by dmu_buf_init_user, which is inline but not static. So it
needs global linkage itself.
Reported by: GCC-6
MFC after: 17 days
X-MFC-With: 329722
+ Add dev_type do devinfo_t
+ Add b_cmd to bufinfo_t
+ Add constants for BIO_* and DEVSTAT_TYPE_*
+ Add inline for converting BIO_* int to string
+ Add inline for converting DEVSTAT_TYPE_* int to string
+ Add mask for dev_type & DEVSTAT_TYPE_MASK to string
+ Add mask for dev_type & DEVSTAT_TYPE_IF_MASK to string
Reviewed by: markj
Sponsored by: Smule, Inc.
Differential Revision: https://reviews.freebsd.org/D14396
8940 Sending an intra-pool resumable send stream may result in EXDEV
illumos/illumos-gate@544132fce3
"zfs send -t <token>" for an incremental send should be able to resume
successfully when sending to the same pool: a subtle issue in
zfs_iter_children() doesn't currently allow this.
Because resuming from a token requires "guid" -> "dataset" mapping
(guid_to_name()), we have to walk the whole hierarchy to find the right
snapshots to send.
When resuming an incremental send both source and destination live in the
same pool and have the same guid: this is where zfs_iter_children() gets
confused and picks up the wrong snapshot, so we end up trying to send an
incremental "destination@snap1 -> source@snap2" stream instead of
"source@snap1 -> source@snap2": this fails with an "Invalid cross-device
link" (EXDEV) error.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: loli10K <ezomori.nozomu@gmail.com>
9075 Improve ZFS pool import/load process and corrupted pool recovery
illumos/illumos-gate@6f7938128a
Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:
https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's
To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.
The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.
The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.
When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.
This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.
With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
illumos/illumos-gate@add927f8c8
Reported on the ZFSonLinux https://github.com/zfsonlinux/zfs/issues/4843,
fixed by https://github.com/zfsonlinux/zfs/pull/6339:
If we are in the middle of an incremental zfs receive, the child .../%recv
will exist. If you concurrently run zfs promote .../%recv, it will "work",
but then zfs gets confused. For example, there's no obvious way to destroy
the containing filesystem (because it is now a clone of its invisible child).
Attempting to do this promote should be an error. We could fix this by
having zfs_ioc_promote() check if zc_name contains a %, similar to
zfs_ioc_rename().
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
illumos/illumos-gate@e144c4e6c9
Currently `zdb` consistently fails to examine non-idle pools as it fails
during the `spa_load()` process. The main problem seems to be that
`spa_load_verify()` fails as can be seen below:
$ sudo zdb -d -G dcenter
zdb: can't open 'dcenter': I/O error
ZFS_DBGMSG(zdb):
spa_open_common: opening dcenter
spa_load(dcenter): LOADING
disk vdev '/dev/dsk/c4t11d0s0': best uberblock found for spa dcenter. txg 40824950
spa_load(dcenter): using uberblock with txg=40824950
spa_load(dcenter): UNLOADING
spa_load(dcenter): RELOADING
spa_load(dcenter): LOADING
disk vdev '/dev/dsk/c3t10d0s0': best uberblock found for spa dcenter. txg 40824952
spa_load(dcenter): using uberblock with txg=40824952
spa_load(dcenter): FAILED: spa_load_verify failed [error=5]
spa_load(dcenter): UNLOADING
This change makes `spa_load_verify()` a dryrun when ran from `zdb`. This is
done by creating a global flag in zfs and then setting it in `zdb`.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
illumos/illumos-gate@36a64e6284
To prevent kmem_cache reaping from blocking other system resources, turn
kmem_cache_reap_now() (which blocks) into kmem_cache_reap_soon(). Callers
to kmem_cache_reap_soon() should use kmem_cache_reap_active(), which
exploits #9017's new taskq_empty().
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Author: Tim Kordas <tim.kordas@joyent.com>
FreeBSD does not use taskqueue for kmem caches reaping, so this change
is less dramatic then it is on Illumos, just limiting reaping to 1 time
per second. It may possibly be improved later, if needed.
illumos/illumos-gate@5cabbc6b49https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
illumos/illumos-gate@95643f75d295643f75d2https://www.illumos.org/issues/8520
lzc_rollback_to() should support rolling back to a clone's origin.
The current checks in zfs_ioc_rollback() would not allow that because the
origin snapshot belongs to a different filesystem.
The overly restrictive check was introduced in 7600, but it was not a
regression as none of the existing tools provided a way to rollback to the
origin.
https://www.illumos.org/issues/7198
EINVAL is returned when a dataset does not have any snapshots, so there is
nothing to roll back to.
Although the code in zfs_do_rollback checks for that condition in advance, it's
still possible that the snapshot(s) gets removed after the check and before the
rollback sync task is executed.
At the moment zfs command would crash when that happens.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 2 weeks
illumos/illumos-gate@7855d95b307855d95b30https://www.illumos.org/issues/7446
Since we support whole-disk configuration for boot pool, we also will need
whole disk support with UEFI boot and for this, zpool create should create efi-
system partition.
I have borrowed the idea from oracle solaris, and introducing zpool create -
B switch to provide an way to specify that boot partition should be created.
However, there is still an question, how big should the system partition be.
For time being, I have set default size 256MB (thats minimum size for FAT32
with 4k blocks). To support custom size, the set on creation "bootsize"
property is created and so the custom size can be set as: zpool create B -
o bootsize=34MB rpool c0t0d0
After pool is created, the "bootsize" property is read only. When -B switch is
not used, the bootsize defaults to 0 and is shown in zpool get output with
value ''. Older zfs/zpool implementations are ignoring this property.
https://www.illumos.org/rb/r/219/
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>
This commit makes no sense for FreeBSD, that is why I blocked the option,
but it should be good to stay closer to upstream.