5820 Commits

Author SHA1 Message Date
Ryan Moeller
80d98a8f3a
ZTS: Use default_cleanup_noexit where needed
And add log_pass appropriately.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10136
2020-03-17 09:55:18 -07:00
Ryan Moeller
e0d3284bc9
Exit status 256+signum is actually baked in to ksh
While #10121 did fix the signal numbers for FreeBSD/Darwin, it
incorrectly changed the expected encoding of exit status for commands
that exited on a signal.  The encoding 256+signum is a feature of the
shell.  Only the signal numbers themselves are platform-dependent.

Always use the encoding 256+signum when checking exit status for
signal exits.
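
A minimal sketch of the encoding described above, assuming a status value as ksh would report it. The function name and constant are illustrative and are not taken from the test suite.

```python
import signal

KSH_SIGNAL_BASE = 256  # ksh reports death-by-signal as 256 + signum


def decode_ksh_status(status):
    """Return ("signal", signum) or ("exit", code) for a ksh-style status."""
    if status > KSH_SIGNAL_BASE:
        return ("signal", status - KSH_SIGNAL_BASE)
    return ("exit", status)


# The offset is fixed by the shell; only the signal number is looked up
# from the platform.
expected_term_status = KSH_SIGNAL_BASE + int(signal.SIGTERM)
```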

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10137
2020-03-17 09:49:58 -07:00
Ryan Moeller
4d32abaa87
libzfs: Fix bounds checks for float parsing
UINT64_MAX is not exactly representable as a double.

The closest representation is UINT64_MAX + 1, so we can use a >=
comparison instead of > for the bounds check.
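
The rounding can be checked directly. The sketch below uses Python floats, which are IEEE 754 doubles; the helper name is hypothetical and only mirrors the shape of the bounds check.

```python
UINT64_MAX = 2**64 - 1

# Converting UINT64_MAX to a double rounds up to exactly 2**64.
limit = float(UINT64_MAX)


def fits_in_uint64(value):
    """Bounds check for a double that is about to be converted to uint64."""
    # `value > limit` would wrongly accept 2**64 itself, hence `>=`.
    return value >= 0.0 and not (value >= limit)
```

The largest double below 2**64 is 2**64 - 2048, so nothing representable is lost by rejecting `limit` itself.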

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10127
2020-03-16 11:56:29 -07:00
Matthew Ahrens
7261fc2e81
Improve zfs receive performance by batching writes
For each WRITE record in the stream, `zfs receive` creates a DMU
transaction (`dmu_tx_create()`) and writes this block's data into the
object.  If per-block overheads (as opposed to per-byte overheads)
dominate performance (as is often the case with small recordsize), the
per-dmu-transaction overheads can be significant.  For example, in some
workloads the `receive_writer` thread is 100% on CPU, and more than
half of its CPU time is in these per-tx routines (e.g.
dmu_tx_hold_write, dmu_tx_assign, dmu_tx_commit).

To improve performance of `zfs receive`, this commit batches WRITE
records which are to nearby offsets of the same object, and uses one DMU
transaction to write them all.  By default the batch size is 1MB, which
for recordsize=8K reduces the number of DMU transactions by 128x for
full send streams (incrementals will depend on how "clumpy" the changed
blocks are).
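
The batching idea can be sketched as follows. This is not the kernel implementation: the record tuples, the function name, and the exact rule for "nearby" (same object, ending within one batch size of the batch's first offset) are assumptions made for illustration.

```python
def batch_writes(records, batch_size=1 << 20):
    """Group (object_id, offset, length) records into per-transaction batches.

    Assumption for this sketch: a record joins the current batch when it
    targets the same object and ends within `batch_size` bytes of the
    batch's first offset.
    """
    batches = []
    for obj, offset, length in records:
        if batches:
            current = batches[-1]
            first_obj, first_offset, _ = current[0]
            if (obj == first_obj and offset >= first_offset
                    and offset + length <= first_offset + batch_size):
                current.append((obj, offset, length))
                continue
        # Different object or too far away: start a new batch.
        batches.append([(obj, offset, length)])
    return batches


# A full stream of 8K records: 256 sequential writes to one object.
records = [(1, i * 8192, 8192) for i in range(256)]
batches = batch_writes(records)
```

With the default 1MB batch size, each batch here holds 128 records, which matches the 128x reduction quoted for recordsize=8K.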

This commit improves the performance of `dd if=stream | zfs recv`
from 78,800 blocks/sec to 98,100 blocks/sec (25% improvement).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10099
2020-03-16 11:51:56 -07:00
Brian Behlendorf
c94fb10917
Remove CI builder customization from TEST
The default options are reasonable for all of the CI builders.

* TEST_XFSTESTS_SKIP=yes  - This is already the default.
* TEST_ZTEST_TIMEOUT=3600 - Increased ztest run time only increases
  code coverage by a small degree.  Default 900s runs are sufficient.
* Disabling certain tests on 32-bit builders is no longer needed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10129
2020-03-16 10:46:03 -07:00
Ryan Moeller
d3fe62cb35
ZTS: Update flaky tests in zts-report
Some tests which pass on FreeBSD but fail on Linux had been put in the
"maybe" set.  Move these back to "known" under an "if Linux" check so
the expected outcome is clear.

Add some tests that have been found to be flaky on FreeBSD stable/12
to the "maybe" set.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10120
2020-03-13 09:29:10 -07:00
Matthew Ahrens
0fdd6106bb
dmu_objset_from_ds must be called with dp_config_rwlock held
The normal lock order is that the dp_config_rwlock must be held before
the ds_opening_lock.  For example, dmu_objset_hold() does this.
However, dmu_objset_open_impl() is called with the ds_opening_lock held,
and if the dp_config_rwlock is not already held, it will attempt to
acquire it.  This may lead to deadlock, since the lock order is
reversed.

Looking at all the callers of dmu_objset_open_impl() (which is
principally the callers of dmu_objset_from_ds()), almost all callers
already have the dp_config_rwlock.  However, there are a few places in
the send and receive code paths that do not.  For example:
dsl_crypto_populate_key_nvlist, send_cb, dmu_recv_stream,
receive_write_byref, redact_traverse_thread.

This commit resolves the problem by requiring all callers of
dmu_objset_from_ds() to hold the dp_config_rwlock.  In most cases, the
code has been restructured such that we call dmu_objset_from_ds()
earlier on in the send and receive processes, when we already have the
dp_config_rwlock, and save the objset_t until we need it in the middle
of the send or receive (similar to what we already do with the
dsl_dataset_t).  Thus we do not need to acquire the dp_config_rwlock in
many new places.

I also cleaned up code in dmu_redact_snap() and send_traverse_thread().

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9662
Closes #10115
2020-03-12 10:55:02 -07:00
Alexander Motin
fa130e010c
Fix infinite scan on a pool with only special allocations
An attempt to run scrub or resilver on a new pool containing only special
allocations (special vdev added on creation) caused an infinite loop
because dsl_scan_should_clear() limits memory usage to 5% of pool
size, which it calculated accounting only for the normal allocation class.

Adding the special and, just in case, dedup classes to the calculation
fixes the issue.
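
A simplified model of the limit calculation, assuming the cap is just a fixed fraction of the space counted in the chosen allocation classes. The function name, the dictionary layout, and the sizes are hypothetical; the real dsl_scan_should_clear() has other inputs.

```python
def scan_memory_limit(size_by_class, classes, fraction=0.05):
    """Memory cap for scan state: a fraction of the space in `classes`."""
    return fraction * sum(size_by_class[c] for c in classes)


# Hypothetical new pool where only the special class holds any data.
pool = {"normal": 0, "special": 10 * 2**30, "dedup": 0}

# Counting only the normal class gives a cap of zero, so any usage at all
# is over the limit and the scan never stops clearing.
old_limit = scan_memory_limit(pool, ["normal"])

# Counting all three classes gives a usable cap.
new_limit = scan_memory_limit(pool, ["normal", "special", "dedup"])
```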

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #10106 
Closes #8694
2020-03-12 10:52:03 -07:00
Ryan Moeller
94eb65b4c1
ZTS: Use correct signal numbers for status checks
Different operating systems encode exit status in different ways.
The logapi shell library assumes the Solaris meaning of exit codes,
which is not correct on other platforms.

Define the needed constants according to the platform we are running
on and use those to decode process exit status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10121
2020-03-12 10:50:51 -07:00
Ryan Moeller
cdbc34fc2b
ZTS: Test boundary conditions in alloc_class_012
Issue #9142 describes an error in the checks for device removal that
can prevent removal of special allocation class vdevs in some
situations.

Enhance alloc_class/alloc_class_012_pos to check situations where this
bug occurs.

Update zts-report with knowledge of issue #9142.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10116 
Issue #9142
2020-03-12 10:50:01 -07:00
Ryan Moeller
e70b127e05
ZTS: Wait for free space between write_dirs tests
Cleanup for write_dirs involves destroying a dataset filling a pool
and then recreating the dataset for the next test.  Due to the
asynchronous nature of free space accounting, recreating the dataset
can fail for lack of space, causing problems for the next test.

Add wait_freeing $TESTPOOL to wait for the space to be freed and then
sync_pool $TESTPOOL to update the space accounting before attempting
to recreate the test filesystem.

Only use a single disk to create the pool.  Make it a small file so it
does not take too long to fill.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10112
2020-03-12 10:48:46 -07:00
John Poduska
e6b28efccc
Prevent race condition in dnode_dest (#10101)
dnode_special_close() waits for the refcount of dn_holds to go to zero
without holding the dn_mtx. dnode_rele_and_unlock() does the final
remove to dn_holds with dn_mtx being held:

	refs = zfs_refcount_remove(&dn->dn_holds, tag);
	mutex_exit(&dn->dn_mtx);

So, there is a race condition after the remove until dn_mtx is
dropped. During that time, dnode_destroy() can get called, which ends
up in dnode_dest() calling mutex_destroy() and a panic since the lock
is still held.

This change adds a condvar to wait for the final dnode_rele_and_unlock()
to release the dn_mtx before calling dnode_destroy().
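
The general pattern can be sketched in user space with a condition variable. The class below is a toy stand-in, not the dnode code: its names are hypothetical, and `threading.Condition` plays the role of dn_mtx plus the new condvar.

```python
import threading


class DnodeSketch:
    """Toy stand-in for a dnode: a hold count guarded by a mutex."""

    def __init__(self, holds):
        # Condition bundles a mutex with a condition variable.
        self.cond = threading.Condition()
        self.holds = holds
        self.destroyed = False

    def rele(self):
        with self.cond:
            self.holds -= 1
            if self.holds == 0:
                self.cond.notify_all()
        # The mutex is released when the `with` block exits.

    def special_close(self):
        with self.cond:
            # wait_for() re-acquires the mutex before returning, so the
            # final rele() must already have dropped it.
            self.cond.wait_for(lambda: self.holds == 0)
        # Only now is it safe to tear the object down.
        self.destroyed = True
```

Because the waiter cannot return from `wait_for()` without re-acquiring the mutex, teardown cannot overlap with the last release still holding it.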

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #7814
Closes #10101
2020-03-12 10:25:56 -07:00
Mark Roper
1e9231ada8
Prevent deadlock in arc_read in Linux memory reclaim callback
Using zfs with Lustre, an arc_read can trigger kernel memory allocation
that in turn leads to a memory reclaim callback and a deadlock within a
single zfs process. This change uses spl_fstrans_mark and
spl_fstrans_unmark to prevent the reclaim attempt and the deadlock
(https://zfsonlinux.topicbox.com/groups/zfs-devel/T4db2c705ec1804ba).
The stack trace observed is:

    __schedule at ffffffff81610f2e
    schedule at ffffffff81611558
    schedule_preempt_disabled at ffffffff8161184a
    __mutex_lock at ffffffff816131e8
    arc_buf_destroy at ffffffffa0bf37d7 [zfs]
    dbuf_destroy at ffffffffa0bfa6fe [zfs]
    dbuf_evict_one at ffffffffa0bfaa96 [zfs]
    dbuf_rele_and_unlock at ffffffffa0bfa561 [zfs]
    dbuf_rele_and_unlock at ffffffffa0bfa32b [zfs]
    osd_object_delete at ffffffffa0b64ecc [osd_zfs]
    lu_object_free at ffffffffa06d6a74 [obdclass]
    lu_site_purge_objects at ffffffffa06d7fc1 [obdclass]
    lu_cache_shrink_scan at ffffffffa06d81b8 [obdclass]
    shrink_slab at ffffffff811ca9d8
    shrink_node at ffffffff811cfd94
    do_try_to_free_pages at ffffffff811cfe63
    try_to_free_pages at ffffffff811d01c4
    __alloc_pages_slowpath at ffffffff811be7f2
    __alloc_pages_nodemask at ffffffff811bf3ed
    new_slab at ffffffff81226304
    ___slab_alloc at ffffffff812272ab
    __slab_alloc at ffffffff8122740c
    kmem_cache_alloc at ffffffff81227578
    spl_kmem_cache_alloc at ffffffffa048a1fd [spl]
    arc_buf_alloc_impl at ffffffffa0befba2 [zfs]
    arc_read at ffffffffa0bf0924 [zfs]
    dbuf_read at ffffffffa0bf9083 [zfs]
    dmu_buf_hold_by_dnode at ffffffffa0c04869 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Roper <markroper@gmail.com>
Closes #9987
2020-03-12 10:24:43 -07:00
Olaf Faaland
61871518dd
zloop.sh should call ZDB with pool name
Commit 54007c79 introduced an error, changing the final
argument to $ZDB from ztest to $ZTEST.  This argument
indicates the pool name, not the script, and so should
not have been changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #10118
2020-03-11 10:02:23 -07:00
Ryan Moeller
ddd9ef3a4f
ZTS: Add a failsafe callback to run after each test
Tests that get killed do not have an opportunity to clean up.

There are many bad states this can leave the system in, but of
particular gravity is when zinject has been used to induce bad
behavior for one or more of the test disks.

Create a failsafe mechanism in test-runner.py that runs a callback
script after every test. The script is common to all tests so all
tests benefit from the protection.

Add an obligatory `zinject -c all` to clear all zinject state after
every test case is run.
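
The shape of such a failsafe can be sketched as below. This is not the code in test-runner.py: tests are modeled as plain callables, a failure is modeled as an exception, and all names are hypothetical.

```python
def run_tests(tests, failsafe):
    """Run each (name, callable) test, then always run the shared failsafe."""
    results = {}
    for name, test in tests:
        try:
            test()
            results[name] = "PASS"
        except Exception:
            results[name] = "FAIL"
        finally:
            # Runs whether the test passed or failed; in the real suite
            # this is where state such as zinject faults gets cleared.
            failsafe()
    return results
```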

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10096
2020-03-10 11:00:56 -07:00
Matthew Ahrens
1dc32a67e9
Improve zfs send performance by bypassing the ARC
When doing a zfs send on a dataset with small recordsize (e.g. 8K),
performance is dominated by the per-block overheads.  This is especially
true with `zfs send --compressed`, which further reduces the amount of
data sent, for the same number of blocks.  Several threads are involved,
but the limiting factor is the `send_prefetch` thread, which is 100% on
CPU.

The main job of the `send_prefetch` thread is to issue zio's for the
data that will be needed by the main thread.  It does this by calling
`arc_read(ARC_FLAG_PREFETCH)`.  This has an immediate cost of creating
an arc_hdr, which takes around 14% of one CPU.  It also induces later
costs by other threads:

 * Since the data was only prefetched, dmu_send()->dmu_dump_write() will
   need to call arc_read() again to get the data.  This will have to
   look up the arc_hdr in the hash table and copy the data from the
   scatter ABD in the arc_hdr to a linear ABD in arc_buf.  This takes
   27% of one CPU.

 * dmu_dump_write() needs to call arc_buf_destroy().  This takes 11% of one
   CPU.

 * arc_adjust() will need to evict this arc_hdr, taking about 50% of one
   CPU.

All of these costs can be avoided by bypassing the ARC if the data is
not already cached.  This commit changes `zfs send` to check for the
data in the ARC, and if it is not found then we directly call
`zio_read()`, reading the data into a linear ABD which is used by
dmu_dump_write() directly.
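
In outline, the read path described above looks like the sketch below. The cache is modeled as a plain dictionary and the disk read as a callable; both, along with the function name, are assumptions for illustration.

```python
def read_for_send(block_id, cache, read_from_disk):
    """Return block data, using the cache only if the block is already there."""
    data = cache.get(block_id)
    if data is not None:
        return data
    # Miss: read straight from disk and do not insert into the cache,
    # so a one-pass send leaves the cached working set untouched.
    return read_from_disk(block_id)
```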

The performance improvement is best expressed in terms of how many
blocks can be processed by `zfs send` in one second.  This change
increases the metric by 50%, from ~100,000 to ~150,000.  When the amount
of data per block is small (e.g. 2KB), there is a corresponding
reduction in the elapsed time of `zfs send >/dev/null` (from 86 minutes
to 58 minutes in this test case).

In addition to improving the performance of `zfs send`, this change
makes `zfs send` not pollute the ARC cache.  In most cases the data will
not be reused, so this allows us to keep caching useful data in the MRU
(hit-once) part of the ARC.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10067
2020-03-10 10:51:04 -07:00
Ryan Moeller
9be70c3784
ZTS: Simplify some libtest functions
Don't echo the results of arithmetic expressions, it's not necessary.

Use hw.clockrate sysctl to get CPU freq instead of parsing dmesg.boot
for a line that might not even be there anymore.

Reduce bookkeeping in fill_fs, making it easier to follow.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10113
2020-03-10 10:44:14 -07:00
Richard Laager
5ecbb293c6
Fix zfs-functions packaging bug
This fixes a bug where the generated zfs-functions was being included
along with original zfs-functions.in in the make dist tarball.  This
caused an unfortunate series of events during build/packaging that
resulted in the RPM-installed /etc/zfs/zfs-functions listing the
paths as:

ZFS="/usr/local/sbin/zfs"
ZED="/usr/local/sbin/zed"
ZPOOL="/usr/local/sbin/zpool"

When they should have been:

ZFS="/sbin/zfs"
ZED="/sbin/zed"
ZPOOL="/sbin/zpool"

This affects init.d (non-systemd) distros like CentOS 6.

/etc/default/zfs and /etc/zfs/zfs-functions are also used by the
initramfs, so they need to be built even when init.d support is not.
They have been moved to the (new) etc/default and (existing) etc/zfs
source directories, respectively.

Fixes: #9443

Co-authored-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
Richard Laager
01243e72a5
initramfs: Eliminate substitutions
These are now handled in zfs-functions, so this is all duplicative and
unnecessary.

Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
Richard Laager
49afc91387
Delete built init scripts in make clean
Previously, they were being deleted in make distclean.  This brings it
in line with the example:
https://www.gnu.org/software/automake/manual/html_node/Scripts.html

Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
Richard Laager
dc4dd46728
Make init scripts depend on Makefile
This brings it in line with the example:
https://www.gnu.org/software/automake/manual/html_node/Scripts.html

This way, if the substitution code is changed, they should update.

Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
InsanePrawn
ff2f960b24
Systemd mount generator: don't fail keyload from file if already loaded
Previously the generated keyload units for encryption roots with
keylocation=file://* didn't contain the code to detect if the key
was already loaded and would be marked failed in such situations.

Move the code to check whether the key is already loaded
from keylocation=prompt handling to general key loading code.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #10103
2020-03-09 11:09:09 -07:00
Ryan Moeller
2b95e91132
ZTS: Another round of changes for FreeBSD
Highlights:
* is_linux -> is_illumos swaps
* make block_device_wait more clever when paths are given
* slightly optimize default_cleanup_noexit
* remove platform differences in user_run
* temporarily expect non-libfetch behavior for keylocation=/foo/bar
* fix sharenfs exceptions
* don't test multihost property
* fix misc broken platform checks
* clear zinjected faults in removal_resume_export callback

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10092
2020-03-06 09:31:32 -08:00
Ryan Moeller
f5f6fb03b7
Change default to overlay=on
Filesystems allow overlay mounts by default on FreeBSD and Linux.

Respect the native convention by switching the default to overlay=on,
while retaining the option to turn the property off for compatibility
with other operating systems' conventions.

Update documentation and tests accordingly.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10030
2020-03-06 09:28:19 -08:00
Ryan Moeller
788398c562
ZTS: Update zts-report exceptions for FreeBSD
The new zfs_sync_trim_* tests are skipped on FreeBSD.
Both of the previously failing tests are now passing.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10105
2020-03-06 09:26:38 -08:00
Brian Behlendorf
5a1abc4b5b
ZTS: Speed up write_dirs cleanup
The write_dirs tests fill a filesystem with a bunch of files until it
is full.  In cleanup the files are truncated and removed individually.
These tests already take a while to run.

It is quicker and easier to destroy the whole dataset and create a new
one to replace it in the cleanup functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10098
2020-03-04 15:12:12 -08:00
Brian Behlendorf
fa23c5be88
ZTS: Add missing quotes
`default_setup` takes a disk list as the first argument and has
optional additional arguments that control secondary functionality.
A couple of test setups mistakenly call `default_setup $DISKS`.

Add quotes so the second and subsequent disks are correctly included
in the pool as vdevs rather than triggering unwanted behavior from
`default_setup`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10097
2020-03-04 15:10:45 -08:00
Brian Behlendorf
4b06d05298
ZTS: Add zts-report exceptions for FreeBSD
There are three tests we expect to fail only on FreeBSD.
* link_count never exits and eventually times out:
 - @amotin tells me this test is probably not applicable to us
 - Skip on FreeBSD
* userobj feature does not activate immediately after pool upgrade
 - low impact; we are aware of this issue
* removal does not appear to condense on export
 - low impact; we are aware of this issue

Additionally removal_with_zdb passes on FreeBSD, so it is moved to
"maybe".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10093
2020-03-04 15:09:40 -08:00
Brian Behlendorf
f49db9b504
zio: dprintf_bp() if errors > 0 in zfs_blkptr_verify()
Also dprintf_bp() in case BLK_VERIFY_HALT of zfs_blkptr_verify_log()
since dprintf_bp() in zfs_blkptr_verify() will never be executed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Justin Keogh <commits@v6y.net>
Closes #10086
2020-03-04 15:08:41 -08:00
Brian Behlendorf
d16c029f99
ZTS: Test the correct filesystem_limits behavior
See issue #8226: Property filesystem_limit does not work as documented

There have been previous attempts to fix the behavior on Linux, but so
far the issue is still open.  See PRs #8228, #8280.

The existing tests pass for the incorrect behavior.  This is a problem
on FreeBSD; we are failing the tests because we implement the feature
correctly.

I have adapted the tests based on the work by @loli10k in #8280 and
extended the changes to fix the snapshot_limit test as well.

Linux now fails these tests, so entries linking to the issue have been
added to the "maybe" group in zts-report.py.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10082
2020-03-04 15:07:52 -08:00
Brian Behlendorf
2288d41968
Add trim support to zpool wait
Manual trims fall into the category of long-running pool activities
which people might want to wait synchronously for. This change adds
support to 'zpool wait' for waiting for manual trim operations to
complete. It also adds a '-w' flag to 'zpool trim' which can be used to
turn 'zpool trim' into a synchronous operation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #10071
2020-03-04 15:07:11 -08:00
Matthew Ahrens
b3212d2fa6
Improve performance of zio_taskq_member
__zio_execute() calls zio_taskq_member() to determine if we are running
in a zio interrupt taskq, in which case we may need to switch to
processing this zio in a zio issue taskq.  The call to
zio_taskq_member() can become a performance bottleneck when we are
processing a high rate of zio's.

zio_taskq_member() calls taskq_member() on each of the zio interrupt
taskqs, of which there are 21.  This is slow because each call to
taskq_member() does tsd_get(taskq_tsd), which on Linux is relatively
slow.

This commit improves the performance of zio_taskq_member() by having it
cache the value of tsd_get(taskq_tsd), reducing the number of those
calls to 1/21st of the current behavior.
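
The effect of caching the lookup can be shown with a counter. Everything here is a stand-in: `tsd_get_sketch` models the slow thread-specific lookup, and the taskq names are made up.

```python
lookups = {"count": 0}


def tsd_get_sketch():
    """Stand-in for the slow thread-specific lookup; counts its calls."""
    lookups["count"] += 1
    return "zio_issue_0"  # the calling thread is not in an interrupt taskq


def member_slow(taskqs):
    # One lookup per taskq, as before the change.
    return any(tsd_get_sketch() == tq for tq in taskqs)


def member_cached(taskqs):
    # One lookup in total; the cached value is compared against every taskq.
    current = tsd_get_sketch()
    return any(current == tq for tq in taskqs)
```

For a thread that is not a member, the slow version performs one lookup per taskq while the cached version performs exactly one.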

In a test case running `zfs send -c >/dev/null` of a filesystem with
small blocks (average 2.5KB/block), zio_taskq_member() was using 6.7% of
one CPU, and with this change it is reduced to 1.3%.  Overall time to
perform the `zfs send` reduced by 10% (~150,000 blocks/sec to ~165,000
blocks/sec).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10070
2020-03-03 10:29:38 -08:00
Ryan Moeller
0a0f9a7dc6
ZTS: Provide for nested cleanup routines
Shared test library functions lack a simple way to ensure proper
cleanup in the event of a failure.  The `log_onexit` cleanup pattern
cannot be used in library functions because it uses one global
variable to store the cleanup command.

An example of where this is a serious issue is when a tunable that
artificially stalls kernel progress gets activated and then some check
fails.  Unless the caller knows about the tunable and sets it back,
the system will be left in a bad state.

To solve this problem, turn the global cleanup variable into a stack.
Provide push and pop functions to add additional cleanup steps and
remove them after it is safe again.
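
The stack-based approach can be sketched as follows. The test suite itself is written in ksh; this Python version only illustrates the idea, and the function names are hypothetical.

```python
cleanup_stack = []


def push_cleanup(fn):
    """Register an extra cleanup step on top of the existing ones."""
    cleanup_stack.append(fn)


def pop_cleanup():
    """Drop the most recent cleanup step once it is no longer needed."""
    return cleanup_stack.pop()


def run_cleanups():
    """On failure or exit, run the remaining steps, most recent first."""
    while cleanup_stack:
        cleanup_stack.pop()()
```

A library function would push a step that restores its tunable before changing it and pop the step after restoring the tunable itself, so the caller's own cleanup stays in place underneath.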

The first use of this new functionality is in attempt_during_removal,
which sets REMOVAL_SUSPEND_PROGRESS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10080
2020-03-03 10:28:09 -08:00
Ryan Moeller
9bb907bc3f
Make spa_history_zone platform-dependent in kernel
This function should only return "linux" on Linux.

Move the kernel part of the function out of common code.
Fix the tests for FreeBSD.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10079
2020-03-02 09:43:30 -08:00
Ryan Moeller
1289fbb557
ZTS: Change issue URL template to OpenZFS org
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10081
2020-03-02 09:42:22 -08:00
Matthew Macy
d32eff3a27
Don't open zfs control device exclusively
With the FreeBSD platform changes that were made for #10073
it is no longer necessary on FreeBSD to open the control device
exclusively to get onexit callbacks invoked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10076
2020-02-28 14:54:14 -08:00
Matthew Macy
cf118ae8dc
Don't call zrele on passed zp in zfs_xattr_owner_unlinked on FreeBSD
FreeBSD has a somewhat more cumbersome locking and refcounting
protocol for the platform counterpart to znode. We need to not call
zrele on the passed zp, but do need to do so on any intermediate zp.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10075
2020-02-28 14:53:18 -08:00
Matthew Macy
ae9f92f6f3
Re-share zfsdev_getminor and zfs_onexit_fd_hold
By adding a zfs_file_private accessor to the common
interfaces and some extensions to FreeBSD platform
code it is now possible to share the implementations
for the aforementioned functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10073
2020-02-28 14:50:32 -08:00
Matthew Ahrens
9cdf7b1f6b
Improve zfs destroy performance with zio_t-free zio_free()
When "zfs destroy" is run, it completes quickly, and in the background
we locate the blocks to free and free them.  This background activity
can be observed with `zpool get freeing` and `zpool wait -t free ...`.

This background activity is processed by a single thread (the spa_sync
thread) which calls zio_free() on each of the blocks to free.  With even
modest storage performance, the CPU consumption of zio_free() can be the
performance bottleneck.

Performance of zio_free() can be improved by not actually creating a
zio_t in the common case (non-dedup, non-gang), instead calling
metaslab_free() directly.  This avoids the CPU cost of allocating the
zio_t, and more importantly the cost of adding and later removing this
zio_t from the parent zio's child list.
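
The fast path amounts to a simple branch, sketched below. Blocks are modeled as dictionaries and the two free routines are passed in as callables; these are stand-ins chosen for illustration, not the kernel interfaces.

```python
def free_block(block, free_directly, free_via_zio):
    """Free a block, skipping zio creation in the common case."""
    if block.get("dedup") or block.get("gang"):
        # Uncommon case: still needs the full zio machinery.
        return free_via_zio(block)
    # Common case: no zio allocated, nothing added to a parent's child list.
    return free_directly(block)
```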

The result is that performance of background freeing more than doubles,
from 0.6 million blocks per second to 1.3 million blocks per second.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10034
2020-02-28 14:49:44 -08:00
Ryan Moeller
6c0abcfddd
ZTS: Fixup shebang in rsend_016, add to common.run
All other ksh scripts use /bin/ksh in the shebang.

Make rsend_016_neg consistent with the rest of the suite.

The test also was absent from any runfiles. Add it to common.run.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10051
2020-02-28 09:48:29 -08:00
Ryan Moeller
f0410e9806
ZTS: Eliminate partitioning from zpool_add
Use file vdevs if we are short on $DISKS.
Also fixed vol recursion for FreeBSD in 004.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10060
2020-02-28 09:46:51 -08:00
Brian Behlendorf
3f99a3abc7
Fix CONFIG_MODULES=no Linux kernel config
When configuring as builtin (--enable-linux-builtin) for kernels
without loadable module support (CONFIG_MODULES=n) only the object
file is created, never a loadable kmod.

Update ZFS_LINUX_TRY_COMPILE to handle this in a manner similar to
the ZFS_LINUX_TEST_COMPILE_ALL macro.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9887
Closes #10063
2020-02-28 09:23:48 -08:00
Brian Behlendorf
bd0d24e09b
Linux 5.5 compat: blkg_tryget()
Commit https://github.com/torvalds/linux/commit/9e8d42a0f accidentally
converted the static inline function blkg_tryget() to GPL-only for
kernels built with CONFIG_PREEMPT_RCU=y and CONFIG_BLK_CGROUP=y.

Resolve the build issue by providing our own equivalent functionality
when needed which uses rcu_read_lock_sched() internally as before.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9745
Closes #10072
2020-02-28 08:58:39 -08:00
Ryan Moeller
2ce90dca91
arc_summary: Make get_descriptions per platform
Linux uses modinfo to get tunables descriptions, FreeBSD has to use
sysctl.

Move the existing function definition so it is defined that way on
Linux, and add a definition in terms of sysctl for FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10062
2020-02-27 17:15:06 -08:00
Ryan Moeller
276410387e
pyzfs: Add constants for platform-specific errnos
FreeBSD doesn't have EBADE, ECHRNG, or ETIME.

Add constants for these and set them appropriately for the platform.
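
One way to define such constants is to fall back when the `errno` module lacks a name. This sketch uses `getattr` rather than an explicit platform check, and the fallback values are placeholders chosen for illustration; they are not the values pyzfs uses.

```python
import errno

# Use the platform's value when it exists, otherwise a placeholder.
# The fallbacks below are assumptions made for this sketch only.
EBADE = getattr(errno, "EBADE", errno.EIO)
ECHRNG = getattr(errno, "ECHRNG", errno.ENXIO)
ETIME = getattr(errno, "ETIME", errno.ETIMEDOUT)
```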

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10061
2020-02-27 17:14:21 -08:00
Matthew Macy
13fac09868
Consolidate arc_buf allocation checks
The following check currently occurs in three separate locations
in dbuf.c.  This change consolidates those checks into the
dbuf_alloc_arcbuf_from_arcbuf() function.

if (arc_is_encrypted(data)) {
...
} else if (compress_type != ZIO_COMPRESS_OFF) {
...
} else {
...
}

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10057
2020-02-27 17:12:44 -08:00
Ryan Moeller
3d5ba1cf29
ZTS: Misc fixes for FreeBSD
* Set geom debug flags in corrupt_blocks_at_level
* Use the right time zone for history tests
* Add missing commands.cfg entry for diskinfo
* Rewrite get_last_txg_synced to use zdb
* Don't check ulimits for sparse files
* Suspend removal before removing a vdev, not after

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10054
2020-02-27 09:38:34 -08:00
Matthew Ahrens
ab9646166d
ZTS: Fix zfs_receive_004_neg
`zfs recv` of an incremental stream that already exists is ignored, with
a message like:

    receiving incremental stream of pool/fs@incsnap into pool/fs@incsnap
    snap testpool/testfs@incsnap already exists; ignoring

And the command exits successfully (exit code 0).

The zfs_receive_004_neg test is expecting that this case will fail,
with nonzero exit code.

The fix is to remove this specific command from the test case.  This
lets us check that the remaining commands do in fact fail.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10055
2020-02-27 09:37:34 -08:00
Brian Behlendorf
2c3a83701d
Linux 5.6 compat: time_t
As part of the Linux kernel's y2038 changes the time_t type has been
fully retired.  Callers are now required to use the time64_t type.

Rather than move to the new type, I've removed the few remaining
places where a time_t is used in the kernel code.  They've been
replaced with a uint64_t which is already how ZFS internally
handled these values.

Going forward we should work towards updating the remaining user
space time_t consumers to the 64-bit interfaces.

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10052
Closes #10064
2020-02-27 09:31:02 -08:00
Brian Behlendorf
ff5587d651
Linux 5.6 compat: ktime_get_raw_ts64()
The getrawmonotonic() and getrawmonotonic64() interfaces have been
fully retired.  Update gethrtime() to use the replacement interface
ktime_get_raw_ts64() which was introduced in the 4.18 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10052
Closes #10064
2020-02-27 09:30:45 -08:00