The block pointer verification check in arc_read() should also
cover embedded block pointers. While highly unlikely, accessing
a damaged block pointer can result in panic. To further harden
the code extend the existing check to include embedded block
pointers and add a comment explaining the rational for this
sanity check. Lastly, correct a flaw in zfs_blkptr_verify()
so the error count is checked even when checking a untrusted
config to verify the non-pool-specific portions of a block
pointer.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12535
For kernel to send snapshot mount/unmount events to zed.
For kernel to send symlink creates/removes on zvol plumbing.
(/dev/run/dsk/zvol/$pool/$zvol -> /dev/diskX)
If zed misses the ENODEV, all errors after are EINVAL. Treat any error
as kernel module failure.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes#12416
When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag,
so traverse_visitbp() will not fail with ECKSUM if a blockpointer
cannot be read, but rather will continue and send the objects it can.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.
Closes#12541
Unfortunately, there was an overzealous assertion that was (in pretty
specific circumstances) false, causing failure. This assertion was
added in error, so we're removing it.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#9897Closes#12020Closes#12246
We round up the psize to the nearest multiple of the asize or to the
lsize, whichever is smaller. Once that's done, we allocate a new
buffer of the appropriate size, zero the tail, and copy the data
into it. This adds a small performance cost to these kinds of writes,
but fixes the bookkeeping problems.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#12522Closes#8462
Previously, zpool-iostat did not display any data regarding rebuild I/Os
in either the latency/size histograms (-w/-l/-r) or the queue data (-q).
This fix essentially utilizes the existing infrastructure for tracking
rebuild queue data and displays this data in the proper places within
zpool-iostat's output.
Signed-off-by: Trevor Bautista <tbautista@newmexicoconsortium.org>
Signed-off-by: Trevor Bautista <tbautista@lanl.gov>
Co-authored-by: Trevor Bautista <tbautista@newmexicoconsortium.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
benchmark_raidz() allocates a row to benchmark parity calculation and
reconstruction. In the latter case, the parity blocks are left
uninitialized, leading to reports from KMSAN.
Initialize parity blocks to 0xAA as we do for the data earlier in the
function. This does not affect the selected RAID-Z implementation on
any of several systems tested.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#12473
When a header is allocated for full overwrite it is a waste of time
to allocate b_pabd/b_rabd for it, since arc_write() will free them
without ever being touched. If it is a read or a partial overwrite
then arc_read() and arc_hdr_decrypt() allocate them explicitly.
Reduced memory allocation in user threads also reduces ARC eviction
throttling there, proportionally increasing it in ZIO threads, that
is not good. To minimize or even avoid it introduce ARC allocation
reserve, allowing certain arc_get_data_abd() callers to allocate a
bit longer in situations where user threads will already throttle.
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12398
It is very expensive and not informative to call multilist_is_empty()
for each arc_change_state() on debug builds to check for impossible.
Instead implement special index function for arc_l2c_only->arcs_list,
multilists, panicking on any attempt to use it.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12421
Instead of clearing stats inside arc_buf_alloc_impl() do it inside
arc_hdr_alloc() and arc_release(). It fixes statistics being wiped
every time a new dbuf is filled from the ARC.
Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits.
Since the hits are accounted under hash lock, replace atomics with
simple increments.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12422
vq_lock is already too congested for two more operations per I/O.
Instead of dropping and reacquiring it inside vdev_queue_aggregate()
delegate the zio_vdev_io_bypass() and zio_execute() calls for parent
I/Os to callers, that drop the lock any way to execute the new I/O.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12297
Use atomic_load_64() for zfs_refcount_count() to prevent torn reads
on 32-bit platforms. On 64-bit ones it should not change anything.
When built with ZFS_DEBUG but running without tracking enabled use
atomics instead of mutexes same as for builds without ZFS_DEBUG.
Since rc_tracked can't change live we can check it without lock.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12420
Before OpenZFS 2.0, trying to set the FreeBSD sysctl vfs.zfs.arc_max
to a disallowed value would return an error.
Since the switch, it instead only generates WARN_IF_TUNING_IGNORED
Keep the ability to set the sysctl's specifically to 0, even though
that is less than the minimum, because some tests depend on this.
Also lost, was the ability to set vfs.zfs.arc_max to a value less
than the default vfs.zfs.arc_min at boot time. Restore this as well.
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes#12161
Run arc_evict thread at higher priority, nice=0, to give it more CPU
time which can improve performance for workload with high ARC evict
activities.
On mixed read/write and sequential read workloads, I've seen between
10-40% better performance.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes#12397
It is wrong for arc_write_ready() to use zfs_abd_scatter_enabled to
decide whether to reallocate/copy the buffer, because the answer is
OS-specific and depends on the buffer size. Instead of that use
abd_size_alloc_linear(), moved into public header.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#12425
Commit 5dbf6c5a66 did not address these format specifier warnings
since they were introduced by an unrelated commit which had not
been rebased on 5dbf6c5a66 when merged. Fix them.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12435
In l2arc_add_vdev() first decide whether the device is eligible for
L2ARC rebuild or whole device trim and then add it to the list of cache
devices. Otherwise l2arc_feed_thread() might already start writing on
the device invalidating previous content as l2ad_hand = l2ad_start.
However l2arc_rebuild_vdev() needs the device present in the cache
device list to figure out its l2arc_dev_t. Fix this by moving most of
l2arc_rebuild_vdev() in a new function l2arc_rebuild_dev() which does
not need to search in the cache device list.
In contrast to l2arc_add_vdev() we do not have to worry about
l2arc_feed_thread() invalidating previous content when onlining a
cache device. The device parameters (l2ad*) are not cleared when
offlining the device and writing new buffers will not invalidate
all previous content. In worst case only buffers that have not had
their log block written to the device will be lost.
Retire persist_l2arc_00{4,5,8} tests since they cover code already
covered by the remaining ones. Test persist_l2arc_006 is renamed to
persist_l2arc_004 and persist_l2arc_007 is renamed to persist_l2arc_005.
Fix a typo in persist_l2arc_004, and remove an assertion that is not
always true from l2arc_arcstats_pos. Also update an assertion in
persist_l2arc_005 and explain why in a comment.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#12365
These were mostly used to annotate do {} while(0)s
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
This includes a simplification of mkbusy and format correctness in zhack
and ztest
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
It seems nothing ensures that this array is zeroed when a dnode is
freshly allocated, so in principle it retains the values from the
previous allocation. In practice it seems to be the case that the
fields should end up zeroed, but we can zero the field anyway for
consistency.
This was found using KMSAN.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#12383
When logging a TX_WRITE record in the case where file data has to be
copied from the DMU, we pad the log record size to a multiple of 8
bytes. In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.
This was found using KMSAN.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#12383
When allocating a record, we round up the allocation size to a multiple
of 8. In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.
This was found using KMSAN.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#12383
When logging TX_SETATTR, we could otherwise fail to initialize part of
the corresponding ZIL record depending on which fields are present in
the xvattr. Initialize the creation time and the AV scan timestamp to
zero so that uninitialized bytes are not written to the ZIL.
This was found using KMSAN.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#12383
spa_prop_find() may fail to find the specified property, in which case
it suppresses ENOENT from zap_lookup(). In this case, the return value
is left uninitialized, so spa_autoreplace was being initialized using an
uninitialized stack variable.
This was found using KMSAN. It appears to be a regression from commit
9eb7b46ed0, which removed the initialization of "autoreplace" from the
definition.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#12383
After 1325434b, we can in certain circumstances end up calling
spa_update_dspace with vd->vdev_mg NULL, which ends poorly during
vdev removal.
So let's not do that further space adjustment when we can't.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12380Closes#12428
Remove mc_lock use from metaslab_class_throttle_*(). The math there
is based on refcounts and so atomic, so the only race possible there
is between zfs_refcount_count() and zfs_refcount_add(). But in most
cases metaslab_class_throttle_reserve() is called with the allocator
lock held, which covers the race. In cases where the lock is not
held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE are set, and so we
do not use zfs_refcount_count(). And even if we assume some other
non-existing scenario, the worst that may happen from this race is
few more I/Os get to allocation earlier, that is not a problem.
Move locks and data of different allocators into different cache
lines to avoid false sharing. Group spa_alloc_* arrays together
into single array of aligned struct spa_alloc spa_allocs. Align
struct metaslab_class_allocator.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12314
* Add Module Parameters Regarding Log Size Limit
zfs_wrlog_data_max
The upper limit of TX_WRITE log data. Once it is reached,
write operation is blocked, until log data is cleared out
after txg sync. It only counts TX_WRITE log with WR_COPIED
or WR_NEED_COPY.
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes#12284
Remove unneeded global, practically constant, state pointer variables
(arc_anon, arc_mru, etc.), replacing them with macros of real state
variables addresses (&ARC_anon, &ARC_mru, etc.).
Change ARC_EVICT_ALL from -1ULL to UINT64_MAX, not requiring special
handling in inner loop of ARC reclamation. Respectively change bytes
argument of arc_evict_state() from int64_t to uint64_t.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#12348
Ensure all calls to bqueue_init() has a corresponding call to bqueue_destroy()
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes#12118
* zio: avoid callback typecasting
* zil: avoid zil_itxg_clean() callback typecasting
* zpl: decouple zpl_readpage() into two separate callbacks
* nvpair: explicitly declare callbacks for xdr_array()
* linux/zfs_nvops: don't use external iput() as a callback
* zcp_synctask: don't use fnvlist_free() as a callback
* zvol: don't use ops->zv_free() as a callback for taskq_dispatch()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes#12260
We don't use or need the pool name or value source in the zvol tasks.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#12361
Most of dsl_dir_diduse_space() and dsl_dir_transfer_space() CPU time
is a dd_lock overhead and time spent in dmu_buf_will_dirty(). Calling
them one after another is a waste of time and even more contention.
Doing that twice for each rewritten block within dbuf_write_done()
via dsl_dataset_block_kill() and dsl_dataset_block_born() created one
of the biggest CPU overheads in case of small blocks rewrite.
dsl_dir_diduse_transfer_space() combines functionality of these two
functions for cases where it is needed, but without double overhead,
practically for the cost of dsl_dir_diduse_space() or even cheaper.
While there, optimize dsl_dir_phys() calls in dsl_dir_diduse_space()
and dsl_dir_transfer_space(). It seems Clang detects some aliasing
there, repeating dd->dd_dbuf->db_data dereference multiple times,
increasing dd_lock scope and contention.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Author: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12300
* Tinker with slop space accounting with dedup
Do not include the deduplicated space usage in the slop space
reservation, it leads to surprising outcomes.
* Update spa_dedup_dspace sometimes
Sometimes, we get into spa_get_slop_space() with
spa_dedup_dspace=~0ULL, AKA "unset", while spa_dspace is correctly set.
So call the code to update it before we use it if we hit that case.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12271
arc_evict_hdr() returns number of evicted bytes in scope of specific
state. For ghost states it does not mean the amount of really freed
memory, but the logical buffer size. It is correct for the eviction
process, but not for waking up threads waiting for ARC size reduction,
as added in "Revise ARC shrinker algorithm" commit, causing premature
wakeups while ARC is still overflowed, allowing even bigger overflow,
plus processing overhead when next allocation will also get blocked,
probably also for too short time.
To fix that make arc_evict_hdr() also return the amount of really
freed memory, which for the ghost states is only the header, and use
it to update arc_evict_count instead. Originally I was thinking to
not return it at all, since arc_get_data_impl() does not account for
the headers, but decided that some slow allocation progress is better
than long waits, reaching on my tests up to 100ms.
To reduce negative latency effects of long time periods when reclaim
thread can free little real memory, start reclamation process earlier,
before we actually reached the overflow threshold, when we have to
throttle new allocations. We can also do it without taking global
arc_evict_lock, reducing the contention.
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12279
Callers of zfs_file_get and zfs_file_put can corrupt the reference
counts for the file structure resulting in a panic or a soft lockup.
When zfs send/recv runs, it will add a reference count to the
open file, and begin to send or recv the stream. If the file descriptor
is closed, then when dmu_recv_stream() or dmu_send() return we will
call zfs_file_put to remove the reference we placed on the file
structure. Unfortunately, because zfs_file_put() uses the file
descriptor to lookup the file structure, it may end up finding that
the file descriptor table no longer contains the file struct, thus
leaking the file structure. Or it might end up finding a file
descriptor for a different file and blindly updating its reference
counts. Other failure modes probably exists.
This change reworks the zfs_file_[get|put] interface to not rely
on the file descriptor but instead pass the zfs_file_t pointer around.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-76119
Closes#12299
This dramatically reduces the lock contention on systems with slower
(non-TSC) timecounters. With TSC the difference is minimal, but since
this lock is pretty congested, any improvement counts. Plus I don't
see any reason to do it under the lock other than the latency of the
lock itself, which this change actually reduces.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12281
With default dbuf cache size of 1/32 of ARC, it makes no sense to have
hash table of the same size (or even bigger on Linux). Reduce it to
1/8 of ARC's one, still leaving some slack, assuming higher I/O rate
via dbuf cache than via ARC.
Remove padding from ARC hash locks array. The idea behind padding
is to avoid false sharing between locks. It would have sense if
there would be a limited number of very busy locks. But since we
have no limit on the number, using the same memory for more locks we
can achieve even lower lock contention with the same false sharing,
or we can use less memory for the same contention level.
Reduce number of hash locks from 8192 to 2048. The number is still
big enough to not cause contention, but reduced memory size improves
cache hit rate for mutex_tryenter() in ARC eviction thread, saving
about 1% of the thread time.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12289
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.
Additionally:
* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.
* Make abd_chunk_cache more performant and debuggable.
* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.
* Don't try to reap non-existent qcaches in abd_cache arena.
* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: Sean Doran <smd@use.net>
Closes#12295
dmu_zfetch_stream_fini() is missing calls to destroy the refcounts,
leaking them and the mutex inside.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes#12294
Use dp_dirty_pertxg[] for txg_kick(), instead of dp_dirty_total in
original code. Extra parameter "txg" is added for txg_kick(), thus it
knows which txg to kick. Also txg_kick() call is moved from
dsl_pool_need_dirty_delay() to dsl_pool_dirty_space() so that we can
know the txg number assigned for txg_kick().
Some unnecessary code regarding dp_dirty_total in txg_sync_thread() is
also cleaned up.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes#12274
The only reason for spa_config_*() to use refcount instead of simple
non-atomic (thanks to scl_lock) variable for scl_count is tracking,
hard disabled for the last 8 years. Switch to simple int scl_count
reduces the lock hold time by avoiding atomic, plus makes structure
fit into single cache line, reducing the locks contention.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12287
LLVM's Polly (ISL to be precise) is unhappy with the loop from
ddt_stat_add():
CC [M] fs/zfs/zfs/ddt.o
../lib/External/isl/isl_schedule_node.c:2470: cannot insert node
between set or sequence node and its filter children
(building with the custom patch which adds Polly support to Kbuild)
The mentioned loop is rather suboptimal. All that we need is to just
treat ddt_stat_t as an array of u64 and perform 1:1 addition or
substraction. This can be done in simpler for-loop with the
determined index and bounds. Compiler will expand d_end - d into
a number of ddt_stat_t fields at compile time.
This prevents Polly from failing on this file.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes#12253
The number of sublists in a multilist is relatively small. We dont need
64 bits to calculate an index. 32 bits is sufficient and makes the
code more efficient.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12288
While abd_verify() does nothing when built without debug, compiler
can't optimize it out by itself due to calls to external list_*()
and abd_verify_scatter(). This commit makes it explicit.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12280
Unlike most other properties the 'compatibility' property is stored
in the pool config object and not the DMU_OT_POOL_PROPS object.
This had the advantage that the compatibility information is available
without needing to fully import the pool (it can be read with zdb).
However, this means we need to make sure to update both the copy of
the config in the MOS and the cache file. This wasn't being done.
This commit adds a call to spa_async_request() to ensure the copy of
the config in the cache file gets updated as well as the one stored
in the pool. This same change is made for the 'comment' property
which suffers from the same inconsistency.
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12261Closes#12276
According to current zfs man page zfs_metaslab_mem_limit should be
25 instead of 75.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: jumbi77@users.noreply.github.comCloses#12273