Fix data corruption when cloning embedded blocks
Don't overwrite blk_phys_birth, as for embedded blocks it is part of
the payload.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Issue #13392Closes#14739
Approved by: oshogbo, mm
There is one data corruption problem reported and fixed upstream, not
cherry-picked here yet.
On top of it the following fires under load:
VERIFY(zil_replaying(zfsvfs->z_log, tx));
The patch which introduced the entire machinery is a revert candidate,
but as the machinery came with a dedicated feature flag, doing so would
render affected pools read-only at best. To be figured out.
As a temporary bandaid at least stop the active usage.
Note this patch does not make the feature disappear from zpool upgrade.
Sponsored by: Rubicon Communications, LLC ("Netgate")
It is not implemented and causes panics on boot.
This is a temporary measure until someone(tm) sorts it out.
Reported by: many
Sponsored by: Rubicon Communications, LLC ("Netgate")
API contract requires VOPs to handle EXDEV internally, worst case by
falling back to the generic copy routine. This broke with the recent
changes.
While here whack custom loop to lock 2 vnodes with vn_lock_pair, which
provides the same functionality internally. write start/finish around
it plays no role so got eliminated.
One difference is that vn_lock_pair always takes an exclusive lock on
both vnodes. I did not patch around it because current code takes an
exclusive lock on the target vnode. zfs supports shared-locking for
writes, so this serializes different calls to the routine as is, despite
range locking inside. At the same time you may notice the source vnode
can get some traffic if only shared-locked, thus once more this goes
the safer route of exclusive-locking. Note this should be patched to
use shared-locking for both once the feature is considered stable.
Technically the switch to vn_lock_pair should be a separate change, but
it would only introduce churn immediately whacked by the rest of the
patch.
[note: technically the review is still in progress, but so is the
active breakage]
Sponsored by: Rubicon Communications, LLC ("Netgate")
If block_cloning is disabled, or other errors from zfs_clone_range()
return an EXDEV we should fall back to vn_generic_copy_file_range().
This fixes issues when copying files on the same dataset with
block_cloning disabled.
Upstreamed as pull request to OpenZFS.
Reviewed by: Mateusz Guzik <mjguzik@gmail.com>
OpenZFS pull request: 14713
Notable upstream pull request merges:
#12194 Fix short-lived txg caused by autotrim
#13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
#13392 Implementation of block cloning for ZFS
#13741 SHA2 reworking and API for iterating over multiple implementations
#14282 Sync thread should avoid holding the spa config write lock
when possible
#14283 txg_sync should handle write errors in ZIL
#14359 More adaptive ARC eviction
#14469 Fix NULL pointer dereference in zio_ready()
#14479 zfs redact fails when dnodesize=auto
#14496 improve error message of zfs redact
#14500 Skip memory allocation when compressing holes
#14501 FreeBSD: don't verify recycled vnode for zfs control directory
#14502 partially revert PR 14304 (eee9362a7)
#14509 Fix per-jail zfs.mount_snapshot setting
#14514 Fix data race between zil_commit() and zil_suspend()
#14516 System-wide speculative prefetch limit
#14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
#14519 Do not hold spa_config in ZIL while blocked on IO
#14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
#14524 Ignore too large stack in case of dsl_deadlist_merge
#14526 Use .section .rodata instead of .rodata on FreeBSD
#14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
#14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
#14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
#14544 icp: Prevent compilers from optimizing away memset()
in gcm_clear_ctx()
#14546 Revert zfeature_active() to static
#14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
patch
#14563 Optimize the is_l2cacheable functions
#14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
#14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
#14567 spl: Add cmn_err_once() to log a message only on the first call
#14568 Fix incremental receive silently failing for recursive sends
#14569 Restore ASMABI and other Unify work
#14576 Fix detection of IBM Power8 machines (ISA 2.07)
#14577 Better handling for future crypto parameters
#14600 zcommon: Refactor FPU state handling in fletcher4
#14603 Fix prefetching of indirect blocks while destroying
#14633 Fixes in persistent error log
#14639 FreeBSD: Remove extra arc_reduce_target_size() call
#14641 Additional limits on hole reporting
#14649 Drop lying to the compiler in the fletcher4 code
#14652 panic loop when removing slog device
#14653 Update vdev state for spare vdev
#14655 Fix cloning into already dirty dbufs
#14678 Revert "Do not hold spa_config in ZIL while blocked on IO"
Obtained from: OpenZFS
OpenZFS commit: 431083f75b
In commit 0a5b942d4 the FreeBSD SECTION_STATIC macro was set to
".rodata". This assembler directive is supported by LLVM (as a
convenience alias for ".section .rodata") by not by GNU as.
This caused the FreeBSD builds that are done with gcc to fail.
Therefore, use ".section .rodata" instead, similar to the other
asm_linkage.h headers.
[mjg: cherry-picked from upstream zfs bf1bec394e
to unbreak gcc12 build]
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes#14526
When jail.conf set the nopersist flag during startup, it was
incorrectly destroying the per-jail ZFS settings.
PR: 260160
Reported by: imp (previous version), mm (upstream), freqlabs (upstream)
MFC after: immediately
Sponsored by: Modirum MDPay
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38662
Notable upstream pull request merges:
#13805 Configure zed's diagnosis engine with vdev properties
#14110 zfs list: Allow more fields in ZFS_ITER_SIMPLE mode
#14121 Batch enqueue/dequeue for bqueue
#14123 arc_read()/arc_access() refactoring and cleanup
#14159 Bypass metaslab throttle for removal allocations
#14243 Implement uncached prefetch
#14251 Cache dbuf_hash() calculation
#14253 Allow reciever to override encryption property in case of replication
#14254 Restrict visibility of per-dataset kstats inside FreeBSD jails
#14255 Zero end of embedded block buffer in dump_write_embedded()
#14263 Cleanups identified by CodeQL and Coverity
#14264 Miscellaneous fixes
#14272 Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names
#14287 FreeBSD: Remove stray debug printf
#14288 Colorize zfs diff output
#14289 deadlock between spa_errlog_lock and dp_config_rwlock
#14291 FreeBSD: Fix potential boot panic with bad label
#14292 Add tunable to allow changing micro ZAP's max size
#14293 Turn default_bs and default_ibs into ZFS_MODULE_PARAMs
#14295 zed: add hotplug support for spare vdevs
#14304 Activate filesystem features only in syncing context
#14311 zpool: do guid-based comparison in is_vdev_cb()
#14317 Pack zrlock_t by 8 bytes
#14320 Update arc_summary and arcstat outputs
#14328 FreeBSD: catch up to 1400077
#14376 Use setproctitle to report progress of zfs send
#14340 Remove some dead ARC code
#14358 Wait for txg sync if the last DRR_FREEOBJECTS might result in a hole
#14360 libzpool: fix ddi_strtoull to update nptr
#14364 Fix unprotected zfs_znode_dmu_fini
#14379 zfs_receive_one: Check for the more likely error first
#14380 Cleanup of dead code suggested by Clang Static Analyzer
#14397 Avoid passing an uninitialized index to dsl_prop_known_index
#14404 Fix reading uninitialized variable in receive_read
#14407 free_blocks(): Fix reports from 2016 PVS Studio FreeBSD report
#14418 Introduce minimal ZIL block commit delay
#14422 x86 assembly: fix .size placement and replace .align with .balign
Obtained from: OpenZFS
OpenZFS commit: 9cd71c8604
To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>
As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).
Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).
The new field lands in a 1-byte hole, thus it does not grow the struct.
Bump __FreeBSD_version to 1400077
Reviewed by: kib (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37759
Notable upstream pull request merges:
#13782 Fix setting the large_block feature after receiving a snapshot
#14157 FreeBSD: stop using buffer cache-only routines on sync
#14172 zed: post a udev change event from spa_vdev_attach()
#14181 zed: unclean disk attachment faults the vdev
#14190 Bump checksum error counter before reporting to ZED
#14196 Remove atomics from zh_refcount
#14197 Don't leak packed recieved proprties
#14198 Switch dnode stats to wmsums
#14199 Remove few pointer dereferences in dbuf_read()
#14200 Micro-optimize zrl_remove()
#14204 Lua: Fix bad bitshift in lua_strx2number()
#14212 Zstd fixes
#14218 Avoid a null pointer dereference in zfs_mount() on FreeBSD
#14235 nopwrites on dmu_sync-ed blocks can result in a panic
#14236 zio can deadlock during device removal
#14247 Micro-optimize fletcher4 calculations
#14261 FreeBSD: zfs_register_callbacks() must implement error check
correctly
Obtained from: OpenZFS
OpenZFS commit: 59493b63c1
The import of OpenZFS PR 13758 causes a panic in zfsctl_is_node()
if ZFS is mounting as root filesystem. This implements a workaround
until the issue is resolved by authors.
Notable upstream pull request merges:
#13680 Add options to zfs redundant_metadata property
#13758 Allow mounting snapshots in .zfs/snapshot as a regular user
#13838 quota: disable quota check for ZVOL
#13839 quota: extend quota for dataset
#13973 Fix memory leaks in dmu_send()/dmu_send_obj()
#13977 Avoid unnecessary metaslab_check_free calling
#13978 PAM: Fix unchecked return value from zfs_key_config_load()
#13979 Handle possible null pointers from malloc/strdup/strndup()
#13997 zstream: allow decompress to fix metadata for uncompressed
records
#13998 zvol_wait logic may terminate prematurely
#14001 FreeBSD: Fix a pair of bugs in zfs_fhtovp()
#14003 Stop ganging due to past vdev write errors
#14039 Optimize microzaps
#14050 Fix draid2+2s metadata error on simultaneous 2 drive failures
#14062 zed: Avoid core dump if wholedisk property does not exist
#14077 Propagate extent_bytes change to autotrim thread
#14079 FreeBSD: vn_flush_cached_data: observe vnode locking contract
#14093 Fix ARC target collapse when zfs_arc_meta_limit_percent=100
#14106 Add ability to recompress send streams with new compression
algorithm
#14119 Deny receiving into encrypted datasets if the keys are not
loaded
#14120 Fix arc_p aggressive increase
#14129 zed: Prevent special vdev to be replaced by hot spare
#14133 Expose zfs_vdev_open_timeout_ms as a tunable
#14135 FreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy()
#14152 Adds the `-p` option to `zfs holds`
#14161 Handle and detect #13709's unlock regression
Obtained from: OpenZFS
OpenZFS commit: 2163cde450
This cherry-picks upstream ed566bf1cd
- Add a zfs_exit() call in an error path, otherwise a lock is
leaked.
- Remove the fid_gen > 1 check. That appears to be Linux-specific:
zfsctl_snapdir_fid() sets fid_gen to 0 or 1 depending on whether
the snapshot directory is mounted. On FreeBSD it fails, making
snapshot dirs inaccessible via NFS.
PR: 266236
MFC after: 3 days
Powerpc* doesn't support support floating point in the kernel yet,
so disable it on zfs for now.
This fixes "panic: altivec unavailable trap" when loading zfs.ko
Approved by: jhibbits
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D36894
This cherry-picks upstream eb9bec0a5d
The current value causes significant artificial slowdown during mass
parallel file removal, which can be observed both on FreeBSD and Linux
when running real workloads.
Sample results from Linux doing make -j 96 clean after an allyesconfig
modules build:
before: 4.14s user 6.79s system 48% cpu 22.631 total
after: 4.17s user 6.44s system 153% cpu 6.927 total
FreeBSD results in the ticket.
See https://github.com/openzfs/zfs/issues/13932
The breakage was introduced in OpenZFS commit 48cf170d5.
When a (different) fix solving this issue gets upstreamed
it will replace the current fix in the next merge from OpenZFS.
Notable upstream pull request merges:
#13725 Fix BLAKE3 tuneable and module loading on Linux and FreeBSD
#13756 FreeBSD: Organize sysctls
#13773 FreeBSD: add kqfilter support for zvol cdev
#13781 Importing from cachefile can trip assertion
#13794 Apply arc_shrink_shift to ARC above arc_c_min
#13798 Improve too large physical ashift handling
#13799 Revert "Avoid panic with recordsize > 128k, raw sending and
no large_blocks"
#13802 Add zfs.sync.snapshot_rename
#13831 zfs_enter rework
#13855 zfs recv hangs if max recordsize is less than received
recordsize
Obtained from: OpenZFS
OpenZFS commit: c629f0bf62
This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D36542
Notable upstream pull request merges:
#13717 Fix zpool status in case of unloaded keys
#13753 Prevent zevent list from consuming all of kernel memory
#13767 arcstat: fix -p option
#13785 Updates for snapshots_changed property
Obtained from: OpenZFS
OpenZFS commit: a582d52993
Notable upstream pull request merges:
#9372 Implement a new type of zfs receive: corrective receive (-c)
#13635 Add snapshots_changed as property
#13636 Add support for per dataset zil stats and use wmsum counters
#13643 Fix scrub resume from newly created hole
#13670 Fix memory allocation for the checksum benchmark
#13695 Skip checksum benchmarks on systems with slow cpu
#13711 zpool: fix redundancy check after vdev removal
Obtained from: OpenZFS
OpenZFS commit: b3d0568cfd
We don't need the compress rotuines, nor zstd_opt.c. Remove them.
Expand the number of places we omit code for IN_LIBSA (which are FreeBSD
specific). Due to the agressive optimization, though, this doesn't
reduce the size of the loader. It does reduce the number of 'false
positives' for places to omit to reduce the size as well as reducing the
build time slightly.
Sponsored by: Netflix
Reviewed by: tsoome, delphij
Differential Revision: https://reviews.freebsd.org/D36145
Notable upstream pull request merges:
#12321 Fix inflated quiesce time caused by lwb_tx during zil_commit()
#13244 zstd early abort
#13360 Verify BPs as part of spa_load_verify_cb()
#13452 More speculative prefetcher improvements
#13466 Expose zpool guids through kstats
#13476 Refactor Log Size Limit
#13484 FreeBSD: libspl: Add locking around statfs globals
#13498 Cancel in-progress rebuilds when we finish removal
#13499 zed: Take no action on scrub/resilver checksum errors
#13513 Remove wrong assertion in log spacemap
Obtained from: OpenZFS
OpenZFS commit: b9d98453f9
Notable upstream pull request merges:
#10662 zvol_wait: Ignore locked zvols
#12789 Improve log spacemap load time
#12812 Improved zpool status output, list all affected datasets
#13277 FreeBSD: Use NDFREE_PNBUF if available
#13302 Make zfs_max_recordsize default to 16M
#13311 Fix error handling in FreeBSD's get/putpages VOPs
#13345 FreeBSD: Fix translation from ABD to physical pages
#13373 zfs: holds: dequadratify
#13375 Corrected edge case in uncompressed ARC->L2ARC handling
#13388 Improve mg_aliquot math
#13405 Reduce dbuf_find() lock contention
#13406 FreeBSD: use zero_region instead of allocating a dedicated page
Obtained from: OpenZFS
OpenZFS commit: c0cf6ed679
Notable upstream pull request merges:
#9078: log xattr=sa create/remove/update to ZIL
#11919: Cross-platform xattr user namespace compatibility
#13014: Report dnodes with faulty bonuslen
#13016: FreeBSD: Fix zvol_cdev_open locking
#13019: spl: Don't check FreeBSD rwlocks for double initialization
#13027: Fix clearing set-uid and set-gid bits on a file when
replying a write
#13031: Add enumerated vdev names to 'zpool iostat -v' and
'zpool list -v'
#13074: Enable encrypted raw sending to pools with greater ashift
#13076: Receive checks should allow unencrypted child datasets
#13098: Avoid dirtying the final TXGs when exporting a pool
#13172: Fix ENOSPC when unlinking multiple files from full pool
Obtained from: OpenZFS
OpenZFS commit: a86e089415
First open locking changes were correctly applied to zvol_geom_open but
incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be
held indefinitely.
Make the first open locking in zvol_cdev_open match zvol_geom_open.
This change has been accepted upstream in openzfs/zfs#13016 but is not
yet merged.
Reviewed by: mav
Fixes: e92ffd9b62
Sponsored by: iXsystems, Inc.
Notable upstream pull request merges:
#12766 Fix error propagation from lzc_send_redacted
#12805 Updated the lz4 decompressor
#12851 FreeBSD: Provide correct file generation number
#12857 Verify dRAID empty sectors
#12874 FreeBSD: Update argument types for VOP_READDIR
#12896 Reduce number of arc_prune threads
#12934 FreeBSD: Fix zvol_*_open() locking
#12947 lz4: Cherrypick fix for CVE-2021-3520
#12961 FreeBSD: Fix leaked strings in libspl mnttab
#12964 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD
#12981 Introduce a flag to skip comparing the local mac when raw sending
#12985 Avoid memory allocations in the ARC eviction thread
Obtained from: OpenZFS
OpenZFS commit: 17b2ae0b24