freebsd-dev

Author	SHA1	Message	Date
Richard Yao	8e7ebf4e2d	Cleanup: Use C99 flexible array members instead of zero length arrays The Linux 5.16.14 kernel's coccicheck caught this. The semantic patch that caught it was: ./scripts/coccinelle/misc/flexible_array.cocci The Linux kernel's documentation makes a good case for why we should not use these: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14372	2023-01-12 15:59:41 -08:00
Richard Yao	c9c3ce7976	Cleanup: Use kmem_zalloc() instead of memset() to zero memory The Linux 5.16.14 kernel's coccicheck caught this. The semantic patch that caught it was: ./scripts/coccinelle/api/alloc/zalloc-simple.cocci Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14372	2023-01-12 15:59:28 -08:00
Richard Yao	7384ec65cd	Cleanup: Remove unnecessary explicit casts of pointers from allocators The Linux 5.16.14 kernel's coccicheck caught these. The semantic patch that caught them was: ./scripts/coccinelle/api/alloc/alloc_cast.cocci Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14372	2023-01-12 15:59:12 -08:00
Gian-Carlo DeFazio	80d64bb85f	change how d_alias is replaced by du.d_alias d_alias may need to be converted to du.d_alias depending on the kernel version. d_alias is currently in only one place in the code which changes "hlist_for_each_entry(dentry, &inode->i_dentry, d_alias)" to "hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias)" as necessary. This effectively results in a double macro expansion for code that uses the zfs headers but already has its own macro for just d_alias (lustre in this case). Remove the conditional code for hlist_for_each_entry and have a macro for "d_alias -> du.d_alias" instead. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov> Closes #14377	2023-01-12 10:14:04 -08:00
George Amanakis	eee9362a72	Activate filesystem features only in syncing context When activating filesystem features after receiving a snapshot, do so only in syncing context. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14304 Closes #14252	2023-01-11 18:00:39 -08:00
Brian Behlendorf	6320b9e68e	CI: remove unused packages/snaps Removing portions of packages/snaps directly with rm can result in unexpected errors when running `apt update`. Free up the additional space by removing (some) packages with the proper tools. This change frees up slightly less space than before, but it is expected to still be sufficient. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14374	2023-01-11 15:18:51 -08:00
rob-wing	6f2ffd272c	zpool: do guid-based comparison in is_vdev_cb() is_vdev_cb() uses string comparison to find a matching vdev and will fallback to comparing the guid via a string. These changes drop the string comparison and compare the guids instead. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Co-authored-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14311	2023-01-11 15:14:35 -08:00
Mateusz Piotrowski	926715b9fc	Turn default_bs and default_ibs into ZFS_MODULE_PARAMs The default_bs and default_ibs tunables control the default block size and indirect block size. So far, default_bs and default_ibs were tunable only on FreeBSD, e.g., sysctl vfs.zfs.default_ibs Remove the FreeBSD-specific sysctl code and expose default_bs and default_ibs as tunables on both Linux and FreeBSD using ZFS_MODULE_PARAM. One of the use cases for changing the values of those tunables is to lower the indirect block size, which may improve performance of large directories (as discussed during the OpenZFS Leadership Meeting on 2022-08-16). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com> Sponsored-by: Wasabi Technology, Inc. Closes #14293	2023-01-11 09:38:20 -08:00
Tony Hutter	4ba3eff2a6	Update META to 6.1 kernel ZFS successfully builds against the 6.1.4 kernel. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #14371	2023-01-10 15:53:33 -08:00
Mateusz Piotrowski	a4b21eadec	Add tunable to allow changing micro ZAP's max size This change turns `MZAP_MAX_BLKSZ` into a `ZFS_MODULE_PARAM()` called `zap_micro_max_size`. As a result, we can experiment with different micro ZAP sizes to improve directory size scaling. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com> Co-authored-by: Toomas Soome <toomas.soome@klarasystems.com> Signed-off-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com> Sponsored-by: Wasabi Technology, Inc. Closes #14292	2023-01-10 13:41:54 -08:00
Alyssa Ross	1f19826c9a	etc/systemd/zfs-mount-generator: avoid strndupa The non-standard strndupa function is not implemented by musl libc, and can be dangerous due to its potential to blow the stack. (musl _does_ implement strdupa, used elsewhere in this function.) With a similar amount of code, we can use a heap allocation to construct the pool name, which is musl-friendly and doesn't have potential stack problems. (Why care about musl when systemd only supports glibc? Some distros patch systemd with portability fixes, and it would be nice to be able to use ZFS on those distros.) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alyssa Ross <alyssa.ross@unikie.com> Closes #14327	2023-01-10 13:40:31 -08:00
Matthew Ahrens	fc45975ec8	Batch enqueue/dequeue for bqueue The Blocking Queue (bqueue) code is used by zfs send/receive to send messages between the various threads. It uses a shared linked list, which is locked whenever we enqueue or dequeue. For workloads which process many blocks per second, the locking on the shared list can be quite expensive. This commit changes the bqueue logic to have 3 linked lists: 1. An enqueuing list, which is used only by the (single) enqueuing thread, and thus needs no locks. 2. A shared list, with an associated lock. 3. A dequeuing list, which is used only by the (single) dequeuing thread, and thus needs no locks. The entire enqueuing list can be moved to the shared list in constant time, and the entire shared list can be moved to the dequeuing list in constant time. These operations only happen when the `fill_fraction` is reached, or on an explicit flush request. Therefore, the lock only needs to be acquired infrequently. The API already allows for dequeuing to block until an explicit flush, so callers don't need to be changed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #14121	2023-01-10 13:39:22 -08:00
Brian Behlendorf	0c8fbe5b6a	ztest: update ztest_dmu_snapshot_create_destroy() ECHRNG is returned when the channel program encounters a runtime error. For example, this can happen when a snapshot doesn't exist. We handle this error the same way as the existing EEXIST and ENOENT error checks. Additionally, improve the internal debug message to include the error describing why a pool couldn't be opened. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:27:48 -08:00
Brian Behlendorf	549aafb7c8	ztest: ztest_dsl_prop_set_uint64() ENOSPC consistency It is possible for ztest_dsl_prop_set_uint64() to fail with ENOSPC and this needs to be handled consistently. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:27:48 -08:00
Brian Behlendorf	f7788883ab	ztest: reduce `zpool split` frequency There's no need to so aggressively test splitting a pool. Reduce the occurrence of this test to once every 10 seconds. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:27:48 -08:00
Brian Behlendorf	4208a052c2	ztest: update expectation for sparing a special device Commit `c23738c70e` modified the expected behavior of attach to prevent hot spares from being used as special vdev replacements. We update ztest's expectations accordingly to prevent it from failing when testing the updated behavior. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:26:44 -08:00
Matthew Ahrens	40d7e971ff	ztest fails assertion in zio_write_gang_member_ready() Encrypted blocks can have up to 2 DVA's, as the third DVA is reserved for the salt+IV. However, dmu_write_policy() allows non-encrypted blocks (e.g. DMU_OT_OBJSET) inside encrypted datasets to request and allocate 3 DVA's, since they don't need a salt+IV (they are merely authenticated). However, if such a block becomes a gang block, the gang code incorrectly limits the gang block header to 2 DVA's. This leads to a "NDVAs inversion", where a parent block (the gang block header) has fewer DVA's than its children (the gang members), causing an assertion failure in zio_write_gang_member_ready(). This commit addresses the problem by only restricting the gang block header to 2 DVA's if the block is actually encrypted (and thus its gang block members can have at most 2 DVA's). Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #14250 Closes #14356	2023-01-09 16:43:45 -08:00
Charles Suh	44a78c05b3	libzpool: fix ddi_strtoull to update nptr Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Charles Suh <charles.suh@gmail.com> Closes #14360	2023-01-09 12:49:35 -08:00
Ameer Hamza	5091867ee6	zed: add hotplug support for spare vdevs This commit adds support for spare vdev hotplug. The spare vdev associated with all the pools will be marked as "Removed" when the drive is physically detached and will become "Available" when the drive is reattached. Currently, the spare vdev status does not change on the drive removal and the same is the case with reattachment. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14295	2023-01-09 12:43:03 -08:00
Alexander Motin	289f7e6adb	Remove some dead ARC code. (#14340) Every ARC buffer holds a reference on the header. It means headers with buffers are never evictable. When we are evicting a header, there can be no more buffers to free. Just assert that. b_evict_lock seems not protecting anything now. Remove it. Buffers checksum should also be freed with the last uncompressed buffer, so it should not be there also when we are evicting the header. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc.	2023-01-09 10:45:17 -08:00
Coleman Kane	a0105f6cd4	linux 6.2 compat: bio->bi_rw was renamed bio->bi_opf The bi_rw member of struct bio was renamed to bi_opf in Linux 6.2. As well, Linux's implementation of bio_set_op_attrs(...) has been removed. The HAVE_BIO_BI_OPF macro already appears to be defined, but the removal of the bio_set_op_attrs(...) implementation makes the build fall back on the locally-defined implementation, which isn't updated for the bio->bi_opf change. This commit adds that update. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #14324 Closes #14331	2023-01-06 14:43:22 -08:00
Coleman Kane	884a69357f	linux 6.2 compat: get_acl() got moved to get_inode_acl() in 6.2 Linux 6.2 renamed the get_acl() operation to get_inode_acl() in the inode_operations struct. This should fix Issue #14323. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #14323 Closes #14331	2023-01-06 14:40:54 -08:00
Antonio Russo	556ed09537	Introduce ZFS_LINUX_REQUIRE_API autoconf macro Currently, if API tests fail, we either ignore the failures, or unconditionally halt the kernel build. This leads to situations where incompatibilities with existing APIs may develop, but not trip the configure compatibility checks. This introduces a new mechanism to require APIs for kernels above a particular version. While not perfect, this at least guarantees mainline kernels do not break existing APIs without at least providing some warning. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #14343	2023-01-06 14:33:53 -08:00
Antonio Russo	d27c81847b	Linux 6.1 compat: open inside tmpfile() Linux 863f144 modified the .tmpfile interface to pass a struct file, rather than a struct dentry, and expect the tmpfile implementation to open inside of tmpfile(). This patch implements a configuration test that checks for this new API and appropriately sets a HAVE_TMPFILE_DENTRY flag that tracks this old API. Contingent on this flag, the appropriate API is implemented. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #14301 Closes #14343	2023-01-06 14:33:00 -08:00
Antonio Russo	a7304ab9c1	ZTS: close in mmapwrite.c mmapwrite is used during the ZTS to identify issues with mmap-ed files. This helper program exercises this pathway by continuously writing to a file. `ee6bf97c7` modified the writing threads to terminate after a set amount of total data is written. This change allows standard program execution to reach the end of a writer thread without closing the file descriptor, introducing a resource "leak." This patch appeases resource leak analyses by close()-ing the file at the end of the thread. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #14353	2023-01-06 10:52:08 -08:00
Antonio Russo	ee6bf97c77	ZTS: limit mmapwrite file size mmapwrite spawns several threads, all of which perform writes on a file for the purpose of testing the behavior of mmap(2)-ed files. One thread performs an mmap and a write to the beginning of that region, while the others perform regular writes after lseek(2)-ing the end of the file. Because these regular writes are set in a while (1) loop, they will write an unbounded amount of data to disk. The mmap_write_001_pos test script SIGKILLs them after 30 seconds, but on fast testbeds, this may be enough time to exhaust the available space in the filesystem, leading to spurious test failures. Instead, limit the total file size by checking that the lseek return value is no greater than 250 * 1024*1024 bytes, which is less than the default minimum vdev size defined in includes/default.cfg . Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #14277 Closes #14345	2023-01-05 12:50:30 -08:00
Clemens Lang	8352e9dfae	contrib: dracut: Do not timeout waiting for pw systemd-ask-password has a default timeout of 90 seconds, which means that dracut will fall back to the rescue shell 4.5 minutes after boot if no password is entered. This is undesirable when combined with, for example, unlocking remotely using dracut-sshd and systemd-tty-ask-password-agent. See also https://github.com/gsauthof/dracut-sshd#timeout and https://bugzilla.redhat.com/show_bug.cgi?id=868421. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Clemens Lang <neverpanic@gmail.com> Closes #14341	2023-01-05 12:07:43 -08:00
Richard Yao	1f3bc5ea80	Illumos #15286: do_composition() needs sign awareness Authored by: Dan McDonald <danmcd@mnx.io> Reviewed by: Patrick Mooney <pmooney@pfmooney.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Joshua M. Clulow <josh@sysmgr.org> Ported-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Illumos-issue: https://www.illumos.org/issues/15286 Illumos-commit: `f137b22e73` Porting Notes: The patch in illumos did not have much of a commit message, and did not provide attribution to the reporter, while original patch proposed to OpenZFS did, so I am listing the reporter (myself) and original patch author (also myself) below while including the original commit message with some minor corrections as part of the porting notes: In do_composition(), we have: size = u8_number_of_bytes[*p]; if (size <= 1 || (p + size) > oslast) break; There, we have type promotion from int8_t to size_t, which is unsigned. C will sign extend the value as part of the widening before treating the value as unsigned and the negative values we can encounter are error values from U8_ILLEGAL_CHAR and U8_OUT_OF_RANGE_CHAR, which are -1 and -2 respectively. The unsigned versions of these under two's complement are SIZE_MAX and SIZE_MAX-1 respectively. The bounds check is written under the assumption that `size <= 1` does a signed comparison. This is followed by a pointer comparison to see if the string has the correct length, which is fine. A little further down we have: for (i = 0; i < size; i++) tc[i] = *p++; When an error condition is encountered, this will attempt to iterate at least SIZE_MAX-1 times, which will massively overflow the buffer, which is not fine. The kernel will kill the loop as soon as it hits the kernel stack guard on Linux systems built with CONFIG_VMAP_STACK=y, which should be just about all of them. That prevents arbitrary code execution and just about any other bad thing that a black hat attacker might attempt with knowledge of this buffer overflow. Other systems' kernels have mitigations for unbounded in-kernel buffer overflows that will catch this too. Also, the patch in illumos-gate made an effort to fix C style issues that had been fixed in the OpenZFS/ZFSOnLinux repository. Those issues had been mentioned in the email that I originally sent them about this issue. One of the fixes had not been already done, so it is included. Another to collect_a_seq()'s arguments was handled differently in OpenZFS. For the sake of avoiding unnecessary differences, it has been adopted. This has the interesting effect that if you correct the paths in the illumos-gate patch to match the current OpenZFS repository, you can reverse apply it cleanly. Original-patch-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reported-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Co-authored-by: Dan McDonald <danmcd@mnx.io> Closes #14318 Closes #14342	2023-01-05 11:16:21 -08:00
Matthew Ahrens	b72efb7511	removal of LegacyVersion broke ax_python_dev.m4 The 22.0 release of the python `packaging` package removed the `LegacyVersion` trait, causing ZFS to no longer compile. This commit replaces the sections of `ax_python_dev.m4` that rely on `LegacyVersion` with updated implementations from the upstream `autoconf-archive`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #14297	2023-01-05 11:04:24 -08:00
Mateusz Guzik	f25f1f9091	FreeBSD: catch up to 1400077 Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #14328	2023-01-05 10:56:40 -08:00
Martin Rüegg	f6f215f07f	Fix shebang for helper script of deb-utils Shebang was missing the `!` between `#` and the actual path. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch> Closes #14339	2023-01-05 10:50:00 -08:00
Martin Rüegg	fb000f7867	Add quotation marks around `$PATH` for deb-utils Fix #14338, failing to build deb-utils if existing `$PATH` variable would include a whitespace. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch> Closes #14339	2023-01-05 10:49:33 -08:00
Alexander Motin	db832c47fe	Pack zrlock_t by 8 bytes On FreeBSD this reduces this structure size from 64 to 56 bytes. dnode_handle_t respectively reduces from 72 to 64 bytes. It sounds like a waste to need 72 bytes to be able to relocate 808 bytes of dnode_t, which relocation on FreeBSD is not even supported. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14317	2023-01-05 09:31:55 -08:00
Alexander Motin	792a6ee462	Update arc_summary and arcstat outputs Recent ARC commits added new statistic counters, such as iohits, uncached state, etc. Represent those. Also some of previously reported numbers were confusing or even made no sense. Cleanup and restructure existing reports. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Issue #14115 Issue #14123 Issue #14243 Closes #14320	2023-01-05 09:29:13 -08:00
Alexander Motin	bacf366fe2	Hide b_freeze_* under ZFS_DEBUG This saves 40 bytes per full ARC header, reducing it on FreeBSD from 240 to 200 bytes on production bits. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #14315	2023-01-05 10:15:31 -07:00
Alexander Motin	ed2f7ba08d	Implement uncached prefetch Previously the primarycache property was handled only in the dbuf layer. Since the speculative prefetcher is implemented in the ARC, it had to be disabled for uncacheable buffers. This change gives the ARC knowledge about uncacheable buffers via arc_read() and arc_write(). So when remove_reference() drops the last reference on the ARC header, it can either immediately destroy it, or if it is marked as prefetch, put it into a new arc_uncached state. That state is scanned every second, evicting stale buffers that were not demand read. This change also tracks dbufs that were read from the beginning, but not to the end. It is assumed that such buffers may receive further reads, and so they are stored in dbuf cache. If a following read reaches the end of the buffer, it is immediately evicted. Otherwise it will follow regular dbuf cache eviction. Since the dbuf layer does not know actual file sizes, this logic is not applied to the final buffer of a dnode. Since uncacheable buffers should no longer stay in the ARC for long, this patch also tries to optimize I/O by allocating ARC physical buffers as linear to allow buffer sharing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14243	2023-01-04 17:29:54 -07:00
Alexander Motin	c935fe2e92	arc_read()/arc_access() refactoring and cleanup ARC code was many times significantly modified over the years, that created significant amount of tangled and potentially broken code. This should make arc_access()/arc_read() code some more readable. - Decouple prefetch status tracking from b_refcnt. It made sense originally, but became highly cryptic over the years. Move all the logic into arc_access(). While there, clean up and comment state transitions in arc_access(). Some transitions were weird IMO. - Unify arc_access() calls to arc_read() instead of sometimes calling it from arc_read_done(). To avoid extra state changes and checks add one more b_refcnt for ARC_FLAG_IO_IN_PROGRESS. - Reimplement ARC_FLAG_WAIT in case of ARC_FLAG_IO_IN_PROGRESS with the same callback mechanism to not falsely account them as hits. Count those as "iohits", an intermediate between "hits" and "misses". While there, call read callbacks in original request order, that should be good for fairness and random speculations/allocations/aggregations. - Introduce additional statistic counters for prefetch, accounting predictive vs prescient and hits vs iohits vs misses. - Remove hash_lock argument from functions not needing it. - Remove ARC_FLAG_PREDICTIVE_PREFETCH, since it should be opposite to ARC_FLAG_PRESCIENT_PREFETCH if ARC_FLAG_PREFETCH is set. We may wish to add ARC_FLAG_PRESCIENT_PREFETCH to few more places. - Fix few false positive tests found in the process. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14123	2022-12-22 12:10:24 -08:00
Ryan Moeller	dc8c2f6158	FreeBSD: Fix potential boot panic with bad label vdev_geom_read_pool_label() can leave NULL in configs. Check for it and skip consistently when generating rootconf. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #14291	2022-12-22 11:50:09 -08:00
Matthew Ahrens	018f26041d	deadlock between spa_errlog_lock and dp_config_rwlock There is a lock order inversion deadlock between `spa_errlog_lock` and `dp_config_rwlock`: A thread in `spa_delete_dataset_errlog()` is running from a sync task. It is holding the `dp_config_rwlock` for writer (see `dsl_sync_task_sync()`), and waiting for the `spa_errlog_lock`. A thread in `dsl_pool_config_enter()` is holding the `spa_errlog_lock` (see `spa_get_errlog_size()`) and waiting for the `dp_config_rwlock` (as reader). Note that this was introduced by #12812. This commit addresses this by defining the lock ordering to be dp_config_rwlock first, then spa_errlog_lock / spa_errlist_lock. spa_get_errlog() and spa_get_errlog_size() can acquire the locks in this order, and then process_error_block() and get_head_and_birth_txg() can verify that the dp_config_rwlock is already held. Additionally, a buffer overrun in `spa_get_errlog()` is corrected. Many code paths didn't check if `*count` got to zero, instead continuing to overwrite past the beginning of the userspace buffer at `uaddr`. Tested by having some errors in the pool (via `zinject -t data /path/to/file`), one thread running `zpool iostat 0.001`, and another thread runs `zfs destroy` (in a loop, although it hits the first time). This reproduces the problem easily without the fix, and works with the fix. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: George Amanakis <gamanakis@gmail.com> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #14239 Closes #14289	2022-12-22 11:48:49 -08:00
Brian Behlendorf	29e1b089c1	Documentation corrections - Update the link to the OpenZFS Code of Conduct. - Remove extra "the" from contrib/initramfs/scripts/zfs Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14298 Closes #14307	2022-12-22 11:34:28 -08:00
Brian Behlendorf	b4cd4fe1aa	Revert "zdb: zdb_ddt_leak_init() reads uninitialized memory..." This reverts commit `d30db519af`. With this change applied zloop.sh fails reliably with the following ASSERT. zio_wait(zio_claim(NULL, zcb->zcb_spa, refcnt ? 0 : spa_min_claim_txg( zcb->zcb_spa), bp, NULL, NULL, ZIO_FLAG_CANFAIL)) == 0 (0x2 == 0x0) ASSERT at cmd/zdb/zdb.c:5452:zdb_count_block() Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14306	2022-12-21 09:17:00 -08:00
George Melikov	bd9dc5a1dc	systemd: set restart=always for zfs-zed.service Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Co-authored-by: Attila Fülöp <attila@fueloep.org> Closes #14294	2022-12-19 16:08:15 -08:00
Ethan Coe-Renner	fb11b1570a	Add color output to zfs diff. This adds support to color zfs diff (in the style of git diff) conditional on the ZFS_COLOR environment variable. Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov>	2022-12-15 10:14:32 -08:00
Doug Rabson	24502bd3a7	FreeBSD: Remove stray debug printf Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Doug Rabson <dfr@rabson.org> Closes #14286 Closes #14287	2022-12-13 17:35:07 -08:00
Umer Saleem	e6e31dd540	Add native-deb* targets to build native Debian packages In continuation of previous #13451, this commit adds native-deb* targets for make to build native Debian packages. Github workflows are updated to build and test native Debian packages. Native packages only build with pre-configured paths (see the dh_auto_configure section in contrib/debian/rules.in). While building native packages, paths should not be configured. Initial config flags e.g. '--enable-debug' are replaced in contrib/debian/rules.in. Additional packages on top of existing zfs packages required to build native packages include debhelper-compat, dh-python, dkms, po-debconf, python3-all-dev, python3-sphinx. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #14265	2022-12-13 17:33:05 -08:00
Richard Yao	f3f5263f8a	Zero end of embedded block buffer in dump_write_embedded() This fixes a kernel stack leak. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Tested-by: Nicholas Sherlock <n.sherlock@gmail.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13778 Closes #14255	2022-12-13 17:31:47 -08:00
Ameer Hamza	9be34ec99e	Allow receiver to override encryption properties in case of replication Currently, the receiver fails to override the encryption property for the plain replicated dataset with the error: "cannot receive incremental stream: encryption property 'encryption' cannot be set for incremental streams.". The problem is resolved by allowing the receiver to override the encryption property for plain replicated send. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14253 Closes #13533	2022-12-13 17:30:46 -08:00
Richard Yao	3236c0b891	Cache dbuf_hash() calculation We currently compute a 64-bit hash three times, which consumes 0.8% CPU time on ARC eviction heavy workloads. Caching the 64-bit value in the dbuf allows us to avoid that overhead. Sponsored-By: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Richard Yao <richard.yao@klarasystems.com> Closes #14251	2022-12-13 17:29:21 -08:00
Allan Jude	dc95911d21	zfs list: Allow more fields in ZFS_ITER_SIMPLE mode If the fields to be listed and sorted by are constrained to those populated by dsl_dataset_fast_stat(), then zfs list is much faster, as it does not need to open each objset and reads its properties. A previous optimization by Pawel Dawidek (`0cee24064a`) took advantage of this to make listing snapshot names sorted only by name much faster. However, it was limited to `-o name -s name`, this work extends this optimization to work with: - name - guid - createtxg - numclones - inconsistent - redacted - origin and could be further extended to any other properties supported by dsl_dataset_fast_stat() or similar, that do not require extra locking or reading from disk. This was committed before (9a9e2e343dfa2af28bf7910de77ae73aa006de62), but was reverted due to a regression when used with an older kernel. If the kernel does not populate zc->zc_objset_stats, we now fallback to getting the properties via the slower interface, to avoid problems with newer userland and older kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #14110	2022-12-13 17:27:54 -08:00
Marcel Menzel	70ac2654f5	Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names Outgoing mails for ZFS pool events include the pool GUID, but not the actual pool name. Let's change this for better readability, as it is already done in the mails for finished pool resilvers. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Marcel Menzel <mail@mcl.gg> Closes #14272	2022-12-13 17:26:10 -08:00
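
Commit 8e7ebf4e2d in the log above replaces zero-length arrays with C99 flexible array members. The sketch below only illustrates that general pattern and is not taken from the OpenZFS sources: `struct demo_rec` and `demo_rec_alloc()` are hypothetical names, and userspace `malloc()` stands in for whatever allocator the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct demo_rec {
	size_t	dr_len;
	char	dr_data[];	/* C99 flexible array member; must be the last field */
};

/* The flexible member adds no storage of its own to sizeof (struct demo_rec). */
static_assert(offsetof(struct demo_rec, dr_data) <= sizeof (struct demo_rec),
    "payload starts no later than the end of the fixed part");

/* Allocate the fixed part plus len payload bytes in a single allocation. */
struct demo_rec *
demo_rec_alloc(const char *src, size_t len)
{
	struct demo_rec *r = malloc(sizeof (*r) + len);

	if (r == NULL)
		return (NULL);
	r->dr_len = len;
	memcpy(r->dr_data, src, len);
	return (r);
}
```

A caller would use it as `struct demo_rec *r = demo_rec_alloc("abc", 3);` and release it with `free(r)`. Declaring the trailing member as `dr_data[]` rather than `dr_data[0]` lets the compiler reject a misplaced member (one that is not last), which is part of the case made in the kernel documentation linked from that commit.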

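The sign-awareness problem described in commit 1f3bc5ea80 in the log above can be reproduced in isolation. The following is a minimal sketch under stated assumptions, not the code from u8_textprep.c: the `demo_*` functions and `DEMO_*` constants are hypothetical stand-ins, and only the -1 and -2 error values are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the error values named in the commit message. */
#define	DEMO_ILLEGAL_CHAR	(-1)
#define	DEMO_OUT_OF_RANGE_CHAR	(-2)

/* Widening a negative int8_t to size_t sign-extends first, giving SIZE_MAX. */
static_assert((size_t)(int8_t)(-1) == SIZE_MAX,
    "-1 widened to size_t is SIZE_MAX");

/*
 * Problematic pattern: the signed table value is stored in a size_t, so the
 * "size <= 1" check compares unsigned values and lets error codes through.
 */
int
demo_rejects_unsigned(int8_t nbytes)
{
	size_t size = nbytes;

	return (size <= 1);
}

/* Sign-aware pattern: compare while the value is still signed. */
int
demo_rejects_signed(int8_t nbytes)
{
	int size = nbytes;

	return (size <= 1);
}
```

With these definitions, `demo_rejects_unsigned(DEMO_ILLEGAL_CHAR)` returns 0 (the error value slips past the check), while `demo_rejects_signed(DEMO_ILLEGAL_CHAR)` returns 1. Both functions agree for ordinary lengths such as 1 or 3.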

8328 Commits