freebsd-nq

Author	SHA1	Message	Date
Toomas Soome	b40aaca6dd	loader should support large_dnode The zfsonlinux feature large_dnode is not yet supported by the loader. Reviewed by: avg, allanjude Differential Revision: https://reviews.freebsd.org/D12288	2017-09-12 13:45:04 +00:00
Andriy Gapon	1a2ddb2997	fix a fallout from the ZTOV tightening, r323479 MFC after: 13 days X-MFC with: r323479	2017-09-12 13:21:14 +00:00
Andriy Gapon	bcab65cab5	zfsctl_snapdir_lookup should be able to handle an uncovered vnode The uncovered vnode is possible because there is no guarantee that its hold count would go to zero (and it would be inactivated and reclaimed) immediately after a covering filesystem is unmounted. So, such a vnode should be expected and it is possible to re-use it without any trouble. MFC after: 3 weeks Sponsored by: Panzura	2017-09-12 06:06:58 +00:00
Andriy Gapon	c09d0da8d1	zfs_ctldir: remove obsolete / bogus ARGSUSED lint directives None of the tagged functions had unused parameters. MFC after: 1 week	2017-09-12 06:05:30 +00:00
Andriy Gapon	65b38f7311	zfsvfs_hold: assert that the busied filesystem can not be unmounted This is a FreeBSD specific feature. MFC after: 3 weeks Sponsored by: Panzura	2017-09-12 06:04:50 +00:00
Andriy Gapon	d092f79489	zfs_get_vfs: reference a requested filesystem instead of vfs_busy-ing it The only consumer of zfs_get_vfs, zfs_unmount_snap, does not need the filesystem to be busy, it just need a reference that it can pass to dounmount. Also, previously the code was racy as it unbusied the filesystem before taking a reference on it. Now the code should be simpler and safer. MFC after: 2 weeks Sponsored by: Panzura	2017-09-12 06:04:01 +00:00
Andriy Gapon	f7519dbb76	zfs: tighten debug versions of ZTOV and VTOZ MFC after: 2 weeks Sponsored by: Panzura	2017-09-12 06:02:21 +00:00
Andriy Gapon	970165f190	MFV r323111: 8569 problem with inline functions in abd.h illumos/illumos-gate@37e84ab74e `37e84ab74e` https://www.illumos.org/issues/8569 C [C99] has peculiar rules for inline functions that are different from the C++ rules. Unlike C++ where inline is "fire and forget", in C a programmer must pay attention to the function's storage class / visibility. The main problem is with the case where a compiler decides to not inline a call to the function declared as inline. Some relevant links: - http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15831.html - http://www.drdobbs.com/the-new-c-inline-functions/184401540 The summary is that either the inline functions should be declared 'static inline' or one of the compilation units (.c files) must provide a callable externally visible function definition. In the former case, the compiler would automatically create a local non-inlined function instance in every compilation unit where it's needed. In the latter case the single external definition is used to satisfy any non-inlined calls in all compilation units. As things stand right now, we can get an undefined reference error under certain combinations of compilers and compiler options. For example, this is what I get on FreeBSD when compiling with clang 4.0.0 and -O1: In function `abd_free': /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:385: undefined reference to `abd_is_linear' Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 1 week	2017-09-11 12:15:49 +00:00
Andriy Gapon	25625d8746	Revert r322601, Mark ZFS ABD inline functions static An alternative fix is to be merged from illumos shortly.	2017-09-11 12:08:20 +00:00
Andriy Gapon	3d9a0e564d	MFV r323110: 8558 lwp_create() returns EAGAIN on system with more than 80K ZFS filesystems illumos/illumos-gate@216d7723a1 `216d7723a1` https://www.illumos.org/issues/8558 On a system with more than 80K ZFS filesystems, we've seen cases where lwp_create() will start to fail by returning EAGAIN. The problem being, for each of those 80K ZFS filesystems, a taskq will be created for each dataset as part of the ZIL for each dataset. For each of these taskq's, a kernel thread will be created which results in 24KB being allocated for each thread. With enough of these 24KB allocations, we eventually exhaust the memory region set aside for these allocations. Currently, segkpsize is set to a value of 2GB, which means we can only support about 80K filesystems; 2GB / 24KB = ~80K. The lwp_create() failure comes into play due to the fact that LWP creation also allocates 24KB from this same region of memory. Thus, if we've exhausted this region of memory due to the number of ZIL taskq's, there won't be any memory avaible to allow the call to lwp_create() to succeed. FreeBSD note: I haven't created sysctl-s for the new ZIL clean parameters. Let's add them if anyone requires to tune them. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com> MFC after: 3 weeks	2017-09-11 11:31:43 +00:00
Andriy Gapon	90354c3200	MFV r323107: 8414 Implemented zpool scrub pause/resume illumos/illumos-gate@1702cce751 `1702cce751` FreeBSD note: rather than merging the zpool.8 update I copied the zpool scrub section from the illumos zpool.1m to FreeBSD zpool.8 almost verbatim. Now that the illumos page uses the mdoc format, it was an easier option. Perhaps the change is not in perfect compliance with the FreeBSD style, but I think that it is acceptible. https://www.illumos.org/issues/8414 This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167 Currently, there is no way to pause a scrub. Pausing may be useful when the pool is busy with other I/O to preserve bandwidth. Description This patch adds the ability to pause and resume scrubbing. This is achieved by maintaining a persistent on-disk scrub state. While the state is 'paused' we do not scrub any more blocks. We do however perform regular scan housekeeping such as freeing async destroyed and deadlist blocks while paused. Motivation and Context Scrub pausing can be an I/O intensive operation and people have been asking for the ability to pause a scrub for a while. This allows one to preserve scrub progress while freeing up bandwidth for other I/O. Reviewed by: George Melikov <mail@gmelikov.ru> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Alek Pinchuk <apinchuk@datto.com> MFC after: 2 weeks	2017-09-09 11:00:07 +00:00
Kurt Lidl	a8273e4371	Enable dtrace support for mips64 and the ERL kernel config Turn on the required options in the ERL config file, and ensure that the fbt module is listed as a dependency for mips in the modules/dtrace/dtraceall/dtraceall.c file. PR: 220346 Reviewed by: gnn, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12227	2017-09-06 03:19:52 +00:00
Alan Somers	0df3d85af4	Fix remounting ZFS filesystem with "zfs mount" "zfs mount -o" passes a list of mount options directly to nmount(2) after sanity checking them. In particular, zfs(8) will refuse to mount an already existing file system unless "remount" is specified in the option list. However, the "remount" option only exists in Illumos. FreeBSD's equivalent is "update". PR: 221985 Reviewed by: avg MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D12233	2017-09-05 19:40:04 +00:00
Baptiste Daroussin	91f1caeccb	Add sysctls for arc shrinking and growing values The default value for arc_no_grow_shift may not be optimal when using several GiB ARC. Expose it via sysctl allows users to tune it easily. Also expose arc_grow_retry via sysctl for the same reason. The default value of 60s might, in case of intensive load, be too long. Submitted by: Nikita Kozlov <nikita.kozlov@blade-group.com> Reviewed by: mav, manu, bapt MFC after: 2 weeks Sponsored by: blade Differential Revision: https://reviews.freebsd.org/D12144	2017-08-31 13:02:17 +00:00
Ed Maste	3c3d2ba6fe	zfs: do not advertise edonr which is not yet supported illumos 4185 ("add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R") was intentionally merged only partially in r289422, without adding support for skein, sha512 and edonr on FreeBSD. Support for skein and sha512 was added later on, but edonr is still not implemented in FreeBSD. Prior to this commit zfs(8) correctly rejected edonr, but with an error message that claimed support: fk@r500 ~ $zfs set checksum=edonr tank cannot set property for 'tank': 'checksum' must be one of 'on \| off \| fletcher2 \| fletcher4 \| sha256 \| sha512 \| skein \| edonr' PR: 204055 Submitted by: Fabian Keil Approved by: allanjude Obtained from: ElectroBSD MFC after: 1 week	2017-08-29 22:24:22 +00:00
John Baldwin	ac3b479ec8	Add a guard around _ILP32 for mips. This is already done for other architectures in this file and fixes the build with clang.	2017-08-21 17:45:06 +00:00
John Baldwin	b99836cea9	Mark ZFS ABD inline functions static. When built with -fno-inline-functions zfs.ko contains undefined references to these functions if they are only marked inline. Reviewed by: avg (earlier version) MFC after: 1 week Sponsored by: Chelsio Communications	2017-08-16 23:40:32 +00:00
Alan Somers	69b14f7acd	Fix some ZFS debugging messages sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Be more careful about the use of provider names vs vdev names in ZFS_LOG statements. MFC after: 3 weeks Sponsored by: Spectra Logic Corp	2017-08-15 15:20:04 +00:00
Andriy Gapon	984d43cca5	MFV r322242: 8373 TXG_WAIT in ZIL commit path illumos/illumos-gate@d28671a3b0 `d28671a3b0` https://www.illumos.org/issues/8373 The code that writes ZIL blocks uses dmu_tx_assign(TXG_WAIT) to assign a transaction to a transaction group. That seems to be logically incorrect as writing of the ZIL block does not introduce any new dirty data. Also, when there is a lot of dirty data, the call can introduce significant delays into the ZIL commit path, thus affecting all synchronous writes. Additionally, ARC throttling may affect the ZIL writing. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 2 weeks	2017-08-08 11:26:03 +00:00
Andriy Gapon	2653426e89	MFV r322240: 8491 uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS illumos/illumos-gate@79c2b812ee `79c2b812ee` https://www.illumos.org/issues/8491 The zpool checkpoint feature in DxOS added a new field in the uberblock. The Multi-Modifier Protection Pull Request from ZoL adds two new fields in the uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279). As these two changes come from two different sources and once upstreamed and deployed will introduce an incompatibility with each other we want to upstream a change that will reserve the padding for both of them so integration goes smoothly and everyone gets both features. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Olaf Faaland <faaland1@llnl.gov> Approved by: Gordon Ross <gwr@nexenta.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com> MFC after: 3 weeks	2017-08-08 11:21:58 +00:00
Andriy Gapon	6f2f8727e3	MFV r322238: 7915 checks in l2arc_evict could use some cleaning up illumos/illumos-gate@267ae6c3a8 `267ae6c3a8` https://www.illumos.org/issues/7915 l2arc_evict() is strictly serialized with respect to l2arc_write_buffers() and l2arc_write_done(). Normally, l2arc_evict() and l2arc_write_buffers() are called from the same thread, so they can not be concurrent. Also, l2arc_write_buffers() uses zio_wait() on the parent zio of all cache zio-s. That ensures that l2arc_write_done() is completed before l2arc_write_buffers() returns. Finally, if a cache device is removed, then l2arc_evict() is called under SCL_ALL in the exclusive mode. That ensures that it can not be concurrent with the normal L2ARC accesses to the device (including writing and evicting buffers). Given the above, some checks and actions in l2arc_evict() do not make sense. For instance, it must never encounter the write head header let alone remove it from the buffer list. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 2 weeks	2017-08-08 11:19:14 +00:00
Andriy Gapon	3cf2ea1aea	MFV r322236: 8126 ztest assertion failed in dbuf_dirty due to dn_nlevels changing illumos/illumos-gate@dcb6872c56 `dcb6872c56` https://www.illumos.org/issues/8126 The sync thread is concurrently modifying dn_phys->dn_nlevels while dbuf_dirty() is trying to assert something about it, without holding the necessary lock. We need to move this assertion further down in the function, after we have acquired the dn_struct_rwlock. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 2 weeks	2017-08-08 11:14:40 +00:00
Andriy Gapon	98a9c8e68a	zfs: no need for __DECONST after abd constification in r322233 Note that vdev_label_write_pad2() is FreeBSD specific. MFC after: 2 weeks X-MFC after: r322233	2017-08-08 11:07:34 +00:00
Andriy Gapon	b9a4f29445	MFV r322232: 8426 mark immutable buffer arguments as such in abd.h illumos/illumos-gate@9b195260e2 `9b195260e2` https://www.illumos.org/issues/8426 abd_copy_from_buf and abd_cmp_buf do not modify their void *buf arguments, so qualify them with const. abd_copy_from_buf_off and abd_cmp_buf_off already had that type for the corresponding arguments. Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 2 weeks	2017-08-08 10:59:18 +00:00
Andriy Gapon	9c48e95dd9	MFV r322229: 7600 zfs rollback should pass target snapshot to kernel illumos/illumos-gate@77b171372e `77b171372e` https://www.illumos.org/issues/7600 At present, the kernel side code seems to blindly rollback to whatever happens to be the latest snapshot at the time when the rollback task is processed. The expected target's name should be passed to the kernel driver and the sync task should validate that the target exists and that it is the latest snapshot indeed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 3 weeks	2017-08-08 10:52:01 +00:00
Andriy Gapon	8605a08bd2	MFV r322227: 8377 Panic in bookmark deletion illumos/illumos-gate@42418f9e73 `42418f9e73` https://www.illumos.org/issues/8377 The problem is that when dsl_bookmark_destroy_check() is executed from open context (the pre-check), it fills in dbda_success based on the existence of the bookmark. But the bookmark (or containing filesystem as in this case) can be destroyed before we get to syncing context. When we re-run dsl_bookmark_destroy_check() in syncing context, it will not add the deleted bookmark to dbda_success, intending for dsl_bookmark_destroy_sync() to not process it. But because the bookmark is still in dbda_success from the open-context call, we do try to destroy it. The fix is that dsl_bookmark_destroy_check() should not modify dbda_success when called from open context. Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 2 weeks	2017-08-08 10:48:52 +00:00
Andriy Gapon	b4e4140d13	MFV r322223: 8378 crash due to bp in-memory modification of nopwrite block illumos/illumos-gate@b7edcb9408 `b7edcb9408` https://www.illumos.org/issues/8378 The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which we then nopwrite against. zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr to zgd_bp, dbuf_write_ready() could change db_blkptr, and dbuf_write_done() could remove the dirty record. dmu_sync() then sees the stale BP and that the dbuf it not dirty, so it is eligible for nop-writing. The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the db_mtx. We could still see a stale db_blkptr, but if it is stale then the dirty record will still exist and thus we won't attempt to nopwrite. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 2 weeks	2017-08-08 10:46:51 +00:00
Andriy Gapon	c6fb364293	MFV r322221: 7910 l2arc_write_buffers() may write beyond target_sz FreeBD note: the essence of this change was committed to FreeBSD in r314274. This commit catches up with differences between what was committed to FreeBSD and what was committed to OpenZFS, mainly more logical variable names. illumos/illumos-gate@16a7e5ac11 `16a7e5ac11` https://www.illumos.org/issues/7910 It seems that the change in issue #6950 resurrected the problem that was earlier fixed by the change in issue #5219. Please also see the following FreeBSD bug report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216178 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 2 weeks	2017-08-08 10:43:41 +00:00
Ruslan Bukin	ca20f8ec29	o Replace __riscv__ with __riscv o Replace __riscv64 with (__riscv && __riscv_xlen == 64) This is required to support new GCC 7.1 compiler. This is compatible with current GCC 6.1 compiler. RISC-V is extensible ISA and the idea here is to have built-in define per each extension, so together with __riscv we will have some subset of these as well (depending on -march string passed to compiler): __riscv_compressed __riscv_atomic __riscv_mul __riscv_div __riscv_muldiv __riscv_fdiv __riscv_fsqrt __riscv_float_abi_soft __riscv_float_abi_single __riscv_float_abi_double __riscv_cmodel_medlow __riscv_cmodel_medany __riscv_cmodel_pic __riscv_xlen Reviewed by: ngie Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D11901	2017-08-07 14:09:57 +00:00
Andriy Gapon	f6040f9e8e	spa_import_rootpool should be able to handle an imported root pool That is required to support reboot -r with a new root filesystem being on an already imported pool. PR: 210721 Reported by: Jan Bramkamp <crest_maintainer@rlwinm.de> MFC after: 2 weeks	2017-07-25 13:17:06 +00:00
Ed Maste	5c12f7c3e2	zfs: Fix a typo in the delay_min_dirty_percent sysctl description The description is FreeBSD-specific and was added in r266497 to fix PR189865. PR: 220825 Submitted by: Fabian Keil Obtained from: ElectroBSD MFC after: 1 week	2017-07-19 18:17:41 +00:00
Andriy Gapon	37ec52ca7a	fix a regression in r320452, ZFS ABD import I overlooked the fact that vdev_op_io_done hook is called even if the actual I/O is skipped, for example, in the case of a missing vdev. Arguably, this could be considered an issue in the zio pipeline engine, but for now I am adding defensive code to check for io_bp being NULL along with assertions that that happens only when it can be really expected. PR: 220691 Reported by: peter, cy Tested by: cy MFC after: 1 week X-MFC with: r320156, r320452	2017-07-18 07:41:38 +00:00
Justin Hibbits	8fe026c641	Make ZFS not crash on mount on 32-bit systems ZPL_VERSION is unsigned long long, not an int. With this change, a zpool can be created on a 32-bit system (tested on powerpcspe) and mounted correctly. Reviewed by: allanjude	2017-07-18 01:08:45 +00:00
Andriy Gapon	1db5f1724b	fix an architectural problem introduced in r320156, ZFS ABD import The implementation of ZFS refcount_t uses the emulated illumos mutex (the sx lock) and the waiting memory allocation when ZFS_DEBUG is enabled. This makes refcount_t unsuitable for use in GEOM g_up thread where sleeping is prohibited. When importing the ABD change I modified vdev_geom using illumos vdev_disk as an example. As a result, I added a call to abd_return_buf in vdev_geom_io_intr. The latter is called on g_up thread while the former uses refcount_t. This change fixes the problem by deferring the abd_return_buf call to the previously unused vdev_geom_io_done that is called on a ZFS zio taskqueue thread where sleeping is allowed. A side bonus of this change is that now a vdev zio has a pointer to its corresponding bio while the zio is active. Reported by: Shawn Webb <shawn.webb@hardenedbsd.org> Tested by: Shawn Webb <shawn.webb@hardenedbsd.org> MFC after: 1 week X-MFC with: r320156	2017-06-28 13:59:20 +00:00
Andriy Gapon	c20b00c6af	zfs: port vdev_file part of illumos change 3306 3306 zdb should be able to issue reads in parallel illumos/illumos-gate/31d7e8fa33fae995f558673adb22641b5aa8b6e1 https://www.illumos.org/issues/3306 The upstream change was made before we started to import upstream commits individually. It was imported into the illumos vendor area as r242733. That commit was MFV-ed in r260138, but as the commit message says vdev_file.c was left intact. This commit actually implements the parallel I/O for vdev_file using a taskqueue with multiple thread. This implementation does not depend on the illumos or FreeBSD bio interface at all, but uses zio_t to pass around all the relevent data. So, the code looks a bit different from the upstream. This commit also incorporates ZoL commit zfsonlinux/zfs/bc25c9325b0e5ced897b9820dad239539d561ec9 that fixed https://github.com/zfsonlinux/zfs/issues/2270 We need to use a dedicated taskqueue for exactly the same reason as ZoL as we do not implement TASKQ_DYNAMIC. Obtained from: illumos, ZFS on Linux MFC after: 2 weeks	2017-06-26 09:10:09 +00:00
Andriy Gapon	ee2d3c0a5b	fix gcc-specific fallout from r320156, MFV of r318946, ZFS ABD Reported by: jhibbits MFC after: 1 week X-MFC with: r320156	2017-06-23 08:42:53 +00:00
Andriy Gapon	3385c74539	MFV r319950: 5220 L2ARC does not support devices that do not provide 512B access FreeBSD note: the actual change has been in FreeBSD since r297848. This commit accounts for integration of that change with subsequent changes, especially r320156 (MFV of r318946) and r314274. illumos/illumos-gate@403a8da73c `403a8da73c` https://www.illumos.org/issues/5220 There are disk devices that have logical sector size larger than 512B, for example 4KB. That is, their physical sector size is larger than 512B and they do not provide emulation for 512B sector sizes. For such devices both a data offset and a data size must be properly aligned. L2ARC should arrange that because it uses physical I/O. zio_vdev_io_start() performs a necessary transformation if io_size is not aligned to vdev_ashift, but that is done only for logical I/O. Something similar should be done in L2ARC code. * a temporary write buffer should be allocated if the original buffer is not going to be compressed and its size is not aligned * size of a temporary compression buffer should be ashift aligned * for the reads, if a size of a target buffer is not sufficiently large and it is not aligned then a temporary read buffer should be allocated Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 3 weeks	2017-06-22 17:10:34 +00:00
Andriy Gapon	ae5ec64b88	MFV r319742: 8056 zfs send size estimate is inaccurate for some zvols illumos/illumos-gate@0255edcc85 `0255edcc85` https://www.illumos.org/issues/8056 The send size estimate for a zvol can be too low, if the size of the record headers (dmu_replay_record_t's) is a significant portion of the size. This is typically the case when the data is highly compressible, especially with embedded blocks. The problem is that dmu_adjust_send_estimate_for_indirects() assumes that blocks are the size of the "recordsize" property (128KB). However, for zvols, the blocks are the size of the "volblocksize" property (8KB). Therefore, we estimate that there will be 16x less record headers than there really will be. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com> MFC after: 3 weeks	2017-06-22 16:58:09 +00:00
Andriy Gapon	e70097b50f	MFV r318947: 7578 Fix/improve some aspects of ZIL writing. FreeBSD note: this commit removes small differences between what mav committed to FreeBSD in r308782 and what ended up committed to illumos after addressing all review comments. illumos/illumos-gate@c5ee46810f `c5ee46810f` https://www.illumos.org/issues/7578 After some ZIL changes 6 years ago zil_slog_limit got partially broken due to zl_itx_list_sz not updated when async itx'es upgraded to sync. Actually because of other changes about that time zl_itx_list_sz is not really required to implement the functionality, so this patch removes some unneeded broken code and variables. Original idea of zil_slog_limit was to reduce chance of SLOG abuse by single heavy logger, that increased latency for other (more latency critical) loggers, by pushing heavy log out into the main pool instead of SLOG. Beside huge latency increase for heavy writers, this implementation caused double write of all data, since the log records were explicitly prepared for SLOG. Since we now have I/O scheduler, I've found it can be much more efficient to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG. Existing ZIL implementation had problem with space efficiency when it has to write large chunks of data into log blocks of limited size. In some cases efficiency stopped to almost as low as 50%. In case of ZIL stored on spinning rust, that also reduced log write speed in half, since head had to uselessly fly over allocated but not written areas. This change improves the situation by offloading problematic operations from z_log_write() to zil_lwb_commit(), which knows real situation of log blocks allocation and can split large requests into pieces much more efficiently. Also as side effect it removes one of two data copy operations done by ZIL code WR_COPIED case. While there, untangle and unify code of z_log_write() functions. Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing block boundary, that may also improve efficiency if ZPL is made to do that. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alexander Motin <mav@FreeBSD.org> MFC after: 3 weeks	2017-06-22 16:52:22 +00:00
Andriy Gapon	a4a2976d8a	fix several fallouts from r320156, ZFS ABD import All of the problems were related to the FreeBSD-only features. One was caused by a mismerge in the zfsbootcfg support code. All others were in the TRIM support code. MFC after: 1 week X-MFC with: r320156	2017-06-21 08:12:07 +00:00
Andriy Gapon	ebf3b53dac	fix several fallouts from r320156, ZFS ABD import All of the problems were related to the FreeBSD-only features. One was caused by a mismerge in the zfsbootcfg support code. All others were in the TRIM support code. Reported by: ken, O. Hartmann <ohartmann@walstatt.org>, Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no> MFC after: 1 week X-MFC with: r320156	2017-06-21 08:10:45 +00:00
Andriy Gapon	f9cdbaba8d	MFV r318946: 8021 ARC buf data scatter-ization illumos/illumos-gate@770499e185 `770499e185` https://www.illumos.org/issues/8021 The ARC buf data project (known simply as "ABD" since its genesis in the ZoL community) changes the way the ARC allocates `b_pdata` memory from using linear `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This improves ZFS's performance by helping to defragment the address space occupied by the ARC, in particular for cases where compressed ARC is enabled. It could also ease future work to allocate pages directly from `segkpm` for minimal- overhead memory allocations, bypassing the `kmem` subsystem. This is essentially the same change as the one which recently landed in ZFS on Linux, although they made some platform-specific changes while adapting this work to their codebase: 1. Implemented the equivalent of the `segkpm` suggestion for future work mentioned above to bypass issues that they've had with the Linux kernel memory allocator. 2. Changed the internal representation of the ABD's scatter/gather list so it could be used to pass I/O directly into Linux block device drivers. (This feature is not available in the illumos block device interface yet.) FreeBSD notes: - the actual (default) chunk size is 4KB (despite the text above saying 1KB) - we can try to reimplement ABDs, so that they are not permanently mapped into the KVA unless explicitly requested, especially on platforms with scarce KVA - we can try to use unmapped I/O and avoid intermediate allocation of a linear, virtual memory mapped buffer - we can try to avoid extra data copying by referring to chunks / pages in the original ABD Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Dan Kimmel <dan.kimmel@delphix.com> MFC after: 3 weeks	2017-06-20 17:39:24 +00:00
Andriy Gapon	42ce346fcc	revert r315852 which introduced zio_buf_alloc_nowait for use in vdev_queue_aggregate I think that the change is still good, but reconciling it with a planned merge of the ARC buf data scatter-ization is a bit more tedious than I can handle. MFC after: 17 days	2017-06-20 16:55:30 +00:00
Andriy Gapon	602cf4e4a7	MFV r319951: 8311 ZFS_READONLY is a little too strict illumos/illumos-gate@2889ec41c0 `2889ec41c0` https://www.illumos.org/issues/8311 Description: There was a misunderstanding about the enforcement details of the "Read-only" flag introduced for SMB/CIFS compatibility, way back in 2007 in the Sun PSARC 2007/315 case. The original authors thought enforcement of the READONLY flag should work similarly as the IMMUTABLE flag. Unfortunately, that enforcement is incompatible with the expectations of Windows applications using this feature through the SMB service. Applications assume (and the MS File System Algorithms MS-FSA confirms they should) that an SMB client can: (a) Open an SMB handle on a file with read/write access, (b) Set the DOS attributes to include the READONLY flag, (c) continue to have write access via that handle. This access model is essentially the same as a Unix/POSIX application that creates a file (with read/write access), uses fchmod() to change the file mode to something not granting write access (i.e. 0444), and then continues to write that file using the open handle it got before the mode change. Currently, the SMB server works-around this problem in a way that will become difficult to maintain as we implement support for SMB3 persistent handles, so SMB depends on this fix. I've written a test program that can be used to demonstrate this problem, and added it to zfs-tests (tests/functional/acl/cifs/cifs_attr_004_pos). It currently fails, but will pass when this problem fixed. Steps to Reproduce: Run the test program on a ZFS file system. Expected Results: Pass Actual Results: Fail. Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Prakash Surya <prakash.surya@delphix.com> Author: Gordon Ross <gwr@nexenta.com> MFC after: 2 weeks	2017-06-14 16:55:47 +00:00
Andriy Gapon	7d506d0d57	MFV r319948: 5428 provide fts(), reallocarray(), and strtonum() illumos/illumos-gate@4585130b25 `4585130b25` https://www.illumos.org/issues/5428 Most of the upstream change is not applicable to FreeBSD. Only the renaming of strtonum to zfs_strtonum is relevant to us. And we already had it partially done. Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Joshua M. Clulow <josh@sysmgr.org> Author: Yuri Pankov <yuri.pankov@nexenta.com> MFC after: 1 week	2017-06-14 16:42:38 +00:00
Andriy Gapon	b8d341fe26	MFV r319945,r319946: 8264 want support for promoting datasets in libzfs_core illumos/illumos-gate@a4b8c9aa65 `a4b8c9aa65` https://www.illumos.org/issues/8264 Oddly there is a lzc_clone function, but no lzc_promote function. Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@kebe.com> Approved by: Dan McDonald <danmcd@kebe.com> Author: Andrew Stormont <astormont@racktopsystems.com> MFC after: 1 week	2017-06-14 16:31:36 +00:00
Justin Hibbits	880870b41a	Follow up r313841 on powerpc Close a potential race in reading the CPU dtrace flags, where a thread can start on one CPU, and partway through retrieving the flags be swapped out, while another thread traps and sets the CPU_DTRACE_NOFAULT. This could cause the first thread to return without handling the fault. Discussed with: markj@	2017-06-09 20:26:42 +00:00
Andriy Gapon	667002fa27	MFV r319741: 8156 dbuf_evict_notify() does not need dbuf_evict_lock illumos/illumos-gate@dbfd9f9300 `dbfd9f9300` https://www.illumos.org/issues/8156 dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do the eviction itself (because the evict thread is not able to keep up). This can result in massive lock contention. It isn't necessary to hold the lock, because if we make the wrong choice occasionally, nothing bad will happen. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 1 week	2017-06-09 15:28:57 +00:00
Andriy Gapon	5f9cf93878	MFV r319739: 8005 poor performance of 1MB writes on certain RAID-Z configurations illumos/illumos-gate@5b06278253 `5b06278253` https://www.illumos.org/issues/8005 RAID-Z requires that space be allocated in multiples of P+1 sectors, because this is the minimum size block that can have the required amount of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2 sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector is a unit of 2^ashift bytes, typically 512B or 4KB. To satisfy this constraint, the allocation size is rounded up to the proper multiple, resulting in up to 3 "pad sectors" at the end of some blocks. The contents of these pad sectors are not used, so we do not need to read or write these sectors. However, some storage hardware performs much worse (around 1/2 as fast) on mostly-contiguous writes when there are small gaps of non-overwritten data between the writes. Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that include pad sectors. If writing a pad sector will fill the gap between two (required) writes, we will issue the optional zio, thus doubling performance. The gap-filling performance improvement was introduced in July 2009. Writing the optional zio is done by the io aggregation code in vdev_queue.c. The problem is that it is also subject to the limit on the size of aggregate writes, zfs_vdev_aggregation_limit, which is by default 128KB. For a given block, if the amount of data plus padding written to a leaf device exceeds zfs_vdev_aggregation_limit, the optional zio will not be written, resulting in a ~2x performance degradation. The problem occurs only for certain values of ashift, compressed block size, and RAID-Z configuration (number of parity and data disks). It cannot occur with the default recordsize=128KB. If compression is enabled, all configurations with recordsize=1MB or larger will be impacted to some degree. The problem notably occurs with recordsize=1MB, compression=off, with 10 disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 10 days	2017-06-09 15:27:22 +00:00
Andriy Gapon	9f141f8d71	MFV r319738: 8155 simplify dmu_write_policy handling of pre-compressed buffers illumos/illumos-gate@adaec86ad2 `adaec86ad2` https://www.illumos.org/issues/8155 When writing pre-compressed buffers, arc_write() requires that the compression algorithm used to compress the buffer matches the compression algorithm requested by the zio_prop_t, which is set by dmu_write_policy(). This makes dmu_write_policy() and its callers a bit more complicated. We can simplify this by making arc_write() trust the caller to supply the type of pre-compressed buffer that it wants to write, and override the compression setting in the zio_prop_t. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 10 days	2017-06-09 15:26:03 +00:00

1 2 3 4 5 ...

1788 Commits