freebsd-skq

Author	SHA1	Message	Date
Andriy Gapon	efd3d79ea3	8414 Implemented zpool scrub pause/resume illumos/illumos-gate@1702cce751 `1702cce751` https://www.illumos.org/issues/8414 This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167 Currently, there is no way to pause a scrub. Pausing may be useful when the pool is busy with other I/O to preserve bandwidth. Description This patch adds the ability to pause and resume scrubbing. This is achieved by maintaining a persistent on-disk scrub state. While the state is 'paused' we do not scrub any more blocks. We do however perform regular scan housekeeping such as freeing async destroyed and deadlist blocks while paused. If you're testing this change, you probably want to include the patch from #6164 Motivation and Context Scrubbing can be an I/O intensive operation and people have been asking for the ability to pause a scrub for a while. This allows one to preserve scrub progress while freeing up bandwidth for other I/O. How Has This Been Tested? Unit testing and zfs-tests. to the pool. This patch will also include the patch from https://github.com/zfsonlinux/zfs/pull/6164 In certain cases (dsl_scan_sync() is one), we may end up calling Reviewed by: George Melikov <mail@gmelikov.ru> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Alek Pinchuk <apinchuk@datto.com>	2017-09-01 17:43:08 +00:00
Andriy Gapon	7d823d46e5	8547 update mandoc to 1.14.3 illumos/illumos-gate@c66b804654 `c66b804654` https://www.illumos.org/issues/8547 update mandoc (now it's the official name) suite to new upstream version, which among a lot of fixes, brings in much improved eqn(5)/tbl(5) support. Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-09-01 17:38:49 +00:00
Andriy Gapon	1493f2aba3	8300 fix man page issues found by mandoc 1.14.1 illumos/illumos-gate@72d3dbb9ab `72d3dbb9ab` https://www.illumos.org/issues/8300 Prior to integrating the mdocml update to 1.14.1, fix issues found by new version, especially the "new sentence, new line" style rule. Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-09-01 17:37:10 +00:00
Andriy Gapon	e9b6f3f506	8138 Improve manpage spelling illumos/illumos-gate@bccbd30bb6 `bccbd30bb6` https://www.illumos.org/issues/8138 While reading man pages, I've noticed a number of spelling mistakes and simple typos we should fix. Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Cody Mello <melloc@joyent.com> Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Peter Tribble <peter.tribble@gmail.com>	2017-09-01 17:35:27 +00:00
Andriy Gapon	739d7fa9c4	8508 Mounting a zpool on 32-bit platforms panics illumos/illumos-gate@b11fe8c014 `b11fe8c014` https://www.illumos.org/issues/8508 Mounting a zpool on a 32-bit system triggers a panic in spa_history_log_version() due to a type format mismatch for ZPL_VERSION. ZPL_VERSION is an unsigned long long, but the format expects an integer. On 64-bit platforms this may not be an issue due to word size and alignment. On 32-bit platforms a word size is half that of a long long, causing the second word of the long long to be seen as the string pointer for utsname.nodename. Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Justin Hibbits <chmeeedalf@gmail.com>	2017-08-08 11:27:19 +00:00
Andriy Gapon	9fdc44732a	8373 TXG_WAIT in ZIL commit path illumos/illumos-gate@d28671a3b0 `d28671a3b0` https://www.illumos.org/issues/8373 The code that writes ZIL blocks uses dmu_tx_assign(TXG_WAIT) to assign a transaction to a transaction group. That seems to be logically incorrect as writing of the ZIL block does not introduce any new dirty data. Also, when there is a lot of dirty data, the call can introduce significant delays into the ZIL commit path, thus affecting all synchronous writes. Additionally, ARC throttling may affect the ZIL writing. We probably need a new mechanism similar to dmu_tx_create_assigned to assign ZIL transactions. (Ab)using TXG_WAITED does not seem to be sufficient. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-08-08 11:24:13 +00:00
Andriy Gapon	23dbc36649	8491 uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS illumos/illumos-gate@79c2b812ee `79c2b812ee` https://www.illumos.org/issues/8491 The zpool checkpoint feature in DxOS added a new field in the uberblock. The Multi-Modifier Protection Pull Request from ZoL adds two new fields in the uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279). As these two changes come from two different sources and once upstreamed and deployed will introduce an incompatibility with each other we want to upstream a change that will reserve the padding for both of them so integration goes smoothly and everyone gets both features. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Olaf Faaland <faaland1@llnl.gov> Approved by: Gordon Ross <gwr@nexenta.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2017-08-08 11:19:56 +00:00
Andriy Gapon	74a3d2b7c8	7915 checks in l2arc_evict could use some cleaning up illumos/illumos-gate@267ae6c3a8 `267ae6c3a8` https://www.illumos.org/issues/7915 l2arc_evict() is strictly serialized with respect to l2arc_write_buffers() and l2arc_write_done(). Normally, l2arc_evict() and l2arc_write_buffers() are called from the same thread, so they can not be concurrent. Also, l2arc_write_buffers() uses zio_wait() on the parent zio of all cache zio-s. That ensures that l2arc_write_done() is completed before l2arc_write_buffers() returns. Finally, if a cache device is removed, then l2arc_evict() is called under SCL_ALL in the exclusive mode. That ensures that it can not be concurrent with the normal L2ARC accesses to the device (including writing and evicting buffers). Given the above, some checks and actions in l2arc_evict() do not make sense. For instance, it must never encounter the write head header let alone remove it from the buffer list. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-08-08 11:15:36 +00:00
Andriy Gapon	5f25e25435	8126 ztest assertion failed in dbuf_dirty due to dn_nlevels changing illumos/illumos-gate@dcb6872c56 `dcb6872c56` https://www.illumos.org/issues/8126 The sync thread is concurrently modifying dn_phys->dn_nlevels while dbuf_dirty() is trying to assert something about it, without holding the necessary lock. We need to move this assertion further down in the function, after we have acquired the dn_struct_rwlock. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-08-08 11:13:27 +00:00
Andriy Gapon	33f8e79f6a	8067 zdb should be able to dump literal embedded block pointer illumos/illumos-gate@4923c69fdd `4923c69fdd` https://www.illumos.org/issues/8067 Add an option to zdb to print a literal embedded block pointer supplied on the command line: zdb -E [-A] word0:word1:...:word15 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-08-08 11:10:37 +00:00
Andriy Gapon	84e46b3cbd	8426 mark immutable buffer arguments as such in abd.h illumos/illumos-gate@9b195260e2 `9b195260e2` https://www.illumos.org/issues/8426 abd_copy_from_buf and abd_cmp_buf do not modify their void *buf arguments, so qualify them with const. abd_copy_from_buf_off and abd_cmp_buf_off already had that type for the corresponding arguments. Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-08-08 10:58:01 +00:00
Andriy Gapon	fd6c8b414e	8430 dir_is_empty_readdir() doesn't properly handle error from fdopendir() illumos/illumos-gate@ba6e7e6505 `ba6e7e6505` https://www.illumos.org/issues/8430 we should close dirfd if fdopendir() fails. Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Sowrabha Gopal <sowrabha.gopal@delphix.com>	2017-08-08 10:55:42 +00:00
Andriy Gapon	59946bc86e	7600 zfs rollback should pass target snapshot to kernel illumos/illumos-gate@77b171372e `77b171372e` https://www.illumos.org/issues/7600 At present, the kernel side code seems to blindly rollback to whatever happens to be the latest snapshot at the time when the rollback task is processed. The expected target's name should be passed to the kernel driver and the sync task should validate that the target exists and that it is the latest snapshot indeed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-08-08 10:49:56 +00:00
Andriy Gapon	9ff5c34a7e	8377 Panic in bookmark deletion illumos/illumos-gate@42418f9e73 `42418f9e73` https://www.illumos.org/issues/8377 The problem is that when dsl_bookmark_destroy_check() is executed from open context (the pre-check), it fills in dbda_success based on the existence of the bookmark. But the bookmark (or containing filesystem as in this case) can be destroyed before we get to syncing context. When we re-run dsl_bookmark_destroy_check() in syncing context, it will not add the deleted bookmark to dbda_success, intending for dsl_bookmark_destroy_sync() to not process it. But because the bookmark is still in dbda_success from the open-context call, we do try to destroy it. The fix is that dsl_bookmark_destroy_check() should not modify dbda_success when called from open context. Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-08-08 10:47:56 +00:00
Andriy Gapon	28bfd86184	8378 crash due to bp in-memory modification of nopwrite block illumos/illumos-gate@b7edcb9408 `b7edcb9408` https://www.illumos.org/issues/8378 The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which we then nopwrite against. zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr to zgd_bp, dbuf_write_ready() could change db_blkptr, and dbuf_write_done() could remove the dirty record. dmu_sync() then sees the stale BP and that the dbuf is not dirty, so it is eligible for nop-writing. The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the db_mtx. We could still see a stale db_blkptr, but if it is stale then the dirty record will still exist and thus we won't attempt to nopwrite. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-08-08 10:44:48 +00:00
Andriy Gapon	fd454218f8	7910 l2arc_write_buffers() may write beyond target_sz illumos/illumos-gate@16a7e5ac11 `16a7e5ac11` https://www.illumos.org/issues/7910 It seems that the change in issue #6950 resurrected the problem that was earlier fixed by the change in issue #5219. Please also see the following FreeBSD bug report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216178 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-08-08 10:37:03 +00:00
Andriy Gapon	0ae34b751c	8416 abd.h is not C++ friendly illumos/illumos-gate@5e2a074725 `5e2a074725` https://www.illumos.org/issues/8416 A C++ compiler fails to compile abd_is_linear(), which is an inline function defined in abd.h, with the following error: error: cannot initialize return object of type 'boolean_t' with an rvalue of type 'bool' That happens because a bool can not be converted to an enum in C++. That's a problem because abd.h can be visible through other header files that a C++ program that works with ZFS can include. Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-08-08 10:31:42 +00:00
Andriy Gapon	32b3bd238f	8418 zfs_prop_get_table() call in zfs_validate_name() is a no-op illumos/illumos-gate@e09ba01dcd `e09ba01dcd` https://www.illumos.org/issues/8418 The following line in zfs_validate_name() is just a no-op and it should be removed: (void) zfs_prop_get_table(); Reviewed by: Vitaliy Gusev <gusev.vitaliy@icloud.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Marcel Telka <marcel@telka.sk>	2017-08-08 10:28:01 +00:00
Andriy Gapon	ecbf7dec14	8311 ZFS_READONLY is a little too strict illumos/illumos-gate@2889ec41c0 `2889ec41c0` https://www.illumos.org/issues/8311 Description: There was a misunderstanding about the enforcement details of the "Read-only" flag introduced for SMB/CIFS compatibility, way back in 2007 in the Sun PSARC 2007/315 case. The original authors thought enforcement of the READONLY flag should work similarly to the IMMUTABLE flag. Unfortunately, that enforcement is incompatible with the expectations of Windows applications using this feature through the SMB service. Applications assume (and the MS File System Algorithms MS-FSA confirms they should) that an SMB client can: (a) Open an SMB handle on a file with read/write access, (b) Set the DOS attributes to include the READONLY flag, (c) continue to have write access via that handle. This access model is essentially the same as a Unix/POSIX application that creates a file (with read/write access), uses fchmod() to change the file mode to something not granting write access (i.e. 0444), and then continues to write that file using the open handle it got before the mode change. Currently, the SMB server works around this problem in a way that will become difficult to maintain as we implement support for SMB3 persistent handles, so SMB depends on this fix. I've written a test program that can be used to demonstrate this problem, and added it to zfs-tests (tests/functional/acl/cifs/cifs_attr_004_pos). It currently fails, but will pass when this problem is fixed. Steps to Reproduce: Run the test program on a ZFS file system. Expected Results: Pass Actual Results: Fail.
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Prakash Surya <prakash.surya@delphix.com> Author: Gordon Ross <gwr@nexenta.com>	2017-06-14 16:46:49 +00:00
Andriy Gapon	b06c8f09e9	5220 L2ARC does not support devices that do not provide 512B access illumos/illumos-gate@403a8da73c `403a8da73c` https://www.illumos.org/issues/5220 There are disk devices that have logical sector size larger than 512B, for example 4KB. That is, their physical sector size is larger than 512B and they do not provide emulation for 512B sector sizes. For such devices both a data offset and a data size must be properly aligned. L2ARC should arrange that because it uses physical I/O. zio_vdev_io_start() performs a necessary transformation if io_size is not aligned to vdev_ashift, but that is done only for logical I/O. Something similar should be done in L2ARC code. * a temporary write buffer should be allocated if the original buffer is not going to be compressed and its size is not aligned * size of a temporary compression buffer should be ashift aligned * for the reads, if a size of a target buffer is not sufficiently large and it is not aligned then a temporary read buffer should be allocated Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-06-14 16:44:10 +00:00
Andriy Gapon	4b16e7e931	5428 provide fts(), reallocarray(), and strtonum() illumos/illumos-gate@4585130b25 `4585130b25` https://www.illumos.org/issues/5428 Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Joshua M. Clulow <josh@sysmgr.org> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-06-14 16:36:01 +00:00
Andriy Gapon	e460d6ff6b	8264 want support for promoting datasets in libzfs_core illumos/illumos-gate@a4b8c9aa65 `a4b8c9aa65` https://www.illumos.org/issues/8264 Oddly there is a lzc_clone function, but no lzc_promote function. Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@kebe.com> Approved by: Dan McDonald <danmcd@kebe.com> Author: Andrew Stormont <astormont@racktopsystems.com>	2017-06-14 16:27:54 +00:00
Andriy Gapon	71672e5c5d	8264 want support for promoting datasets in libzfs_core illumos/illumos-gate@a4b8c9aa65 `a4b8c9aa65` https://www.illumos.org/issues/8264 Oddly there is a lzc_clone function, but no lzc_promote function. Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@kebe.com> Approved by: Dan McDonald <danmcd@kebe.com> Author: Andrew Stormont <astormont@racktopsystems.com>	2017-06-14 16:23:15 +00:00
Andriy Gapon	acb89578f1	fix up r319744, add new files 8269 dtrace stddev aggregation is normalized incorrectly illumos/illumos-gate@79809f9cf4 `79809f9cf4` https://www.illumos.org/issues/8269 It seems that currently normalization of stddev aggregation is done incorrectly. We divide both the sum of values and the sum of their squares by the normalization factor. But we should divide the sum of squares by the normalization factor squared to scale the original values properly. Reviewed by: Bryan Cantrill <bryan@joyent.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-06-09 15:06:50 +00:00
Andriy Gapon	79edb7989a	8269 dtrace stddev aggregation is normalized incorrectly illumos/illumos-gate@79809f9cf4 `79809f9cf4` https://www.illumos.org/issues/8269 It seems that currently normalization of stddev aggregation is done incorrectly. We divide both the sum of values and the sum of their squares by the normalization factor. But we should divide the sum of squares by the normalization factor squared to scale the original values properly. Reviewed by: Bryan Cantrill <bryan@joyent.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-06-09 15:04:10 +00:00
Andriy Gapon	27ba1b79ca	8108 zdb -l fails to read labels 2 and 3 illumos/illumos-gate@22c8b9583d `22c8b9583d` https://www.illumos.org/issues/8108 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-06-09 15:03:07 +00:00
Andriy Gapon	e45762f8b4	8056 zfs send size estimate is inaccurate for some zvols illumos/illumos-gate@0255edcc85 `0255edcc85` https://www.illumos.org/issues/8056 The send size estimate for a zvol can be too low, if the size of the record headers (dmu_replay_record_t's) is a significant portion of the size. This is typically the case when the data is highly compressible, especially with embedded blocks. The problem is that dmu_adjust_send_estimate_for_indirects() assumes that blocks are the size of the "recordsize" property (128KB). However, for zvols, the blocks are the size of the "volblocksize" property (8KB). Therefore, we estimate that there will be 16x fewer record headers than there really will be. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com>	2017-06-09 15:02:07 +00:00
Andriy Gapon	5243698560	8156 dbuf_evict_notify() does not need dbuf_evict_lock illumos/illumos-gate@dbfd9f9300 `dbfd9f9300` https://www.illumos.org/issues/8156 dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do the eviction itself (because the evict thread is not able to keep up). This can result in massive lock contention. It isn't necessary to hold the lock, because if we make the wrong choice occasionally, nothing bad will happen. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-06-09 15:01:18 +00:00
Andriy Gapon	1cb31ff6a4	8168 NULL pointer dereference in zfs_create() illumos/illumos-gate@690031d326 `690031d326` https://www.illumos.org/issues/8168 If we manage to export the pool on which we are creating a dataset (filesystem or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for which we never check the return value) we end up dereferencing a NULL pointer in libzfs`zpool_close(). This was discovered on ZFS on Linux. The same issue can be reproduced on Illumos running in parallel: while :; do zpool import -d /tmp testpool ; zpool export testpool ; done while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done Eventually this will result in several core dumps like this one: [root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244 Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1 libnvpair.so.1 ld.so.1 ] > ::stack libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450) libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8) zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3) main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70) _start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b) > Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096 Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: loli10K <ezomori.nozomu@gmail.com>	2017-06-09 15:00:13 +00:00
Andriy Gapon	acabd65c2a	8005 poor performance of 1MB writes on certain RAID-Z configurations illumos/illumos-gate@5b06278253 `5b06278253` https://www.illumos.org/issues/8005 RAID-Z requires that space be allocated in multiples of P+1 sectors, because this is the minimum size block that can have the required amount of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2 sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector is a unit of 2^ashift bytes, typically 512B or 4KB. To satisfy this constraint, the allocation size is rounded up to the proper multiple, resulting in up to 3 "pad sectors" at the end of some blocks. The contents of these pad sectors are not used, so we do not need to read or write these sectors. However, some storage hardware performs much worse (around 1/2 as fast) on mostly-contiguous writes when there are small gaps of non-overwritten data between the writes. Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that include pad sectors. If writing a pad sector will fill the gap between two (required) writes, we will issue the optional zio, thus doubling performance. The gap-filling performance improvement was introduced in July 2009. Writing the optional zio is done by the io aggregation code in vdev_queue.c. The problem is that it is also subject to the limit on the size of aggregate writes, zfs_vdev_aggregation_limit, which is by default 128KB. For a given block, if the amount of data plus padding written to a leaf device exceeds zfs_vdev_aggregation_limit, the optional zio will not be written, resulting in a ~2x performance degradation. The problem occurs only for certain values of ashift, compressed block size, and RAID-Z configuration (number of parity and data disks). It cannot occur with the default recordsize=128KB. If compression is enabled, all configurations with recordsize=1MB or larger will be impacted to some degree. 
The problem notably occurs with recordsize=1MB, compression=off, with 10 disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-06-09 14:58:51 +00:00
Andriy Gapon	02ae6a9ae6	8155 simplify dmu_write_policy handling of pre-compressed buffers illumos/illumos-gate@adaec86ad2 `adaec86ad2` https://www.illumos.org/issues/8155 When writing pre-compressed buffers, arc_write() requires that the compression algorithm used to compress the buffer matches the compression algorithm requested by the zio_prop_t, which is set by dmu_write_policy(). This makes dmu_write_policy() and its callers a bit more complicated. We can simplify this by making arc_write() trust the caller to supply the type of pre-compressed buffer that it wants to write, and override the compression setting in the zio_prop_t. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-06-09 14:57:45 +00:00
Andriy Gapon	cc03e98a7b	6939 add sysevents to zfs core for commands illumos/illumos-gate@ce1577b049 `ce1577b049` https://www.illumos.org/issues/6939 Originally created https://smartos.org/bugview/OS-4489 sysevents should be fired in the kernel from ZFS whenever a command is run that is logged in zpool history. Example output Terminal 1 root - gz sunos ~ # zfs create zones/foobar root - gz sunos ~ # zfs set quota=10g zones/foobar root - gz sunos ~ # zfs destroy zones/foobar Terminal 2 root - gz sunos ~ # sysevent EC_zfs nvlist version: 0 date = 2016-04-28T14:50:08.964Z vendor = SUNW publisher = zfs class = EC_zfs subclass = ESC_ZFS_history_event pid = 0 data = (embedded nvlist) nvlist version: 0 pool_name = zones pool_guid = 0x40c964e8f9a7a694 history_record = (embedded nvlist) nvlist version: 0 dsname = zones/foobar dsid = 0x1525 history internal str = internal_name = create history txg = 0x4c4ef3 Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Joshua M. Clulow <jmc@joyent.com> Reviewed by: Josh Wilsdon <jwilsdon@joyent.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Alan Somers <asomers@gmail.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Dave Eddy <dave@daveeddy.com>	2017-06-09 14:56:17 +00:00
Andriy Gapon	a168c6e861	6396 remove SVM illumos/illumos-gate@5f10ef697f `5f10ef697f` https://www.illumos.org/issues/6396 LVM = SVM = Solaris Volume Manager is dead code and is not used on ZFS-based platforms. Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-06-09 14:55:12 +00:00
Andriy Gapon	3899d91a09	7578 Fix/improve some aspects of ZIL writing. illumos/illumos-gate@c5ee46810f `c5ee46810f` https://www.illumos.org/issues/7578 After some ZIL changes 6 years ago, zil_slog_limit got partially broken because zl_itx_list_sz was not updated when async itxs were upgraded to sync. Because of other changes made around that time, zl_itx_list_sz is not actually required to implement the functionality, so this patch removes some unneeded broken code and variables. The original idea of zil_slog_limit was to reduce the chance of SLOG abuse by a single heavy logger, which increased latency for other (more latency-critical) loggers, by pushing the heavy log out into the main pool instead of the SLOG. Besides a huge latency increase for heavy writers, this implementation caused a double write of all data, since the log records were explicitly prepared for the SLOG. Since we now have an I/O scheduler, I've found it can be much more efficient to reduce the priority of heavy-logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE to ZIO_PRIORITY_ASYNC_WRITE, while still leaving them on the SLOG. The existing ZIL implementation had a problem with space efficiency when it had to write large chunks of data into log blocks of limited size. In some cases efficiency dropped to almost as low as 50%. For a ZIL stored on spinning rust, that also cut log write speed in half, since the head had to uselessly fly over allocated but unwritten areas. This change improves the situation by offloading problematic operations from z*_log_write() to zil_lwb_commit(), which knows the real state of log block allocation and can split large requests into pieces much more efficiently. As a side effect, it also removes one of the two data copy operations done by the ZIL code in the WR_COPIED case. While there, untangle and unify the code of the z*_log_write() functions. Also, zfs_log_write(), like zvol_log_write(), can now handle writes crossing a block boundary, which may also improve efficiency if the ZPL is made to do that.
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alexander Motin <mav@FreeBSD.org>	2017-05-26 12:13:58 +00:00
Andriy Gapon	d7f3871103	8021 ARC buf data scatter-ization 8100 8021 seems to cause random BAD TRAP: type=d (#gp General protection) illumos/illumos-gate@770499e185 `770499e185` https://www.illumos.org/issues/8021 The ARC buf data project (known simply as "ABD" since its genesis in the ZoL community) changes the way the ARC allocates `b_pdata` memory from using linear `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This improves ZFS's performance by helping to defragment the address space occupied by the ARC, in particular for cases where compressed ARC is enabled. It could also ease future work to allocate pages directly from `segkpm` for minimal-overhead memory allocations, bypassing the `kmem` subsystem. This is essentially the same change as the one which recently landed in ZFS on Linux, although they made some platform-specific changes while adapting this work to their codebase: 1. Implemented the equivalent of the `segkpm` suggestion for future work mentioned above to bypass issues that they've had with the Linux kernel memory allocator. 2. Changed the internal representation of the ABD's scatter/gather list so it could be used to pass I/O directly into Linux block device drivers. (This feature is not available in the illumos block device interface yet.) https://www.illumos.org/issues/8100 My supermicro system is getting random BAD TRAP: type=d (#gp General protection) at about the stage where ZFS filesystems are mounted - usually console login prompt is already present but the services are still starting. After backing out 8021, the boot is completed and no panics do occur. Machine does dump, however savecore fails: savecore: bad magic number baddcafe I can get more data out with boot -k, if needed. 
# psrinfo -vp The physical processor has 4 cores and 8 virtual processors (0-7) The core has 2 virtual processors (0 4) The core has 2 virtual processors (1 5) The core has 2 virtual processors (2 6) The core has 2 virtual processors (3 7) x86 (GenuineIntel 306C3 family 6 model 60 step 3 clock 3500 MHz) Intel(r) Xeon(r) CPU E3-1246 v3 @ 3.50GHz # prtconf -m 32657 $ zpool status pool: rpool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 c3t0d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 Reviewed by: Matthew Ahrens mahrens@delphix.com Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Paul Dagnelie pcd@delphix.com Reviewed by: John Kennedy john.kennedy@delphix.com Reviewed by: Prakash Surya prakash.surya@delphix.com Reviewed by: Prashanth Sreenivasa pks@delphix.com Reviewed by: Pavel Zakharov pavel.zakharov@delphix.com Reviewed by: Chris Williamson chris.williamson@delphix.com Approved by: Richard Lowe <richlowe@richlowe.net> Author: Dan Kimmel <dan.kimmel@delphix.com>	2017-05-26 12:13:27 +00:00
Andriy Gapon	a8aa933b61	8265 Reserve send stream flag for large dnode feature illumos/illumos-gate@bc83969fdb `bc83969fdb` https://www.illumos.org/issues/8265 Reserve bit 23 in the zfs send stream flags for the large dnode feature which has been implemented for Linux. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Brian Behlendorf <behlendorf1@llnl.gov>	2017-05-26 12:07:47 +00:00
Andriy Gapon	f45d37d04e	8166 zpool scrub thinks it repaired offline device illumos/illumos-gate@2d2f193a21 `2d2f193a21` https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also https://github.com/zfsonlinux/zfs/issues/5806 Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>	2017-05-26 12:02:51 +00:00
Andriy Gapon	9df2d6f729	7446 zpool create should support efi system partition illumos/illumos-gate@7855d95b30 `7855d95b30` https://www.illumos.org/issues/7446 Since we support whole-disk configuration for boot pool, we also will need whole disk support with UEFI boot and for this, zpool create should create efi-system partition. I have borrowed the idea from oracle solaris, and introducing zpool create -B switch to provide a way to specify that boot partition should be created. However, there is still a question, how big should the system partition be. For the time being, I have set default size 256MB (that's minimum size for FAT32 with 4k blocks). To support custom size, the set on creation "bootsize" property is created and so the custom size can be set as: zpool create -B -o bootsize=34MB rpool c0t0d0 After pool is created, the "bootsize" property is read only. When -B switch is not used, the bootsize defaults to 0 and is shown in zpool get output with value ''. Older zfs/zpool implementations are ignoring this property. https://www.illumos.org/rb/r/219/ Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Dan McDonald <danmcd@kebe.com> Author: Toomas Soome <tsoome@me.com>	2017-05-26 12:02:14 +00:00
Andriy Gapon	ab57ddbb43	7956 "minimum" is misspelled in zpool manpage illumos/illumos-gate@eba8726136 `eba8726136` https://www.illumos.org/issues/7956 Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Brad Lewis <blewis@delphix.com>	2017-05-26 12:00:31 +00:00
Andriy Gapon	5fbf54a543	6418 zpool should have a label clearing command illumos/illumos-gate@6401734d54 `6401734d54` https://www.illumos.org/issues/6418 An easy, direct means of sanitizing pool vdevs can be helpful for management purposes. FreeBSD has had a 'zpool labelclear' for some time, see: https://svnweb.freebsd.org/base?view=revision&revision=224171 SpectraBSD has a slightly updated version, which I propose for inclusion. Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Will Andrews <will@firepipe.net> Note: the bulk of the change has been already imported, this is a follow up that imports zpool.1m changes.	2017-05-26 11:59:20 +00:00
Andriy Gapon	23d34c4278	6781 zpool man page needs updated to remove duplicate entry of "cannot be" where it discusses cache devices illumos/illumos-gate@e4cb59f791 `e4cb59f791` https://www.illumos.org/issues/6781 cache A device used to cache storage pool data. A cache device cannot be cannot be configured as a mirror or raidz group. For more information, see the "Cache Devices" section. needs changed to cache A device used to cache storage pool data. A cache device cannot be configured as a mirror or raidz group. For more information, see the "Cache Devices" section. Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Alexander Pyhalov <apyhalov@gmail.com>	2017-05-26 11:56:28 +00:00
Andriy Gapon	15a3493f4e	2897 "zpool split" documentation missing from manpage illumos/illumos-gate@879bece34e `879bece34e` https://www.illumos.org/issues/2897 Found this option in some Oracle documentation and wanted to check out the zpool manpage on it in OI. Unfortunately it seems to be missing from the manpage, so I first thought it was unsupported. However, "# zpool split" does print the correct usage. Testing with the "-n" switch makes me believe it is supported (I don't actually need to split my pool). Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Author: Steven Burgess <sburgess@datto.com>	2017-05-26 11:55:31 +00:00
Andriy Gapon	a1a27d6fd4	4465 zpool(1M) is able to offline cache vdevs despite what man page says 5659 in the manual page for zpool(1M), one misuse of the word 'zpool' to describe a pool illumos/illumos-gate@c8323d4323 `c8323d4323` https://www.illumos.org/issues/4465 zpool(1M) is able to offline cache vdevs despite man page saying that it isn't: zpool offline [-t] pool device ... Takes the specified physical device offline. While the device is offline, no attempt is made to read or write to the device. This command is not applicable to spares or cache devices. altair:root:~# zpool create testoff c9t67d0 cache c9t71d0 altair:root:~# zpool status testoff pool: testoff state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM testoff ONLINE 0 0 0 c9t67d0 ONLINE 0 0 0 cache c9t71d0 ONLINE 0 0 0 errors: No known data errors altair:root:~# zpool offline testoff c9t71d0 altair:root:~# zpool status testoff pool: testoff state: ONLINE status: One or more devices has been taken offline by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: none requested https://www.illumos.org/issues/5659 At https://github.com/illumos/illumos-gate/blob/master/usr/src/man/man1m/zpool.1m#L931 Do not add a disk that is currently configured as a quorum device to a zpool. – should be: Do not add a disk that is currently configured as a quorum device to a pool. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-05-26 11:54:42 +00:00
Andriy Gapon	2053e2d0d0	8070 Add some ZFS comments illumos/illumos-gate@40713f2b24 `40713f2b24` https://www.illumos.org/issues/8070 Add some ZFS comments left by various developers at different times Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alan Somers <asomers@gmail.com>	2017-05-26 11:48:29 +00:00
Andriy Gapon	5ce561a2f5	8064 need a static DTrace probe in VN_HOLD illumos/illumos-gate@ade42b557a `ade42b557a` https://www.illumos.org/issues/8064 It's currently nearly impossible to trace what process places a hold on a vnode, as the only way holds are placed is via the `VN_HOLD()` and `VN_HOLD_CALLER()` macros, which inline the bumping of `v_count`. Adding static DTrace probes to these macros would enable tracing of where specific vnode references come from. For completeness and symmetry, a similar static probe should be added to `vn_rele()` and `vn_rele_dnlc()`. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Sebastien Roy <seb@delphix.com>	2017-05-26 11:39:34 +00:00
Andriy Gapon	f9e7ac9d61	8063 verify that we do not attempt to access inactive txg illumos/illumos-gate@b7b2590dd9 `b7b2590dd9` https://www.illumos.org/issues/8063 A standard practice in ZFS is to keep track of "per-txg" state. Any of the 3 active TXG's (open, quiescing, syncing) can have different values for this state. We should assert that we do not attempt to modify other (inactive) TXG's. Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-05-26 11:35:34 +00:00
Andriy Gapon	92455d0cca	7786 zfs`vdev_online() needs better notification about state changes illumos/illumos-gate@5f368aef86 `5f368aef86` https://www.illumos.org/issues/7786 Currently, vdev_online() will only post sysevent if previous state was "offline". It should also post the event when the state changes from "removed" or "faulted" to "healthy" or "degraded". This will fix the following scenario: - pull disk from slot A - check that hotspare has taken its place (if available) - insert disk into slot B - check that hotspare moved back to "avail" state (if spare was used) The problem here is that we don't get any ESC_ZFS_VDEV_* notification and fail to update the vdev FRU. Reviewed by: Matthew Ahrens mahrens@delphix.com Reviewed by: George Wilson george.wilson@delphix.com Approved by: Albert Lee <trisk@forkgnu.org> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-05-26 11:32:05 +00:00
Andriy Gapon	1c42c71f38	8025 dbuf_read() creates unnecessary zio_root() for bonus buf illumos/illumos-gate@def4fac588 `def4fac588` https://www.illumos.org/issues/8025 dbuf_read() creates a zio_root() to track and wait for all the zio's that may happen as part of this call. However, if the blkptr_t for this buffer is NULL or a hole, we will not create any more zio's, so this zio_root() is unnecessary. This is always the case when calling dbuf_read() on a bonus buffer, because it has no blkptr (it's part of the containing dnode). For workloads that read a lot of bonus buffers (e.g. file creation and removal), creating and destroying these unnecessary zio's can decrease performance by around 3%. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-05-26 11:29:31 +00:00
Andriy Gapon	7248081159	5704 libzfs can only handle 255 file descriptors illumos/illumos-gate@bde3d612a7 `bde3d612a7` https://www.illumos.org/issues/5704 libzfs uses fopen(), at least in libzfs_init(). If there are more than 255 filedescriptors open, fopen() will fail unless you give 'F' as the last mode character. The fix would be to give 'rF' instead of 'r' as mode to fopen(). Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Simon Klinkert <simon.klinkert@gmail.com>	2017-04-14 18:56:00 +00:00
Andriy Gapon	89a08ff6ba	7340 receive manual origin should override automatic origin illumos/illumos-gate@ed4e7a6a5c `ed4e7a6a5c` https://www.illumos.org/issues/7340 When -o origin=<snapshot> is specified as part of a ZFS receive, that origin should override the automatic detection in libzfs. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com>	2017-04-14 18:54:11 +00:00
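
The check described in commit 8063 above (only the three active TXGs, open, quiescing and syncing, may have their per-txg state touched) could be sketched roughly as follows. This is a minimal user-space illustration, not the code from that commit: `per_txg_state_t`, `txg_is_active()` and `per_txg_add()` are hypothetical names, and the ring of four slots indexed by `txg & TXG_MASK` is an assumption about how such state is typically laid out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 3 concurrently active txgs stored in a 4-slot ring. */
#define TXG_CONCURRENT_STATES 3   /* open, quiescing, syncing */
#define TXG_SIZE 4
#define TXG_MASK (TXG_SIZE - 1)

/* Hypothetical container for one piece of per-txg state. */
typedef struct per_txg_state {
    uint64_t syncing_txg;         /* oldest active txg (the one being synced) */
    uint64_t value[TXG_SIZE];     /* one slot per txg, indexed by txg & TXG_MASK */
} per_txg_state_t;

/* True if txg is one of the three active txgs: syncing, quiescing or open. */
static bool
txg_is_active(const per_txg_state_t *s, uint64_t txg)
{
    return (txg >= s->syncing_txg &&
        txg < s->syncing_txg + TXG_CONCURRENT_STATES);
}

/* Modify per-txg state, asserting that the target txg is active. */
static void
per_txg_add(per_txg_state_t *s, uint64_t txg, uint64_t delta)
{
    assert(txg_is_active(s, txg));
    s->value[txg & TXG_MASK] += delta;
}
```

With `syncing_txg` set to 10, calls for txgs 10 through 12 pass the assertion, while a call for txg 9 or 13 would trip it instead of silently updating a slot that belongs to a different txg.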


610 Commits