freebsd-nq

Author	SHA1	Message	Date
Andriy Gapon	79edb7989a	8269 dtrace stddev aggregation is normalized incorrectly illumos/illumos-gate@79809f9cf4 `79809f9cf4` https://www.illumos.org/issues/8269 It seems that currently normalization of stddev aggregation is done incorrectly. We divide both the sum of values and the sum of their squares by the normalization factor. But we should divide the sum of squares by the normalization factor squared to scale the original values properly. Reviewed by: Bryan Cantrill <bryan@joyent.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-06-09 15:04:10 +00:00
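The arithmetic behind 8269 can be checked with a small sketch. It assumes the aggregation keeps a count, a sum, and a sum of squares, as the commit message describes. The function names are hypothetical, and this is illustrative Python, not the libdtrace code.

```python
import math

def stddev(count, total, total_sq):
    """Population standard deviation from a count, a sum, and a sum of squares."""
    mean = total / count
    return math.sqrt(max(total_sq / count - mean * mean, 0.0))

def normalized_stddev(count, total, total_sq, factor):
    """Scale the sum by factor and the sum of squares by factor squared."""
    return stddev(count, total / factor, total_sq / (factor * factor))

values = [10, 20, 30, 40]
count, total, total_sq = len(values), sum(values), sum(v * v for v in values)

# Normalizing by 10 should give the stddev of [1, 2, 3, 4].
correct = normalized_stddev(count, total, total_sq, 10)

# Dividing the sum of squares by the factor only once (the old behaviour as
# described) mixes differently scaled terms and yields a meaningless variance.
wrong_variance = (total_sq / 10) / count - ((total / 10) / count) ** 2
```

With these inputs the corrected scaling matches the standard deviation of the scaled values, while the single division leaves a variance that is far too large.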
Andriy Gapon	27ba1b79ca	8108 zdb -l fails to read labels 2 and 3 illumos/illumos-gate@22c8b9583d `22c8b9583d` https://www.illumos.org/issues/8108 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-06-09 15:03:07 +00:00
Andriy Gapon	e45762f8b4	8056 zfs send size estimate is inaccurate for some zvols illumos/illumos-gate@0255edcc85 `0255edcc85` https://www.illumos.org/issues/8056 The send size estimate for a zvol can be too low, if the size of the record headers (dmu_replay_record_t's) is a significant portion of the size. This is typically the case when the data is highly compressible, especially with embedded blocks. The problem is that dmu_adjust_send_estimate_for_indirects() assumes that blocks are the size of the "recordsize" property (128KB). However, for zvols, the blocks are the size of the "volblocksize" property (8KB). Therefore, we estimate that there will be 16x less record headers than there really will be. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com>	2017-06-09 15:02:07 +00:00
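The 16x figure in 8056 follows directly from the two block sizes quoted in the message. The sketch below only counts per-block record headers; the helper name and the 1 GiB example size are assumptions made for illustration.

```python
def record_header_count(data_bytes, block_size):
    """Number of per-block record headers needed to send data_bytes."""
    return -(-data_bytes // block_size)  # ceiling division

RECORDSIZE = 128 * 1024   # "recordsize" value cited in the commit message
VOLBLOCKSIZE = 8 * 1024   # "volblocksize" value cited in the commit message

data = 1024 * 1024 * 1024  # 1 GiB of zvol data, an arbitrary example size
assumed = record_header_count(data, RECORDSIZE)    # what the estimate assumed
actual = record_header_count(data, VOLBLOCKSIZE)   # what a zvol really needs
underestimate_factor = actual // assumed
```

When the data itself compresses to very little, these headers dominate the stream, which is why the estimate was noticeably too low for such zvols.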
Andriy Gapon	5243698560	8156 dbuf_evict_notify() does not need dbuf_evict_lock illumos/illumos-gate@dbfd9f9300 `dbfd9f9300` https://www.illumos.org/issues/8156 dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do the eviction itself (because the evict thread is not able to keep up). This can result in massive lock contention. It isn't necessary to hold the lock, because if we make the wrong choice occasionally, nothing bad will happen. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-06-09 15:01:18 +00:00
Andriy Gapon	1cb31ff6a4	8168 NULL pointer dereference in zfs_create() illumos/illumos-gate@690031d326 `690031d326` https://www.illumos.org/issues/8168 If we manage to export the pool on which we are creating a dataset (filesystem or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for which we never check the return value) we end up dereferencing a NULL pointer in libzfs`zpool_close(). This was discovered on ZFS on Linux. The same issue can be reproduced on Illumos running in parallel: while :; do zpool import -d /tmp testpool ; zpool export testpool ; done while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done Eventually this will result in several core dumps like this one: [root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244 Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1 libnvpair.so.1 ld.so.1 ] > ::stack libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450) libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8) zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3) main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70) _start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b) > Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096 Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: loli10K <ezomori.nozomu@gmail.com>	2017-06-09 15:00:13 +00:00
Andriy Gapon	acabd65c2a	8005 poor performance of 1MB writes on certain RAID-Z configurations illumos/illumos-gate@5b06278253 `5b06278253` https://www.illumos.org/issues/8005 RAID-Z requires that space be allocated in multiples of P+1 sectors, because this is the minimum size block that can have the required amount of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2 sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector is a unit of 2^ashift bytes, typically 512B or 4KB. To satisfy this constraint, the allocation size is rounded up to the proper multiple, resulting in up to 3 "pad sectors" at the end of some blocks. The contents of these pad sectors are not used, so we do not need to read or write these sectors. However, some storage hardware performs much worse (around 1/2 as fast) on mostly-contiguous writes when there are small gaps of non-overwritten data between the writes. Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that include pad sectors. If writing a pad sector will fill the gap between two (required) writes, we will issue the optional zio, thus doubling performance. The gap-filling performance improvement was introduced in July 2009. Writing the optional zio is done by the io aggregation code in vdev_queue.c. The problem is that it is also subject to the limit on the size of aggregate writes, zfs_vdev_aggregation_limit, which is by default 128KB. For a given block, if the amount of data plus padding written to a leaf device exceeds zfs_vdev_aggregation_limit, the optional zio will not be written, resulting in a ~2x performance degradation. The problem occurs only for certain values of ashift, compressed block size, and RAID-Z configuration (number of parity and data disks). It cannot occur with the default recordsize=128KB. If compression is enabled, all configurations with recordsize=1MB or larger will be impacted to some degree. 
The problem notably occurs with recordsize=1MB, compression=off, with 10 disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-06-09 14:58:51 +00:00
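The padding rule described in 8005 can be sketched as follows. The rounding to a multiple of P+1 sectors comes from the commit message; the way parity sectors are counted (one set per stripe row) is an assumption about RAID-Z layout, and the function name is hypothetical.

```python
def raidz_alloc_sectors(psize, ashift, ndisks, nparity):
    """Return (allocated_sectors, pad_sectors) for one block on a RAID-Z group."""
    data_cols = ndisks - nparity
    data = ((psize - 1) >> ashift) + 1            # data sectors
    parity = nparity * -(-data // data_cols)      # assumed: P sectors per stripe row
    used = data + parity
    unit = nparity + 1                            # allocations are multiples of P+1
    allocated = -(-used // unit) * unit           # round up to that multiple
    return allocated, allocated - used

# recordsize=1MB, compression=off, 10-disk RAIDZ2, 4KB sectors (ashift=12)
allocated, pad = raidz_alloc_sectors(1 << 20, 12, 10, 2)
per_disk_bytes = (allocated - pad) // 10 * 4096
```

Under these assumptions each leaf device already receives exactly 128KB of data and parity, so adding the single pad sector pushes the write past the default zfs_vdev_aggregation_limit, which is consistent with the configuration the message singles out.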
Andriy Gapon	02ae6a9ae6	8155 simplify dmu_write_policy handling of pre-compressed buffers illumos/illumos-gate@adaec86ad2 `adaec86ad2` https://www.illumos.org/issues/8155 When writing pre-compressed buffers, arc_write() requires that the compression algorithm used to compress the buffer matches the compression algorithm requested by the zio_prop_t, which is set by dmu_write_policy(). This makes dmu_write_policy() and its callers a bit more complicated. We can simplify this by making arc_write() trust the caller to supply the type of pre-compressed buffer that it wants to write, and override the compression setting in the zio_prop_t. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-06-09 14:57:45 +00:00
Andriy Gapon	cc03e98a7b	6939 add sysevents to zfs core for commands illumos/illumos-gate@ce1577b049 `ce1577b049` https://www.illumos.org/issues/6939 Originally created https://smartos.org/bugview/OS-4489 sysevents should be fired in the kernel from ZFS whenever a command is run that is logged in zpool history. Example output Terminal 1 root - gz sunos ~ # zfs create zones/foobar root - gz sunos ~ # zfs set quota=10g zones/foobar root - gz sunos ~ # zfs destroy zones/foobar Terminal 2 root - gz sunos ~ # sysevent EC_zfs nvlist version: 0 date = 2016-04-28T14:50:08.964Z vendor = SUNW publisher = zfs class = EC_zfs subclass = ESC_ZFS_history_event pid = 0 data = (embedded nvlist) nvlist version: 0 pool_name = zones pool_guid = 0x40c964e8f9a7a694 history_record = (embedded nvlist) nvlist version: 0 dsname = zones/foobar dsid = 0x1525 history internal str = internal_name = create history txg = 0x4c4ef3 Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Joshua M. Clulow <jmc@joyent.com> Reviewed by: Josh Wilsdon <jwilsdon@joyent.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Alan Somers <asomers@gmail.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Dave Eddy <dave@daveeddy.com>	2017-06-09 14:56:17 +00:00
Andriy Gapon	a168c6e861	6396 remove SVM illumos/illumos-gate@5f10ef697f `5f10ef697f` https://www.illumos.org/issues/6396 LVM = SVM = Solaris Volume Manager. It is dead code and is not used on ZFS-based platforms. Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-06-09 14:55:12 +00:00
Andriy Gapon	3899d91a09	7578 Fix/improve some aspects of ZIL writing. illumos/illumos-gate@c5ee46810f `c5ee46810f` https://www.illumos.org/issues/7578 After some ZIL changes 6 years ago, zil_slog_limit got partially broken because zl_itx_list_sz was not updated when async itx'es were upgraded to sync. Actually, because of other changes about that time, zl_itx_list_sz is not really required to implement the functionality, so this patch removes some unneeded broken code and variables. The original idea of zil_slog_limit was to reduce the chance of SLOG abuse by a single heavy logger, which increased latency for other (more latency-critical) loggers, by pushing the heavy log out into the main pool instead of the SLOG. Besides a huge latency increase for heavy writers, this implementation caused a double write of all data, since the log records were explicitly prepared for the SLOG. Since we now have an I/O scheduler, I've found it can be much more efficient to reduce the priority of heavy-logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE to ZIO_PRIORITY_ASYNC_WRITE, while still leaving them on the SLOG. The existing ZIL implementation had a problem with space efficiency when it had to write large chunks of data into log blocks of limited size. In some cases efficiency dropped to almost as low as 50%. In the case of a ZIL stored on spinning rust, that also cut log write speed in half, since the head had to uselessly fly over allocated but unwritten areas. This change improves the situation by offloading problematic operations from z*_log_write() to zil_lwb_commit(), which knows the real situation of log block allocation and can split large requests into pieces much more efficiently. Also, as a side effect, it removes one of two data copy operations done by the ZIL code in the WR_COPIED case. While there, untangle and unify the code of the z*_log_write() functions. Also, zfs_log_write(), like zvol_log_write(), can now handle writes crossing a block boundary, which may also improve efficiency if the ZPL is made to do that.
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alexander Motin <mav@FreeBSD.org>	2017-05-26 12:13:58 +00:00
Andriy Gapon	d7f3871103	8021 ARC buf data scatter-ization 8100 8021 seems to cause random BAD TRAP: type=d (#gp General protection) illumos/illumos-gate@770499e185 `770499e185` https://www.illumos.org/issues/8021 The ARC buf data project (known simply as "ABD" since its genesis in the ZoL community) changes the way the ARC allocates `b_pdata` memory from using linear `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This improves ZFS's performance by helping to defragment the address space occupied by the ARC, in particular for cases where compressed ARC is enabled. It could also ease future work to allocate pages directly from `segkpm` for minimal-overhead memory allocations, bypassing the `kmem` subsystem. This is essentially the same change as the one which recently landed in ZFS on Linux, although they made some platform-specific changes while adapting this work to their codebase: 1. Implemented the equivalent of the `segkpm` suggestion for future work mentioned above to bypass issues that they've had with the Linux kernel memory allocator. 2. Changed the internal representation of the ABD's scatter/gather list so it could be used to pass I/O directly into Linux block device drivers. (This feature is not available in the illumos block device interface yet.) https://www.illumos.org/issues/8100 My supermicro system is getting random BAD TRAP: type=d (#gp General protection) at about the stage where ZFS filesystems are mounted - usually console login prompt is already present but the services are still starting. After backing out 8021, the boot is completed and no panics do occur. Machine does dump, however savecore fails: savecore: bad magic number baddcafe I can get more data out with boot -k, if needed.
# psrinfo -vp The physical processor has 4 cores and 8 virtual processors (0-7) The core has 2 virtual processors (0 4) The core has 2 virtual processors (1 5) The core has 2 virtual processors (2 6) The core has 2 virtual processors (3 7) x86 (GenuineIntel 306C3 family 6 model 60 step 3 clock 3500 MHz) Intel(r) Xeon(r) CPU E3-1246 v3 @ 3.50GHz # prtconf -m 32657 $ zpool status pool: rpool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 c3t0d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 Reviewed by: Matthew Ahrens mahrens@delphix.com Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Paul Dagnelie pcd@delphix.com Reviewed by: John Kennedy john.kennedy@delphix.com Reviewed by: Prakash Surya prakash.surya@delphix.com Reviewed by: Prashanth Sreenivasa pks@delphix.com Reviewed by: Pavel Zakharov pavel.zakharov@delphix.com Reviewed by: Chris Williamson chris.williamson@delphix.com Approved by: Richard Lowe <richlowe@richlowe.net> Author: Dan Kimmel <dan.kimmel@delphix.com>	2017-05-26 12:13:27 +00:00
Andriy Gapon	a8aa933b61	8265 Reserve send stream flag for large dnode feature illumos/illumos-gate@bc83969fdb `bc83969fdb` https://www.illumos.org/issues/8265 Reserve bit 23 in the zfs send stream flags for the large dnode feature which has been implemented for Linux. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Brian Behlendorf <behlendorf1@llnl.gov>	2017-05-26 12:07:47 +00:00
Andriy Gapon	f45d37d04e	8166 zpool scrub thinks it repaired offline device illumos/illumos-gate@2d2f193a21 `2d2f193a21` https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also https://github.com/zfsonlinux/zfs/issues/5806 Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>	2017-05-26 12:02:51 +00:00
Andriy Gapon	9df2d6f729	7446 zpool create should support efi system partition illumos/illumos-gate@7855d95b30 `7855d95b30` https://www.illumos.org/issues/7446 Since we support whole-disk configuration for the boot pool, we will also need whole-disk support with UEFI boot, and for this zpool create should create an EFI system partition. I have borrowed the idea from Oracle Solaris and am introducing the zpool create -B switch to provide a way to specify that a boot partition should be created. However, there is still the question of how big the system partition should be. For the time being, I have set the default size to 256MB (that's the minimum size for FAT32 with 4k blocks). To support a custom size, a set-on-creation "bootsize" property is added, so the custom size can be set as: zpool create -B -o bootsize=34MB rpool c0t0d0 After the pool is created, the "bootsize" property is read-only. When the -B switch is not used, bootsize defaults to 0 and is shown in zpool get output with value ''. Older zfs/zpool implementations ignore this property. https://www.illumos.org/rb/r/219/ Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Dan McDonald <danmcd@kebe.com> Author: Toomas Soome <tsoome@me.com>	2017-05-26 12:02:14 +00:00
Andriy Gapon	ab57ddbb43	7956 "minimum" is misspelled in zpool manpage illumos/illumos-gate@eba8726136 `eba8726136` https://www.illumos.org/issues/7956 Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Brad Lewis <blewis@delphix.com>	2017-05-26 12:00:31 +00:00
Andriy Gapon	5fbf54a543	6418 zpool should have a label clearing command illumos/illumos-gate@6401734d54 `6401734d54` https://www.illumos.org/issues/6418 An easy, direct means of sanitizing pool vdevs can be helpful for management purposes. FreeBSD has had a 'zpool labelclear' for some time, see: https://svnweb.freebsd.org/base?view=revision&revision=224171 SpectraBSD has a slightly updated version, which I propose for inclusion. Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Will Andrews <will@firepipe.net> Note: the bulk of the change has been already imported, this is a follow up that imports zpool.1m changes.	2017-05-26 11:59:20 +00:00
Andriy Gapon	23d34c4278	6781 zpool man page needs updated to remove duplicate entry of "cannot be" where it discusses cache devices illumos/illumos-gate@e4cb59f791 `e4cb59f791` https://www.illumos.org/issues/6781 cache A device used to cache storage pool data. A cache device cannot be cannot be configured as a mirror or raidz group. For more information, see the "Cache Devices" section. needs changed to cache A device used to cache storage pool data. A cache device cannot be configured as a mirror or raidz group. For more information, see the "Cache Devices" section. Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Alexander Pyhalov <apyhalov@gmail.com>	2017-05-26 11:56:28 +00:00
Andriy Gapon	15a3493f4e	2897 "zpool split" documentation missing from manpage illumos/illumos-gate@879bece34e `879bece34e` https://www.illumos.org/issues/2897 Found this option in some Oracle documentation and wanted to check out the zpool manpage on it in OI. Unfortunately it seems to be missing from the manpage, so I first thought it was unsupported. However, "# zpool split" does print the correct usage. Testing with the "-n" switch makes me believe it is supported (I don't actually need to split my pool). Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Author: Steven Burgess <sburgess@datto.com>	2017-05-26 11:55:31 +00:00
Andriy Gapon	a1a27d6fd4	4465 zpool(1M) is able to offline cache vdevs despite what man page says 5659 in the manual page for zpool(1M), one misuse of the word 'zpool' to describe a pool illumos/illumos-gate@c8323d4323 `c8323d4323` https://www.illumos.org/issues/4465 zpool(1M) is able to offline cache vdevs despite man page saying that it isn't: zpool offline [-t] pool device ... Takes the specified physical device offline. While the device is offline, no attempt is made to read or write to the device. This command is not applicable to spares or cache devices. altair:root:~# zpool create testoff c9t67d0 cache c9t71d0 altair:root:~# zpool status testoff pool: testoff state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM testoff ONLINE 0 0 0 c9t67d0 ONLINE 0 0 0 cache c9t71d0 ONLINE 0 0 0 errors: No known data errors altair:root:~# zpool offline testoff c9t71d0 altair:root:~# zpool status testoff pool: testoff state: ONLINE status: One or more devices has been taken offline by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: none requested https://www.illumos.org/issues/5659 At https://github.com/illumos/illumos-gate/blob/master/usr/src/man/man1m/zpool.1m#L931 Do not add a disk that is currently configured as a quorum device to a zpool. – should be: Do not add a disk that is currently configured as a quorum device to a pool. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-05-26 11:54:42 +00:00
Andriy Gapon	2053e2d0d0	8070 Add some ZFS comments illumos/illumos-gate@40713f2b24 `40713f2b24` https://www.illumos.org/issues/8070 Add some ZFS comments left by various developers at different times Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alan Somers <asomers@gmail.com>	2017-05-26 11:48:29 +00:00
Andriy Gapon	5ce561a2f5	8064 need a static DTrace probe in VN_HOLD illumos/illumos-gate@ade42b557a `ade42b557a` https://www.illumos.org/issues/8064 It's currently nearly impossible to trace what process places a hold on a vnode, as the only way holds are placed is via the `VN_HOLD()` and `VN_HOLD_CALLER()` macros, which inline the bumping of `v_count`. Adding static DTrace probes to these macros would enable tracing of where specific vnode references come from. For completeness and symmetry, a similar static probe should be added to `vn_rele()` and `vn_rele_dnlc()`. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Sebastien Roy <seb@delphix.com>	2017-05-26 11:39:34 +00:00
Andriy Gapon	f9e7ac9d61	8063 verify that we do not attempt to access inactive txg illumos/illumos-gate@b7b2590dd9 `b7b2590dd9` https://www.illumos.org/issues/8063 A standard practice in ZFS is to keep track of "per-txg" state. Any of the 3 active TXG's (open, quiescing, syncing) can have different values for this state. We should assert that we do not attempt to modify other (inactive) TXG's. Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-05-26 11:35:34 +00:00
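The check proposed in 8063 can be pictured with a toy per-txg counter. This is a sketch under simplifying assumptions: the constants are assumed values, the active txgs are taken to be the three consecutive numbers starting at the syncing one, and the class is hypothetical rather than anything from the ZFS sources.

```python
TXG_SIZE = 4                 # assumed number of per-txg slots
TXG_MASK = TXG_SIZE - 1
TXG_CONCURRENT_STATES = 3    # open, quiescing, syncing

class PerTxgState:
    """Hypothetical per-txg counter that rejects access to inactive txgs."""

    def __init__(self, syncing_txg):
        self.syncing_txg = syncing_txg
        self.slots = [0] * TXG_SIZE

    def is_active(self, txg):
        # Assumption: the active txgs are syncing, syncing + 1 and syncing + 2.
        return self.syncing_txg <= txg < self.syncing_txg + TXG_CONCURRENT_STATES

    def add(self, txg, amount):
        # The kind of assertion the commit asks for: only active txgs may be modified.
        assert self.is_active(txg), f"txg {txg} is not active"
        self.slots[txg & TXG_MASK] += amount
```

Without such a check, a write aimed at an inactive txg would silently land in a slot that belongs to one of the active ones, because the slot index wraps around.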
Andriy Gapon	92455d0cca	7786 zfs`vdev_online() needs better notification about state changes illumos/illumos-gate@5f368aef86 `5f368aef86` https://www.illumos.org/issues/7786 Currently, vdev_online() will only post sysevent if previous state was "offline". It should also post the event when the state changes from "removed" or "faulted" to "healthy" or "degraded". This will fix the following scenario: - pull disk from slot A - check that hotspare has taken its place (if available) - insert disk into slot B - check that hotspare moved back to "avail" state (if spare was used) The problem here is that we don't get any ESC_ZFS_VDEV_* notification and fail to update the vdev FRU. Reviewed by: Matthew Ahrens mahrens@delphix.com Reviewed by: George Wilson george.wilson@delphix.com Approved by: Albert Lee <trisk@forkgnu.org> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2017-05-26 11:32:05 +00:00
Andriy Gapon	1c42c71f38	8025 dbuf_read() creates unnecessary zio_root() for bonus buf illumos/illumos-gate@def4fac588 `def4fac588` https://www.illumos.org/issues/8025 dbuf_read() creates a zio_root() to track and wait for all the zio's that may happen as part of this call. However, if the blkptr_t for this buffer is NULL or a hole, we will not create any more zio's, so this zio_root() is unnecessary. This is always the case when calling dbuf_read() on a bonus buffer, because it has no blkptr (it's part of the containing dnode). For workloads that read a lot of bonus buffers (e.g. file creation and removal), creating and destroying these unnecessary zio's can decrease performance by around 3%. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-05-26 11:29:31 +00:00
Andriy Gapon	7248081159	5704 libzfs can only handle 255 file descriptors illumos/illumos-gate@bde3d612a7 `bde3d612a7` https://www.illumos.org/issues/5704 libzfs uses fopen(), at least in libzfs_init(). If there are more than 255 filedescriptors open, fopen() will fail unless you give 'F' as the last mode character. The fix would be to give 'rF' instead of 'r' as mode to fopen(). Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Simon Klinkert <simon.klinkert@gmail.com>	2017-04-14 18:56:00 +00:00
Andriy Gapon	89a08ff6ba	7340 receive manual origin should override automatic origin illumos/illumos-gate@ed4e7a6a5c `ed4e7a6a5c` https://www.illumos.org/issues/7340 When -o origin=<snapshot> is specified as part of a ZFS receive, that origin should override the automatic detection in libzfs. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com>	2017-04-14 18:54:11 +00:00
Andriy Gapon	5fab759ec6	5142 libzfs support raidz root pool (loader project) illumos/illumos-gate@d5f26ad812 `d5f26ad812` https://www.illumos.org/issues/5142 The current libzfs only allows simple disk and mirror setups for the boot pool. As the loader does support booting from raidz, this change removes the raidz restriction from boot pool setup. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Albert Lee <trisk@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Toomas Soome <tsoome@me.com>	2017-04-14 18:52:48 +00:00
Andriy Gapon	5b3ff7eced	6280 libzfs: unshare_one() could fail with EZFS_SHARENFSFAILED illumos/illumos-gate@d1672efb6f `d1672efb6f` https://www.illumos.org/issues/6280 The unshare_one() in libzfs could fail with EZFS_SHARENFSFAILED at line 834 here: 831 /* make sure libshare initialized */ 832 if ((err = zfs_init_libshare(hdl, SA_INIT_SHARE_API)) != SA_OK) { 833 free(mntpt); /* don't need the copy anymore */ 834 return (zfs_error_fmt(hdl, EZFS_SHARENFSFAILED, 835 dgettext(TEXT_DOMAIN, "cannot unshare '%s': %s"), 836 name, _sa_errorstr(err))); 837 } The correct error should be EZFS_UNSHARENFSFAILED instead. Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Marcel Telka <marcel.telka@nexenta.com>	2017-04-14 18:51:16 +00:00
Andriy Gapon	aa64c9679a	6268 zfs diff confused by moving a file to another directory illumos/illumos-gate@aab04418a7 `aab04418a7` https://www.illumos.org/issues/6268 The zfs diff command presents a description of the changes that have occurred to files within a filesystem between two snapshots. If a file is renamed, the tool is capable of reporting this, e.g.: cd /some/zfs/dataset/subdir mv file0 file1 Will result in a diff record like: R /some/zfs/dataset/subdir/file0 -> /some/zfs/dataset/subdir/file1 Unfortunately, it seems that rename detection only uses the base filename to determine if a file has been renamed or simply modified. This leads to misreporting only the original filename, omitting the more relevant destination filename entirely. For example: cd /some/zfs/dataset/subdir mv file0 ../otherdir/file0 Will result in a diff entry: M /some/zfs/dataset/subdir/file0 But it should really emit: R /some/zfs/dataset/subdir/file0 -> /some/zfs/dataset/otherdir/file0 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Joshua M. Clulow <josh@sysmgr.org>	2017-04-14 18:49:44 +00:00
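The misclassification described in 6268 can be reproduced with a toy model. Both functions are hypothetical illustrations of the two comparison strategies, not the libzfs implementation; the paths are the ones used in the commit message.

```python
import posixpath

def classify_basename_only(old_path, new_path):
    """Rename detection as the bug report describes it: compare base names only."""
    if posixpath.basename(old_path) != posixpath.basename(new_path):
        return f"R {old_path} -> {new_path}"
    return f"M {old_path}"

def classify_full_path(old_path, new_path):
    """Compare the whole path so that moves between directories are reported."""
    if old_path != new_path:
        return f"R {old_path} -> {new_path}"
    return f"M {old_path}"

old = "/some/zfs/dataset/subdir/file0"
new = "/some/zfs/dataset/otherdir/file0"
```

For the `mv file0 ../otherdir/file0` case, the base-name comparison reports a plain modification of the old path, while comparing the full path yields the rename record the message says should be emitted.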
Andriy Gapon	c1290d6583	5814 bpobj_iterate_impl(): Close a refcount leak iterating on a sublist. illumos/illumos-gate@b67dde11a7 `b67dde11a7` https://www.illumos.org/issues/5814 Lets pull in this patch from freebsd: http://svnweb.freebsd.org/base?view=revision&revision=271781 bpobj_iterate_impl(): Close a refcount leak iterating on a sublist. If bpobj_space() returned non-zero here, the sublist would have been left open, along with the bonus buffer hold it requires. This call does not invoke any calls to bpobj_close() itself. This bug doesn't have any known vector, but was found on inspection. MFC after: 1 week Sponsored by: Spectra Logic Affects: All ZFS versions starting 21 May 2010 (illumos cde58dbc) MFSpectraBSD: r1050998 on 2014/03/26 Fix bpobj_iterate_impl() to properly call bpobj_close() if bpobj_space() returns an error. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Will Andrews <will@freebsd.org>	2017-04-14 18:43:10 +00:00
Andriy Gapon	9a884731a3	6914 kernel virtual memory fragmentation leads to hang illumos/illumos-gate@af868f46a5 `af868f46a5` https://www.illumos.org/issues/6914 This change allows the kernel to use more virtual address space. This will allow us to devote 1.5x physmem for the zio arena, and an additional 1.5x physmem for the kernel heap. We saw a hang when unable to find any 128K contiguous memory segments. Looking at the core file we see many threads in stacks similar to this: > ffffff68c9c87c00::findstack -v stack pointer for thread ffffff68c9c87c00: ffffff02cd63d8b0 [ ffffff02cd63d8b0 _resume_from_idle+0xf4() ] ffffff02cd63d8e0 swtch+0x141() ffffff02cd63d920 cv_wait+0x70(ffffff6009b1b01e, ffffff6009b1b020) ffffff02cd63da50 vmem_xalloc+0x640(ffffff6009b1b000, 20000, 1000, 0, 0, 0, 0, ffffff0200000004) ffffff02cd63dac0 vmem_alloc+0x135(ffffff6009b1b000, 20000, 4) ffffff02cd63db60 segkmem_xalloc+0x171(ffffff6009b1b000, 0, 20000, 4, 0, fffffffffb885fe0, fffffffffbcefa10) ffffff02cd63dbc0 segkmem_alloc_vn+0x4a(ffffff6009b1b000, 20000, 4, fffffffffbcefa10) ffffff02cd63dbf0 segkmem_zio_alloc+0x20(ffffff6009b1b000, 20000, 4) ffffff02cd63dd20 vmem_xalloc+0x5b1(ffffff6009b1c000, 20000, 1000, 0, 0, 0, 0, 4) ffffff02cd63dd90 vmem_alloc+0x135(ffffff6009b1c000, 20000, 4) ffffff02cd63de20 kmem_slab_create+0x8d(ffffff605fd37008, 4) ffffff02cd63de80 kmem_slab_alloc+0x11e(ffffff605fd37008, 4) ffffff02cd63dee0 kmem_cache_alloc+0x233(ffffff605fd37008, 4) ffffff02cd63df10 zio_data_buf_alloc+0x5b(20000) ffffff02cd63df70 arc_get_data_buf+0x92(ffffff6265a70588, 20000, ffffff901fd796f8) ffffff02cd63dfb0 arc_buf_alloc_impl+0x9c(ffffff6265a70588, ffffff6d233ab0b8) Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matthew Ahrens <mahrens@delphix.com>	2017-04-14 18:41:37 +00:00
Andriy Gapon	9f33176926	7256 low probability race in zfs_get_data illumos/illumos-gate@0c94e1af67 `0c94e1af67` https://www.illumos.org/issues/7256 error = dmu_sync(zio, lr->lr_common.lrc_txg, zfs_get_done, zgd); ASSERT(error || lr->lr_length <= zp->z_blksz); It's possible, although extremely rare, that the zfs_get_done() callback is executed before dmu_sync() returns. In that case the znode's range lock is dropped and the znode is unreferenced. Thus, the assertion can access some invalid or wrong data via the zp pointer. The size variable caches the correct value of z_blksz and can be safely used here. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andriy Gapon <andriy.gapon@clusterhq.com>	2017-04-14 18:38:53 +00:00
Andriy Gapon	2ccc5f6fd1	5379 modifying a mmap()-ed file does not update its timestamps illumos/illumos-gate@80e10fd0d2 `80e10fd0d2` https://www.illumos.org/issues/5379 The following is based on a review of the illumos code and on a similar problem reported for FreeBSD where the relevant code is different. Looking at this block of code http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_vnops.c#4187 I see code to set up an sa_bulk_attr_t object, I see code to set up mtime and ctime values, but I do not see code to actually apply the attributes... I would expect there to be a call to sa_bulk_update(), there is such a call in zfs_write() for instance. mmap_write.c - demo (1.42 KB) Andriy Gapon, 2015-11-11 01:53 PM Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Author: Andriy Gapon <andriy.gapon@clusterhq.com>	2017-04-14 18:38:21 +00:00
Andriy Gapon	847b286425	7955 libshare needs to initialize only those datasets being modified by the consumer illumos/illumos-gate@8a981c3356 `8a981c3356` https://www.illumos.org/issues/7955 Libshare currently initializes all available filesystems when doing any libshare operation. This requires iterating through all the filesystem multiple times, which is a huge performance problem for sharing and unsharing operations. Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Author: Daniel Hoffman <dj.hoffman@delphix.com>	2017-04-14 18:34:03 +00:00
Andriy Gapon	b7db959db3	6101 attempt to lzc_create() a filesystem under a volume results in a panic illumos/illumos-gate@b127fe3c05 `b127fe3c05` https://www.illumos.org/issues/6101 lzc_create(), or more correctly, zfs_ioc_create() does not reject an attempt to create a filesystem as a child of a volume, instead it proceeds to a crash. A crash stack obtained on FreeBSD: page fault while in kernel mode zap_leaf_lookup() fzap_lookup() zap_lookup_norm() zap_lookup() zfs_get_zplprop() zfs_fill_zplprops_impl() zfs_ioc_create() zfsdev_ioctl() devfs_ioctl_f() kern_ioctl() sys_ioctl() This crash happened with a kernel without debugging assertions. The immediate cause of the crash appears to be an attempt to interpret a zvol object as a zap object. For filesystems: #define MASTER_NODE_OBJ 1 For zvols: #define ZVOL_OBJ 1ULL #define ZVOL_ZAP_OBJ 2ULL So, I see two problems here: 1. an attempt to create a filesystem under a zvol should be rejected as early as possible, maybe in zfs_fill_zplprops() 2. maybe zap_lookup / zap_lockdir should reject objects that are not of one of the zap object types Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-04-14 18:33:20 +00:00
Andriy Gapon	d72c1a5bb1	8061 sa_find_idx_tab can be declared more type-safely illumos/illumos-gate@7f0bdb4257 `7f0bdb4257` https://www.illumos.org/issues/8061 sa_find_idx_tab() is declared as taking and returning "void *" parameters. These can be declared to be the specific types. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-04-14 18:32:38 +00:00
Andriy Gapon	0843b4c8ae	8026 retire zfs_throttle_delay and zfs_throttle_resolution illumos/illumos-gate@6b03625981 `6b03625981` https://www.illumos.org/issues/8026 zfs_throttle_delay and zfs_throttle_resolution became disused since the new write throttling mechanism was introduced. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Andriy Gapon <avg@FreeBSD.org>	2017-04-14 18:32:12 +00:00
Andriy Gapon	c2745459d3	5380 receive of a send -p stream doesn't need to try renaming snapshots illumos/illumos-gate@471a88e499 `471a88e499` https://www.illumos.org/issues/5380 A stream created with zfs send -p -I contains properties of all snapshots of a given dataset as opposed to only properties of snapshots in a given range. Not only is this suboptimal but the receive code also does not filter properties by the range. So, properties of earlier snapshots would be updated even though the snapshots themselves are not in the stream (just their properties). Given that modifying the snapshot properties requires a TXG sync and that the snapshots are updated one by one, the described behavior may lead to a severe performance penalty. Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-04-14 18:30:22 +00:00
Andriy Gapon	45967327e2	8027 tighten up dsl_pool_dirty_delta illumos/illumos-gate@313ae1e182 `313ae1e182` https://www.illumos.org/issues/8027 dsl_pool_dirty_delta() should not wake up waiters when dp->dp_dirty_total == zfs_dirty_data_max, because they wait for dp_dirty_total to fall strictly below the threshold. It's probably very rare for that condition to occur, but it's better to have more accurate code. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andriy Gapon <avg@FreeBSD.org>	2017-04-14 18:29:35 +00:00
Andriy Gapon	bf644e9b6f	8023 Panic destroying a metaslab deferred range tree illumos/illumos-gate@3991b535a8 `3991b535a8` https://www.illumos.org/issues/8023 $C ffffff0011bc0970 vpanic() ffffff0011bc0a00 strlog() ffffff0011bc0a30 range_tree_destroy+0x72(ffffff043769ad00) ffffff0011bc0a70 metaslab_fini+0xd5(ffffff0449acf380) ffffff0011bc0ab0 vdev_metaslab_fini+0x56(ffffff0462bae800) ffffff0011bc0af0 spa_unload+0x9b(ffffff03e3dac000) ffffff0011bc0b70 spa_export_common+0x115(ffffff047f4b4000, 2, 0, 0, 0) ffffff0011bc0b90 spa_destroy+0x1d(ffffff047f4b4000) ffffff0011bc0bd0 zfs_ioc_pool_destroy+0x20(ffffff047f4b4000) ffffff0011bc0c80 zfsdev_ioctl+0x4d7(11400000000, 5a01, 8040190, 100003, ffffff03e1956b10, ffffff0011bc0e68) ffffff0011bc0cc0 cdev_ioctl+0x39(11400000000, 5a01, 8040190, 100003, ffffff03e1956b10, ffffff0011bc0e68) ffffff0011bc0d10 spec_ioctl+0x60(ffffff03d9153b00, 5a01, 8040190, 100003, ffffff03e1956b10, ffffff0011bc0e68, 0) ffffff0011bc0da0 fop_ioctl+0x55(ffffff03d9153b00, 5a01, 8040190, 100003, ffffff03e1956b10, ffffff0011bc0e68, 0) ffffff0011bc0ec0 ioctl+0x9b(3, 5a01, 8040190) ffffff0011bc0f10 _sys_sysenter_post_swapgs+0x149() Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: George Wilson <george.wilson@delphix.com>	2017-04-14 18:29:13 +00:00
Andriy Gapon	99988b024d	7885 zpool list can report 16.0e for expandsz illumos/illumos-gate@c040c10cdd `c040c10cdd` https://www.illumos.org/issues/7885 When a member of a RAIDZ has been replaced with a device smaller than the original, then the top level vdev can report its expand size as 16.0E. The reduced child asize causes the RAIDZ to have a vdev_asize lower than its vdev_max_asize which then results in an underflow during the calculation of the parent's expand size. Also for RAIDZ vdevs the sum of their child vdev_min_asize could be smaller than the parent's vdev_min_size. Fixed by: https://github.com/openzfs/openzfs/pull/296 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Author: Steven Hartland <steven.hartland@multiplay.co.uk>	2017-04-14 18:28:40 +00:00
Andriy Gapon	2d3fcc82a1	7990 libzfs: snapspec_cb() does not need to call zfs_strdup() illumos/illumos-gate@d8584ba6fb `d8584ba6fb` https://www.illumos.org/issues/7990 The snapspec_cb() callback function in libzfs does not need to call zfs_strdup(). Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Marcel Telka <marcel@telka.sk>	2017-04-14 18:27:12 +00:00
Andriy Gapon	318cd645a2	7968 multi-threaded spa_sync() illumos/illumos-gate@94c2d0eb22 `94c2d0eb22` https://www.illumos.org/issues/7968 spa_sync() iterates over all the dirty dnodes and processes each of them by calling dnode_sync(). If there are many dirty dnodes (e.g. because we created or removed a lot of files), the single thread of spa_sync() calling dnode_sync() can become a bottleneck. Additionally, if many dnodes are dirtied concurrently in open context (e.g. due to concurrent file creation), the os_lock will experience lock contention via dnode_setdirty(). The solution is to track dirty dnodes on a multilist_t, and for spa_sync() to use separate threads to process each of the sublists in the multilist. On the concurrent file creation microbenchmark, the performance improvement from dnode_setdirty() is up to 7%. Additionally, the wall clock time spent in spa_sync() is reduced to 15%-40% of the single-threaded case. In terms of cost/reward, once the other bottlenecks are addressed, fixing this bug will provide a medium-large performance gain and require a medium amount of effort to implement. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-04-14 18:26:24 +00:00
Andriy Gapon	260aaa4c3d	7970 zfs_arc_num_sublists_per_state should be common to all multilists illumos/illumos-gate@10fbdecb05 `10fbdecb05` https://www.illumos.org/issues/7970 The global tunable zfs_arc_num_sublists_per_state is used by the ARC and the dbuf cache, and other users are planned. We should change this tunable to be common to all multilists. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-04-14 18:25:49 +00:00
Andriy Gapon	bc390f4947	7801 add more by-dnode routines (lint) illumos/illumos-gate@411be58a6e `411be58a6e` https://www.illumos.org/issues/7801 Add _by_dnode() routines for accessing objects given their dnode_t, this is more efficient than accessing the object by (objset_t *, uint64_t object). This change converts some but not all of the existing consumers. As performance-sensitive code paths are discovered they should be converted to use these routines. Ported from: https://github.com/zfsonlinux/zfs/commit/ `0eef1bde31` Author: Matthew Ahrens <mahrens@delphix.com>	2017-04-14 18:25:22 +00:00
Andriy Gapon	fd870ba29a	7801 add more by-dnode routines illumos/illumos-gate@b0c42cd470 `b0c42cd470` https://www.illumos.org/issues/7801 Add _by_dnode() routines for accessing objects given their dnode_t, this is more efficient than accessing the object by (objset_t *, uint64_t object). This change converts some but not all of the existing consumers. As performance-sensitive code paths are discovered they should be converted to use these routines. Ported from: https://github.com/zfsonlinux/zfs/commit/ `0eef1bde31` Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: bzzz77 <bzzz.tomas@gmail.com>	2017-04-14 18:25:02 +00:00
Andriy Gapon	c9327351e8	7869 panic in bpobj_space(): null pointer dereference illumos/illumos-gate@a3905a4592 `a3905a4592` https://www.illumos.org/issues/7869 The issue fixed by this patch is a race condition in the deadlist code. A thread executing an administrative command that uses `dsl_deadlist_space_range()` holds the lock of the whole `deadlist_t` to protect access to all the entries that the deadlist contains in an avl tree. Sync threads trying to insert a new entry in the deadlist (through `dsl_deadlist_insert()` -> `dle_enqueue()`) do not hold the deadlist lock at that moment. If the `dle_bpobj` is the empty bpobj (our sentinel value), we close and reopen it. Between these two operations, it is possible for the `dsl_deadlist_space_range()` thread to dereference that bpobj which is `NULL` during that window. Threads should hold a deadlist's `dl_lock` when they manipulate its internal data so scenarios like the one above are avoided. In addition, threads should also hold the bpobj lock whenever they are allocating the subobj list of a bpobj, and not just when they actually insert the subobj to the list. This way we can avoid potential memory leaks. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: George Melikov <mail@gmelikov.ru> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@omniti.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2017-04-14 18:24:38 +00:00
Andriy Gapon	1ea6ff796d	7793 ztest fails assertion in dmu_tx_willuse_space illumos/illumos-gate@61e255ce72 `61e255ce72` https://www.illumos.org/issues/7793 Background information: This assertion about tx_space_* verifies that we are not dirtying more stuff than we thought we would. We “need” to know how much we will dirty so that we can check if we should fail this transaction with ENOSPC/EDQUOT, in dmu_tx_assign(). While the transaction is open (i.e. between dmu_tx_assign() and dmu_tx_commit() — typically less than a millisecond), we call dbuf_dirty() on the exact blocks that will be modified. Once this happens, the temporary accounting in tx_space_* is unnecessary, because we know exactly what blocks are newly dirtied; we call dnode_willuse_space() to track this more exact accounting. The fundamental problem causing this bug is that dmu_tx_hold_*() relies on the current state in the DMU (e.g. dn_nlevels) to predict how much will be dirtied by this transaction, but this state can change before we actually perform the transaction (i.e. call dbuf_dirty()). This bug will be fixed by removing the assertion that the tx_space_* accounting is perfectly accurate (i.e. we never dirty more than was predicted by dmu_tx_hold_*()). By removing the requirement that this accounting be perfectly accurate, we can also vastly simplify it, e.g. removing most of the logic in dmu_tx_count_*(). The new tx space accounting will be very approximate, and may be more or less than what is actually dirtied. It will still be used to determine if this transaction will put us over quota. Transactions that are marked by dmu_tx_mark_netfree() will be excepted from this check. We won’t make an attempt to determine how much space will be freed by the transaction — this was rarely accurate enough to determine if a transaction should be permitted when we are over quota, which is why dmu_tx_mark_netfree() was introduced in 2014. We also won’t attempt to give “credit” when overwriting existing blocks, if those blocks may be freed. This allows us to remove the do_free_accounting logic in dbuf_dirty(), and associated routines. This Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-04-14 18:24:03 +00:00
Andriy Gapon	59dc4d6bbc	7816 remove static unused variable in zfs_vfsops.c illumos/illumos-gate@2e972bf18f `2e972bf18f` https://www.illumos.org/issues/7816 found by gcc6 build unused static variable: -static const fs_operation_def_t zfs_vfsops_eio_template[] = { - VFSNAME_FREEVFS, { .vfs_freevfs = zfs_freevfs }, - NULL, NULL -}; Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Igor Kozhukhov <igork@argotech.io>	2017-04-14 18:23:18 +00:00
Andriy Gapon	d3f85b54a4	7812 Remove gender specific language illumos/illumos-gate@48bbca8168 `48bbca8168` https://www.illumos.org/issues/7812 This change removes all gendered language that did not refer specifically to an individual person or pet. The convention taken was to use variations on "they" when referring to users and/or human beings, while using "it" when referring to code, functions, and/or libraries. Additionally, we took the liberty to fix up any whitespace issues that were found in any files that were already being modified. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Daniel Hoffman <dj.hoffman@delphix.com>	2017-04-14 18:22:42 +00:00
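
Note on 7885 above: the "16.0E" figure is what an unsigned 64-bit subtraction looks like after it wraps. The sketch below is not the ZFS code; the function names and the limit/used parameters are hypothetical and only illustrate the arithmetic, assuming sizes are held in uint64_t and displayed in binary exabytes.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from ZFS): naive unsigned difference.
 * When used > limit the result wraps modulo 2^64 instead of going negative.
 */
uint64_t
naive_expand_size(uint64_t limit, uint64_t used)
{
	return (limit - used);
}

/*
 * Hypothetical helper (not from ZFS): the same difference, clamped at
 * zero so a smaller-than-expected limit cannot wrap.
 */
uint64_t
clamped_expand_size(uint64_t limit, uint64_t used)
{
	return (limit > used ? limit - used : 0);
}

/* Convert a byte count to EiB (2^60 bytes), as a size formatter might. */
double
bytes_to_eib(uint64_t bytes)
{
	return ((double)bytes / (double)(1ULL << 60));
}
```

With limit = 100 and used = 200, naive_expand_size() returns 2^64 - 100, which bytes_to_eib() scales to roughly 16.0, while clamped_expand_size() returns 0. Whether the actual fix clamps or corrects the inputs is not stated in the commit message.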

586 Commits