freebsd-dev

Author	SHA1	Message	Date
Andriy Gapon	c01a79842c	6051 lzc_receive: allow the caller to read the begin record illumos/illumos-gate@620f322510 `620f322510` https://www.illumos.org/issues/6051 Currently lzc_receive() requires that its snapname argument is a snapshot name (contains ''). @zfs receive allows to specify just a dataset name and would try to deduce the snapshot name from the stream. I propose to allow lzc_receive() to do the same. That seems to be quite easy to implement, it requires only a small amount of logic, it does not require any additional system calls or any additional data from the stream. The benefit is that the new behavior would allow to keep the snapshot names the same between the sender and receiver at zero cost, without a need to pass the names out of band. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@icyb.net.ua>	2016-11-02 17:32:31 +00:00
Andriy Gapon	0a682531f4	3746 ZRLs are racy illumos/illumos-gate@260af64db7 `260af64db7` https://www.illumos.org/issues/3746 From the original change log: It was possible for a reference to be added even with the lock held, and for references added just after a lock release to be lost. This bug was also independently found and reported in wesunsolve.net issues 6985013 6995524. In zrl_add(), always use an atomic operation to update the refcount. The mutex in the ZRL only guarantees that wakeups occur for waiters on the lock. It offers no protection against concurrent updates of the refcount. The only refcount transition that is safe to perform without an atomic operation is from ZRL_LOCKED back to 0, since this can only be performed by the thread which has the ZRL locked. Authored by: Will Andrews <will@freebsd.org> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Reviewed by: Pavel Zakharov <pavel.zakha@gmail.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Justin T. Gibbs <gibbs@scsiguy.com> Approved by: Matt Ahrens <mahrens@delphix.com> Author: Youzhong Yang <yyang@mathworks.com>	2016-10-27 07:11:31 +00:00
Alexander Motin	760c70cd25	7301 zpool export -f should be able to interrupt file freeing Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Author: Alek Pinchuk <alek@nexenta.com> Closes #175	2016-10-14 11:49:36 +00:00
Alexander Motin	d503f63e1b	6988 spa_sync() spends half its time in dmu_objset_do_userquota_updates Using a benchmark which creates 2 million files in one TXG, I observe that the thread running spa_sync() is on CPU almost the entire time we are syncing, and therefore can be a performance bottleneck. About 50% of the time in spa_sync() is in dmu_objset_do_userquota_updates(). The problem is that dmu_objset_do_userquota_updates() calls zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was modified (or created). In this benchmark, all the files are owned by the same user/group, so all 2 million calls to zap_increment_int() are modifying the same entry in the zap. The same issue exists for the DMU_GROUPUSED_OBJECT. We should keep an in-memory map from user to space delta while we are syncing, and when we finish, iterate over the in-memory map and modify the ZAP once per entry. This reduces the number of calls to zap_increment_int() from "number of objects modified" to "number of owners/groups of modified files". This reduced the time spent in spa_sync() in the file create benchmark by ~33%, from 11 seconds to 7 seconds. Closes #107 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Jinshan Xiong <jinshan.xiong@intel.com> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@5fc46359c5	2016-10-14 11:46:17 +00:00
Alexander Motin	23f49f4ada	5120 zfs should allow large block/gzip/raidz boot pool (loader project) Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Toomas Soome <tsoome@me.com> openzfs/openzfs@c8811bd3e2	2016-10-14 11:41:06 +00:00
Alexander Motin	7fd98f779f	7402 Create tunable to ignore hole_birth feature Until we can resolve the numerous hole_birth bugs that have cropped up recently, and come up with a way going forwards to protect users from corruption, we should disable the hole_birth feature. Using a tunable allows those who are confident that their data is correct to continue to take advantage of the feature. Closes #188 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>	2016-09-28 23:46:08 +00:00
Alexander Motin	1c13b49766	7254 ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedobjs dsl_dataset_space is looking at the ds_bp's fill count while dmu_objset_write_ready() is concurrently modifying it. This fix adds an rrwlock to protect the ds_bp. Closes #180 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>	2016-09-28 23:44:13 +00:00
Alexander Motin	ccb5d7b8e8	7259 DS_FIELD_LARGE_BLOCKS is unused The DS_FIELD_LARGE_BLOCKS macro has been unused since the integration of this patch: commit ca0cc3918a1789fa839194af2a9245f801a06b1a Author: Matthew Ahrens <mahrens@delphix.com> Date: Fri Jul 24 09:53:55 2015 -0700 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> This patch simply removes this macro from dsl_dataset.h. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-07 20:07:39 +00:00
Alexander Motin	03d951aef3	7278 tuning zfs_arc_max does not impact arc_c_min When changing zfs_arc_max (e.g. as zdb does), it may be set to less than the default arc_c_min. arc_c_min should decrease to not be more than arc_c_max, but it doesn't; therefore tuning of arc_c_max is ineffective. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@608764bead	2016-09-07 20:00:22 +00:00
Alexander Motin	72a9a6ded9	7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object Using a benchmark which has 32 threads creating 2 million files in the same directory, on a machine with 16 CPU cores, I observed poor performance. I noticed that dmu_tx_hold_zap() was using about 30% of all CPU, and doing dnode_hold() 7 times on the same object (the ZAP object that is being held). dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the dnode_t that we already have in hand, rather than repeatedly calling dnode_hold(). To do this, we need to pass the dnode_t down through all the intermediate calls that dmu_tx_hold_zap() makes, making these routines take the dnode_t* rather than an objset_t* and a uint64_t object number. In particular, the following routines will need to have analogous *_by_dnode() variants created: dmu_buf_hold_noread() dmu_buf_hold() zap_lookup() zap_lookup_norm() zap_count_write() zap_lockdir() zap_count_write() This can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around the object allocation issue. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds. Sponsored by: Intel Corp. Closes #109 Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@d3e523d489	2016-09-03 10:54:56 +00:00
Alexander Motin	6ee1596f02	7247 zfs receive of deduplicated stream fails This resolves two 'zfs recv' issues. First, when receiving into an existing filesystem, a snapshot created during the receive process is not added to the guid->dataset map for the stream, resulting in failed lookups for deduped streams when a WRITE_BYREF record refers to a snapshot received earlier in the stream. Second, the newly created snapshot was also not set properly, referencing the snapshot before the new receiving dataset rather than the existing filesystem. Closes #159 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Author: Chris Williamson <chris.williamson@delphix.com> openzfs/openzfs@b09697c8c1	2016-09-03 10:50:43 +00:00
Alexander Motin	7b84e6dc6f	7003 zap_lockdir() should tag hold zap_lockdir() / zap_unlockdir() should take a "void *tag" argument which tags the hold on the zap. This will help diagnose programming errors which misuse the hold on the ZAP. Sponsored by: Intel Corp. Closes #108 Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@0780b3eab5	2016-09-03 10:48:48 +00:00
Mark Johnston	2e27b48e02	7297 clear() on llquantize aggregation causes dtrace to exit 7298 printa() of multiple aggregations can fail for llquantize() illumos/illumos-gate@0ddc0ebb74 Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Adam Leventhal <adam.leventhal@gmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Bryan Cantrill <bryan@joyent.com>	2016-08-26 20:51:09 +00:00
Andriy Gapon	9d4cd9e2d9	7277 zdb should be able to print zfs_dbgmsg's illumos/illumos-gate@29bdd2f916 `29bdd2f916` https://www.illumos.org/issues/7277 ztest always prints the debug messages (zfs_dbgmsg()) by calling zfs_dbgmsg_print(). We should add a flag to zdb to make it do this as well before exiting. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Pavel Zakharov <pavel.zakharov@delphix.com>	2016-08-15 14:24:47 +00:00
Andriy Gapon	4b498e5e9d	7136 ESC_VDEV_REMOVE_AUX ought to always include vdev information 7115 6922 generates ESC_ZFS_VDEV_REMOVE_AUX a bit too often illumos/illumos-gate@b72b6bb10a `b72b6bb10a` https://www.illumos.org/issues/7136 6922 added ESC_ZFS_VDEV_REMOVE_AUX and ESC_ZFS_VDEV_REMOVE_DEV sysevents whenever an aux device gets removed from a pool. However, those sysevents will be created without the vdev_guid and vdev_path fields. It would be better to always populate those fields. https://www.illumos.org/issues/7115 The addition of spa_event_notify in vdev removal code (see #6922) causes events to be generated even if the spare failed to be removed with EBUSY. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alan Somers <asomers@gmail.com>	2016-08-15 14:23:50 +00:00
Andriy Gapon	cb0a9ee48f	7230 add assertions to dmu_send_impl() to verify that stream includes BEGIN and END records illumos/illumos-gate@12b90ee2d3 `12b90ee2d3` https://www.illumos.org/issues/7230 A test failure occurred where a send stream had only a BEGIN record. This should not be possible if the send returns without error. Prevented this from happening in the future by adding an assertion to dmu_send_impl() to verify that if the function returns 0 (success) both a BEGIN and END record are present. Did this by adding flags to dmu_sendarg_t (indicating whether BEGIN or END records sent), having dump_record() set flags appropriately, adding VERIFY statement to dmu_send_impl(). Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matt Krantz <matt.krantz@delphix.com>	2016-08-15 14:22:12 +00:00
Andriy Gapon	2a7aac5a99	7235 remove unused func dsl_dataset_set_blkptr illumos/illumos-gate@bd56f80007 `bd56f80007` https://www.illumos.org/issues/7235 The function dsl_dataset_set_blkptr() is unused. We should remove it. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-08-15 14:21:46 +00:00
Andriy Gapon	2e97b962e7	7090 zfs should improve allocation order and throttle allocations illumos/illumos-gate@0f7643c737 `0f7643c737` https://www.illumos.org/issues/7090 When write I/Os are issued, they are issued in block order but the ZIO pipeline will drive them asynchronously through the allocation stage which can result in blocks being allocated out-of-order. It would be nice to preserve as much of the logical order as possible. In addition, the allocations are equally scattered across all top-level VDEVs but not all top-level VDEVs are created equally. The pipeline should be able to detect devices that are more capable of handling allocations and should allocate more blocks to those devices. This allows for dynamic allocation distribution when devices are imbalanced as fuller devices will tend to be slower than empty devices. The change includes a new pool-wide allocation queue which would throttle and order allocations in the ZIO pipeline. The queue would be ordered by issued time and offset and would provide an initial amount of allocation of work to each top-level vdev. The allocation logic utilizes a reservation system to reserve allocations that will be performed by the allocator. Once an allocation is successfully completed it's scheduled on a given top-level vdev. Each top- level vdev maintains a maximum number of allocations that it can handle (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth) are distributed across the top-level vdevs metaslab groups and round robin across all eligible metaslab groups to distribute the work. As top-levels complete their work, they receive additional work from the pool-wide allocation queue until the allocation queue is emptied. Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: George Wilson <george.wilson@delphix.com>	2016-08-15 14:21:10 +00:00
Mark Johnston	8a411dff07	7085 add support for "if" and "else" statements in dtrace illumos/illumos-gate@c3bd3abd88 Add syntactic sugar to dtrace: "if" and "else" statements. The sugar is baked down to standard dtrace features by adding additional clauses with the appropriate predicates. Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Bryan Cantrill <bryan@joyent.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>	2016-08-13 19:57:36 +00:00
Mark Johnston	da3d1eecc5	5396 fix longjmp clobbering errors illumos/illumos-gate@67a4bb8f9a Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Gary Mills <gary_mills@fastmail.fm>	2016-08-13 19:54:32 +00:00
Andriy Gapon	053b1b4d59	7164 zdb should be able to open the root dataset illumos/illumos-gate@b702644a6e `b702644a6e` https://www.illumos.org/issues/7164 If the pool/dataset command-line argument is specified with a trailing slash, for example, "tank/", we should interpret it as the topmost dataset (rather than the whole pool) Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Tim Chase <tim@chase2k.com>	2016-07-20 09:58:10 +00:00
Andriy Gapon	0442be9aa8	6391 Override default SPA config location via environment illumos/illumos-gate@ae24175b2b `ae24175b2b` https://www.illumos.org/issues/6391 When using zdb with non-default SPA config file it is not convenient to add -U <non-default-config-file-path> all the time. This commit introduces support for setting/overriding SPA config location via environment variable 'SPA_CONFIG_PATH'. If -U flag is specified in the command line it will override any other value as usual. `64d7b6cf75` Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Richard Yao <ryao@gentoo.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Cyril Plisko <cyril.plisko@mountall.com>	2016-07-20 09:57:16 +00:00
Andriy Gapon	6c910ac2c4	7163 ztest failures due to excess error injection illumos/illumos-gate@f34284d835 `f34284d835` https://www.illumos.org/issues/7163 Running zloop from zfs-precommit hit this assertion: *panicstr/s 0xfffffd7fd7419370: assertion failed for thread 0xfffffd7fe29ed240, thread-id 577: parent != NULL, file ../../../uts/common/fs/zfs/dbuf.c, line 1827 $c libc.so.1`_lwp_kill+0xa() libc.so.1`_assfail+0x182(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723) libc.so.1`assfail+0x19(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723) libzpool.so.1`dbuf_dirty+0xc69(10e3bc10, 3601700) libzpool.so.1`dbuf_dirty+0x61e(10c73640, 3601700) libzpool.so.1`dbuf_dirty+0x61e(10e28280, 3601700) libzpool.so.1`dmu_buf_will_fill+0x64(10e28280, 3601700) libzpool.so.1`dmu_write+0x1b6(2c7e640, d, 400000002e000000, 200, 3717b40, 3601700) ztest_replay_write+0x568(4950d0, 3717a80, 0) ztest_write+0x125(4950d0, d, 400000002e000000, 200, 413f000) ztest_io+0x1bb(4950d0, d, 400000002e000000) ztest_dmu_write_parallel+0xaa(4950d0, 6) ztest_execute+0x83(1, 420c98, 6) ztest_thread+0xf4(6) libc.so.1`_thrp_setup+0x8a(fffffd7fe29ed240) libc.so.1`_lwp_start() This is another manifestation of ECKSUM in ztest: The lowest level ancestor that’s in memory is the L8 (topmost). The L7 ancestor is blkid 0x10: ::dbufs -O 0x2c7e640 -o d -l 7 \|::dbuf addr object lvl blkid holds os 600be50 d 7 4 1 ztest/ds_6 719d880 d 7 0 4 ztest/ds_6 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-20 09:54:18 +00:00
Andriy Gapon	75f99b28a3	6451 ztest fails due to checksum errors illumos/illumos-gate@f9eb9fdf19 `f9eb9fdf19` https://www.illumos.org/issues/6451 Sometimes ztest fails because zdb detects checksum errors. e.g.: Traversing all blocks to verify checksums and verify nothing leaked ... zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 8000160> DVA0=<0:1cc2000: 180000> [L0 other uint64[]] sha256 uncompressed LE contiguou s unique single size=100000L/100000P birth=271L/271P fill=1 cksum=c5a3e27d1ed0f894:843bca3a5473c4bf:f76a19b6830a2e4:91292591613a12bf -- skipping zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 800000180> DVA0=<0:ce16800: 180000> [L0 other uint64[]] sha256 uncompressed LE contigu ous unique single size=100000L/100000P birth=840L/840P fill=1 cksum=5d018f3d061e17f3:6d1584784587bf63:2805a74a0ce37369:ba68a214806c7e75 -- skipping zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 1000000360> DVA0=<0:10d37400: 180000> [L0 other uint64[]] sha256 uncompressed LE conti guous unique single size=100000L/100000P birth=904L/904P fill=1 cksum=fa1e11d4138bd14b:86c9488c444473e3:f31e43c72e72e46b:e3446472d1174d ba -- skipping zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 400000002c0> DVA0=<0:127ef400: 180000> [L0 other uint64[]] sha256 uncompressed LE cont iguous dedup single size=100000L/100000P birth=549L/549P fill=1 cksum=30e14955ebf13522:66dc2ff8067e6810:4607e750abb9d3b3:6582b8af909fcb 58 -- skipping zdb_blkptr_cb: Got error 50 reading <657, 5, 0, 1c0> DVA0=<0:1a180400:180000> [L0 other uint64[]] fletcher4 uncompressed LE contiguou s unique single size=100000L/100000P birth=1091L/1091P fill=1 cksum=a6cf1e50: 29b3bd01c57e5:36779b914035db9a:db61cdcf6bec56f0 -- skippin g The problem is that ztest_fault_inject() can inject multiple faults into the same block. It is designed such that it can inject errors on all leafs of a RAID-Z or mirror, but for a given range of offsets, it will only inject errors Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Jorgen Lundman <lundman@lundman.net> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-20 09:53:46 +00:00
Andriy Gapon	6778adc153	7147 ztest: ztest_ddt_repair fails with ztest_pattern_match assertion illumos/illumos-gate@aab8072633 `aab8072633` https://www.illumos.org/issues/7147 Here's the dbuf we're currently reading: 966f200::dbuf addr object lvl blkid holds os 966f200 4 0 0 1 ztest/ds_3 966f200::print dmu_buf_t db_data db_data = 0x9ae0400 0x9ae0400/10J 0x9ae0400: c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d The pattern we're expecting is actually this: a34ae10b5f2db2. If we attempt to read the block on disk we find that it has matches what ztest_ddt_repair() would have written: ~c1c7ced932020d=J ff3e383126cdfdf2 966f200::print dmu_buf_impl_t db_blkptr \| ::blkptr DVA0=<0:71d3c00:800> [L0 UINT64_OTHER] SHA256 OFF LE contiguous dedup single size=400L/400P birth=55L/55P fill=1 cksum=18486450d3ce8c6d:75a72f4bbf117b0f:2d3a226314eb5650:2eb0fd68648b1af0 1. zdb -U /rpool/tmp/zpool.cache -R ztest 0:71d3c00:800 \| head Found vdev type: mirror 0:71d3c00:800 0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef 000000: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>. 000010: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>. 000020: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>. 000030: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>. 000040: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>. 000050: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: George Wilson <george.wilson@delphix.com>	2016-07-20 09:51:33 +00:00
Andriy Gapon	bcb0ab1eb8	7086 ztest attempts dva_get_dsize_sync on an embedded blockpointer illumos/illumos-gate@926549256b `926549256b` https://www.illumos.org/issues/7086 In dbuf_dirty(), we need to grab the dn_struct_rwlock before looking at the db_blkptr, to prevent it from being changed by syncing context. Otherwise we may see that ztest got a segfault from this stack: libzpool.so.1`dva_get_dsize_sync+0x98(872f000, `b32b240`, fed7811b, 0, b4cda20, 0) libzpool.so.1`bp_get_dsize+0x60(872f000, `b32b240`, 0, 97cb780, 9d4c1a8, 0) libzpool.so.1`dbuf_dirty+0x9b3(ce0a100, 97cb780, 9, fecd2530) libzpool.so.1`dmu_buf_will_dirty+0xc3(ce0a100, 97cb780, ea293d6c, 1) libzpool.so.1`zap_lockdir+0x1a0(8aaa3c0, 1, 0, 97cb780, 1, 1) libzpool.so.1`zap_remove_norm+0x30(8aaa3c0, 1, 0, 8728b10, 0, 97cb780) libzpool.so.1`zap_remove+0x29(8aaa3c0, 1, 0, 8728b10, 97cb780, a) ztest_replay_remove+0x225(ea294588, 8728ae8, 0, 38010000, 0, 0) ztest_remove+0x9f(ea294588, ea293f50, 4, 3) ztest_object_init+0x78(ea294588, ea293f50, 4e0, 1) ztest_dmu_object_alloc_free+0x71(ea294588, 13) ztest_dmu_objset_create_destroy+0x224(80cef08, 13, 0, 805d36c, 9017ad44, 0) ztest_execute+0x89(a, 807c720, 13, 0) ztest_thread+0xea(13, 0, 0, 0) libc.so.1`_thrp_setup+0x88(f0983240) libc.so.1`_lwp_start(f0983240, 0, 0, 0, 0, 0) Looking into it a bit, we see that this is an embedded blockpointer, so BP_GET_NDVAS should have returned 0: b32b240::blkptr EMBEDDED [L0 ZAP_OTHER] et=0 LZ4 size=200L/4aP birth=80L Instead, it looks like another thread is modifying this blockpointer: b32b240::ugrep \| ::whatis f47a0e0c is in [ stack tid=0x19f ] ebd6ec40 is in [ stack tid=0x226 ] ea293bd0 is in [ stack tid=0x244 ] ea293be4 is in [ stack tid=0x244 ] Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-20 09:49:09 +00:00
Andriy Gapon	76fbf360a4	7072 zfs fails to expand if lun added when os is in shutdown state illumos/illumos-gate@c39a2aae1e `c39a2aae1e` https://www.illumos.org/issues/7072 upstream: 38733 zfs fails to expand if lun added when os is in shutdown state DLPX-36910 spares and caches should not display expandable space DLPX-39262 vdev_disk_open spam zfs_dbgmsg buffer Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: George Wilson <george.wilson@delphix.com>	2016-07-20 09:47:35 +00:00
Andriy Gapon	6155b9e07c	7104 increase indirect block size illumos/illumos-gate@4b5c8e93ca `4b5c8e93ca` https://www.illumos.org/issues/7104 The current default indirect block size is 16KB. We can improve performance by increasing it to 128KB. This is especially helpful for any workload that needs to read most of the metadata, e.g. scrub/resilver, file deletion, filesystem deletion, and zfs send. We also need to fix a few space estimation errors to make the tests pass. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-18 07:03:39 +00:00
Andriy Gapon	869ea71a92	7071 lzc_snapshot does not fill in errlist on ENOENT illumos/illumos-gate@25f7d993ad `25f7d993ad` https://www.illumos.org/issues/7071 upstream DLPX-40482 lzc_snapshot does not fill in errlist on ENOENT Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-18 06:58:39 +00:00
Andriy Gapon	fcc8f0a6e5	6950 ARC should cache compressed data illumos/illumos-gate@dcbf3bd6a1 `dcbf3bd6a1` https://www.illumos.org/issues/6950 When reading compressed data from disk, the ARC should keep the compressed block cached and only decompress it when consumers access the block. The uncompressed data should be short-lived allowing the ARC to cache a much larger amount of data. The DMU would also maintain a smaller cache of uncompressed blocks to minimize the impact of decompressing frequently accessed blocks. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Don Brady <don.brady@intel.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: George Wilson <george.wilson@delphix.com>	2016-07-18 06:57:24 +00:00
Andriy Gapon	298580cb35	6447 handful of nvpair cleanups illumos/illumos-gate@759e89be35 `759e89be35` https://www.illumos.org/issues/6447 I got a patch from someone who uses nvpair code outside of illumos. It fixes a couple of gcc warnings/bugs for him. 1. silence uninitialized use warnings 2. add parentheses around assignment used as truth value 3. fix printf format specifier (ll is for integers only) 4. strstr, strspn, strcspn, and strcmp are declared in string.h, not strings.h. 5. avoid scanning integer into boolean variable Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Steve Dougherty <sdougherty@barracuda.com>	2016-07-12 12:05:58 +00:00
Andriy Gapon	0957649a03	7082 bptree_iterate() passes wrong args to zfs_dbgmsg() illumos/illumos-gate@10e67aa0db `10e67aa0db` https://www.illumos.org/issues/7082 upstream DLPX-40542 bptree_iterate() passes wrong args to zfs_dbgmsg() Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-12 12:03:00 +00:00
Andriy Gapon	f728d00c4d	6314 buffer overflow in dsl_dataset_name illumos/illumos-gate@9adfa60d48 `9adfa60d48` https://www.illumos.org/issues/6314 Callers of dsl_dataset_name pass a buffer of size ZFS_MAXNAMELEN, but dsl_dataset_name copies the datasets' name PLUS the snapshot name to it, resulting in a max of 2 * ZFS_MAXNAMELEN + '@'. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-12 12:01:54 +00:00
Andriy Gapon	dc71048043	6931 lib/libzfs: cleanup gcc warnings illumos/illumos-gate@88f61dee20 `88f61dee20` https://www.illumos.org/issues/6931 need cleanup: CERRWARN += -_gcc=-Wno-switch CERRWARN += -_gcc=-Wno-parentheses CERRWARN += -_gcc=-Wno-unused-function Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Igor Kozhukhov <ikozhukhov@gmail.com>	2016-07-12 12:00:31 +00:00
Andriy Gapon	febe58b078	6872 zfs libraries should not allow uninitialized variables illumos/illumos-gate@f83b46baf9 `f83b46baf9` https://www.illumos.org/issues/6872 We compile the zfs libraries with -Wno-uninitialized. We should remove this. Change makefiles, fix new warnings, fix pbchk errors. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com>	2016-07-12 11:59:25 +00:00
Andriy Gapon	aecdfa442a	4521 zfstest is trying to execute evil "zfs unmount -a" illumos/illumos-gate@8808ac5dae `8808ac5dae` https://www.illumos.org/issues/4521 zfstest is trying to execute evil "zfs unmount -a", which fails (fortunately, as it would otherwise leave me with my ~ missing): 03:44:11.86 cannot unmount '/export/home/yuri': Device busy cannot unmount '/ export/home': Device busy 03:44:11.86 ERROR: /usr/sbin/zfs unmount -a exited 1 This affects, at least, zfs_mount_009_neg and zfs_mount_all_001_pos, both failing on that step. The pool containing the /export/home hierarchy is included in KEEP variable, but it doesn't seem to affect anything here. Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2016-07-12 11:58:04 +00:00
Andriy Gapon	9b1e23defb	5813 zfs_setprop_error(): Handle errno value E2BIG. illumos/illumos-gate@6fdcb3d1c2 `6fdcb3d1c2` https://www.illumos.org/issues/5813 Lets pull in this patch from freebsd: http://svnweb.freebsd.org/base?view=revision&revision=271764 zfs_setprop_error(): Handle errno value E2BIG. This errno value is emitted by dsl_props_set_check() in sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c, and is used to mean that the property value is too long. For the record, the maximum length is ZAP_MAXVALUELEN, which is 8*1024 bytes. Instead of claiming an unknown error (and abort()ing), provide something more specific to the scenario involved. As far as I can tell, E2BIG is not emitted for any other scenario. MFC after: 1 week Sponsored by: Spectra Logic Affects: All ZFS versions starting 27 Feb 2009 (illumos ccba0801) This change modified the value returned by dsl_props_set_check(), so that it can distinguish between a name that's too long and a value that's too long, but libzfs was not updated accordingly. MFSpectraBSD: r1051499 on 2014/03/28 11:07:59 Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Will Andrews <will@freebsd.org>	2016-07-12 11:56:45 +00:00
Andriy Gapon	9aa175f249	6873 zfs_destroy_snaps_nvl leaks errlist illumos/illumos-gate@4cde22c299 `4cde22c299` https://www.illumos.org/issues/6873 lzc_destroy_snaps() returns an nvlist in errlist. zfs_destroy_snaps_nvl() should nvlist_free() it before returning. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Chris Williamson <chris.williamson@delphix.com>	2016-07-12 11:54:25 +00:00
Andriy Gapon	8e810a64c6	6879 incorrect endianness swap for drr_spill.drr_length in libzfs_sendrecv.c illumos/illumos-gate@20fea7a474 `20fea7a474` https://www.illumos.org/issues/6879 In libzfs_sendrecv, there's a typo: case DRR_SPILL: if (byteswap) { drr->drr_u.drr_write.drr_length = BSWAP_64(drr->drr_u.drr_spill.drr_length); } Instead of drr_write.drr_length, we should be assigning the result of the byteswap to drr_spill.drr_length. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Dan Kimmel <dan.kimmel@delphix.com>	2016-07-12 11:52:33 +00:00
Andriy Gapon	ec16b455f0	6111 zfs send should ignore datasets created after the ending snapshot illumos/illumos-gate@4a20c933b1 `4a20c933b1` https://www.illumos.org/issues/6111 If you create a zfs child folder, zfs send returns an error when a recursive incremental send is done between two snapshots made prior to the folder creation. The problem can be reproduced with the following steps. root@zfs:/# zfs create pool/test root@zfs:/# zfs snapshot pool/test@snap1 root@zfs:/# zfs snapshot pool/test@snap2 root@zfs:/# zfs create pool/test/child root@zfs:/# zfs send -R -I pool/test@snap1 pool/test@snap2 > /dev/null WARNING: could not send pool/test/child@snap2: does not exist WARNING: could not send pool/test/child@snap2: does not exist root@zfs:/# echo $? 1 root@zfs:/# zfs snapshot -r pool/test@snap3 root@zfs:/# zfs send -R -I pool/test@snap1 pool/test@snap3 > /dev/null root@zfs:/# echo $? 0 root@zfs:/# zfs send -R -I pool/test@snap2 pool/test@snap3 > /dev/null root@zfs:/# echo $? 0 Since pool/test/child was created after snap2, zfs send should not expect snap2 to be in pool/test/child when doing a recursive send. It should examine the compare the creation time of the snapshot and each child folder to decide if the folder will be sent. The next incremental send between snap2 and snap3 would properly create the child folder and snap3 which first appears in the child folder. The problem is identical if '-i' is used instead of '-I'. Reviewed by: Alex Aizman alex.aizman@nexenta.com Reviewed by: Alek Pinchuk alek.pinchuk@nexenta.com Reviewed by: Roman Strashkin roman.strashkin@nexenta.com Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Alex Deiter <alex.deiter@nexenta.com>	2016-07-12 11:48:59 +00:00
Andriy Gapon	2a647cdd05	5768 zfsctl_snapshot_inactive() can leak a vnode hold illumos/illumos-gate@20a95fb2c4 `20a95fb2c4` https://www.illumos.org/issues/5768 zfsctl_snapshot_inactive() leaks a hold on the dvp (directory vnode) if v_count > 1. reproduce by: create a fs with 100 snapshots. have a thread do: while true; do ls -l /test/snaps/.zfs/snapshot >/dev/null; done have another thread do: while true; do zfs promote test/clone; zfs promote test/snaps; done use dtrace to delay & observe: dtrace -w -xd \\ -n 'vn_rele:entry/args0 == (void*)0xffffff01dd42ce80ULL/{[stack()]=count(); chill(100000);}' \\ -n 'zfsctl_snapshot_inactive:entry{ if (args[0]->v_count > 1) trace(args[0]- >v_count); self->vp=args[0];}' \\ -n 'gfs_vop_inactive:entry/callers["zfsctl_snapshot_inactive"]/{self->good=1; [stack()]=count()}' \\ -n 'zfsctl_snapshot_inactive:return{if (self->good) self->good=0; else printf ("bad return");}' \\ -n 'gfs_dir_lookup:return/callers["zfsctl_snapshot_inactive"] && self->vp- >v_count > 1/{trace(self->vp->v_count)}' the address is found by selecting one of the output of this at random: dtrace -n 'zfsctl_snapshot_inactive:entry{print(args[0]);' when you see "bad return", we have hit the bug. Then doing "zfs umount test/ snaps" will fail with EBUSY. When we hit this case, we also leak the hold on the target vnode (vn). When the inactive callback is called on a vnode with v_count > 1, it needs to be decremented. Reviewed by: George Wilson <george@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Rich Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-12 11:46:13 +00:00
Andriy Gapon	ca17e7086e	7054 dmu_tx_hold_t should use refcount_t to track space illumos/illumos-gate@0c779ad424 `0c779ad424` https://www.illumos.org/issues/7054 upstream: ee0003de7d3e598499be7ac3fe6b61efcc47cb7f DLPX-40399 dmu_tx_hold_t should use refcount_t to track space Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-12 11:37:19 +00:00
Andriy Gapon	674c8f370b	6940 Cannot unlink directories when over quota illumos/illumos-gate@99189164df `99189164df` https://www.illumos.org/issues/6940 Similar to #6334, but this time with empty directories: $ zfs create tank/quota $ zfs set quota=10M tank/quota $ zfs snapshot tank/quota@snap1 $ zfs set mountpoint=/mnt/tank/quota tank/quota $ mkdir /mnt/tank/quota/dir # create an empty directory $ mkfile 11M /mnt/tank/quota/11M /mnt/tank/quota/11M: initialized 9830400 of 11534336 bytes: Disc quota exceeded $ rmdir /mnt/tank/quota/dir # now unlink the empty directory rmdir: directory "/mnt/tank/quota/dir": Disc quota exceeded From user perspective, I would expect that ZFS is always able to remove files and directories even when the quota is exceeded. Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Simon Klinkert <simon.klinkert@gmail.com>	2016-07-12 11:36:22 +00:00
Andriy Gapon	085f86221f	7016 arc_available_memory is not 32-bit safe illumos/illumos-gate@0dd053d7d8 `0dd053d7d8` https://www.illumos.org/issues/7016 upstream DLPX-39446 arc_available_memory is not 32-bit safe https://github.com/delphix/delphix-os/commit/ 6b353ea3b8a1610be22e71e657d051743c64190b related to this upstream: DLPX-38547 delphix engine hang https://github.com/delphix/delphix-os/commit/ 3183a567b3e8c62a74a65885ca60c86f3d693783 DLPX-38547 delphix engine hang (fix static global) https://github.com/delphix/delphix-os/commit/ 22ac551d8ef085ad66cc8f65e51ac372b12993b9 DLPX-38882 system hung waiting on free segment https://github.com/delphix/delphix-os/commit/ cdd6beef7548cd3b12f0fc0328eeb3af540079c2 Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Gordon Ross <gordon.ross@nexenta.com> Author: Prakash Surya <prakash.surya@delphix.com>	2016-07-12 11:35:07 +00:00
Andriy Gapon	06bad7ebfc	7019 zfsdev_ioctl skips secpolicy when FKIOCTL is set 7020 sdev_cleandir can loop forever Note that the bulk of the upstream change is not applicable to FreeBSD and the affected files are not even in the vendor area. illumos/illumos-gate@45b1747515 `45b1747515` https://www.illumos.org/issues/7019 Currently zfsdev_ioctl, when confronted by a request with the FKIOCTL flag set, skips all processing of secpolicy functions. This means that ZFS is not doing any kind of verification of the credentials or access rights of the caller and assuming that (as it is an in-kernel client) all such checks have already been done. This turns out to be quite a dangerous assumption, especially with respect to sdev. In general I don't think it's particularly reasonable to offload this enforcement of access rights onto other kernel subsystems when ZFS has some particular local semantics in this area (delegated datasets etc) and does not provide any kind of API to allow other subsystems to avoid code duplication when doing it. ZFS should apply its normal access policy to requests from within the kernel, and callers should take care to give it the correct credentials and call it from the correct context in order to get the results they need. You can observe the currently unfortunate consequences of this bug in any non- global zone that has access to /dev/zvol or any subset of it via sdev profiles. In particular, a zone used to contain a KVM or similar which has a single zvol passed through to it using a <device match= block in its zone XML. Even though sdev makes something of an attempt to control for whether the caller should have access to nodes in /dev/zvol, it doesn't do this correctly, or really at all in the lookup call path. So, if we have a zone that's been given access to any part of /dev/zvol, it can simply look up the full path to any other zvol on the entire system, and the node will appear and be able to be used. https://www.illumos.org/issues/7020 sdev_cleandir can currently hang forever when it encounters a child node that is busy, or when it is given a matching expr and the first entry on the list does not match. The previous code (circa 2013) iterated over the children of the node using a for loop with SDEV_NEXT_ENTRY, which was then changed to a while ((dv = SDEV_FIRST_ENTRY(ddv)) { loop. Unfortunately the continue statements that previously made it skip over an entry were left as they were, which now result in an infinite busy-loop in the kernel. You can trigger this pretty easily by setting up an sdev exclude rule in zonecfg. Diagnosis: look for a runaway process consuming 100% CPU in kernel -- they have a distinctive stack: # mdb -k > 0t1234::pid2proc \| ::walk thread \| ::findstack -v [ ffffd001efcd3310 _resume_from_idle+0x112() ] ffffd001efcd3360 apix_hilevel_intr_epilog+0xc1(ffffd001efcd33d0, 0) ffffd001efcd33c0 apix_do_interrupt+0x34a(ffffd001efcd33d0, 0) ffffd001efcd33d0 _sys_rtt_ints_disabled+8() ffffd001efcd3550 rw_enter+0x58() ffffd001efcd35e0 sdev_cleandir+0x60(ffffd0631b6d75d8, 0, 0) ffffd001efcd3630 devzvol_prunedir+0xec(ffffd0631b6d76e8) ffffd001efcd36d0 devzvol_readdir+0x150(ffffd06333250e00, ffffd001efcd3790, ffffd062dc990e18, ffffd001efcd37dc, 0, 0) ffffd001efcd3760 fop_readdir+0x6b(ffffd06333250e00, ffffd001efcd3790, ffffd062dc990e18, ffffd001efcd37dc, 0, 0) ffffd001efcd3830 walk_dir+0xee(ffffd06333250e00, ffffd0669e4483c8, fffffffffbbdf410) ffffd001efcd3850 prof_make_names_walk+0x2e(ffffd0669e4483c8, fffffffffbbdf410) ffffd001efcd38b0 prof_make_names+0xfc(ffffd0669e4483c8) Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Alex Wilson <alex.wilson@joyent.com>	2016-07-12 11:34:05 +00:00
Andriy Gapon	efbd4f6270	6922 Emit ESC_ZFS_VDEV_REMOVE_AUX after removing an aux device illumos/illumos-gate@63364b0ee2 `63364b0ee2` https://www.illumos.org/issues/6922 ZFS does not do a config_sync after removing an aux (spare, log, or cache) device. AFAICT this isn't being done because it is slow and was deemed unnecessary. However, it should be such a rare operation that speed doesn't matter, and not doing it results in two problems: 1) It is theoretically possible to remove an aux device from one pool and attach it to another, then lose power. When power is restored, both pools would think that they own the aux device. 2) Removal of the aux device doesn't send any useful sysevents to userland. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Alan Somers <asomers@gmail.com>	2016-07-12 11:29:19 +00:00
Andriy Gapon	e241cc40c4	6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch illumos/illumos-gate@ea4a67f462 `ea4a67f462` https://www.illumos.org/issues/6980 doing zfs send -i snap1 snap2 >testfile results in internal error: Invalid argument Abort (core dumped) Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-07-12 11:27:17 +00:00
Andriy Gapon	16af19f6c3	6878 Add scrub completion info to "zpool history" illumos/illumos-gate@1825bc56e5 `1825bc56e5` https://www.illumos.org/issues/6878 Summary of changes: * Replace generic "scan done" message with "scan aborted, restarting", "scan cancelled", or "scan done" * Log number of errors using spa_get_errlog_size * Refactor scan restarting check into static function Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Nav Ravindranath <nav@delphix.com>	2016-07-12 11:25:55 +00:00
Andriy Gapon	89ee42219a	6513 partially filled holes lose birth time illumos/illumos-gate@8df0bcf0df `8df0bcf0df` https://www.illumos.org/issues/6513 If a ZFS object contains a hole at level one, and then a data block is created at level 0 underneath that l1 block, l0 holes will be created. However, these l0 holes do not have the birth time property set; as a result, incremental sends will not send those holes. Fix is to modify the dbuf_read code to fill in birth time data. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Paul Dagnelie <pcd@delphix.com>	2016-07-12 11:24:55 +00:00
Andriy Gapon	73f0e3e3e5	6902 speed up listing of snapshots if requesting name only and sorting by name illumos/illumos-gate@0d8fa8f8eb `0d8fa8f8eb` https://www.illumos.org/issues/6902 pjd has authored and commited a patch in Jan 21, 2012 that substanially speeds up zfs snapshot listing if requesting only the name property and sorting by name. In this special case, the snapshot properties do not need to be loaded. This code has been adopted by zfsonlinux on May 29, 2012. Commit message from pjd: Dramatically optimize listing snapshots when user requests only snapshot names and wants to sort them by name, ie. when executes: 1. zfs list -t snapshot -o name -s name Because only name is needed we don't have to read all snapshot properties. Below you can find how long does it take to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache: before: 1. time zfs list -t snapshot -o name -s name > /dev/null cold cache: 525s warm cache: 218s after: 1. time zfs list -t snapshot -o name -s name > /dev/null cold cache: 1.7s warm cache: 1.1s References: http://svnweb.freebsd.org/base?view=revision&revision=230438 https://github.com/freebsd/freebsd/commit/8e3e9863 https://github.com/zfsonlinux/zfs/commit/0cee2406 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pawel Dawidek <pjd@freebsd.org> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Author: Martin Matuska <martin@matuska.org>	2016-07-12 11:21:41 +00:00

... 3 4 5 6 7 ...

669 Commits