freebsd-dev

Author	SHA1	Message	Date
Mark Johnston	375c8b20dc	Remove the DTrace printt and typeref actions. These are FreeBSD-specific and were added in r178576 to provide the ability to pretty-print instances of compound types. However, the print action has long since been augmented to provide this functionality with a simpler interface. Discussed with: gnn Differential Revision: https://reviews.freebsd.org/D8478	2016-11-12 19:26:12 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Alexander Motin	8acf168aab	Fix ZIL records ordering when ZVOL opened both with and without FSYNC. Before this an earlier writes to a ZVOL opened without FSYNC could get to ZIL after later writes to the same ZVOL opened with FSYNC. Fix this by replicating functionality of ZPL (zv_sync_cnt equivalent to z_sync_cnt), marking all log records sync if anybody opened the ZVOL with FSYNC. MFC after: 2 weeks	2016-11-01 16:03:31 +00:00
Alexander Motin	2d1d8f4c8f	Pass to zvol_log_truncate() same sync values as to zvol_log_write(). Surplus marking of TX_TRUNCATE records as sync could result in putting them into ZIL before previous writes if ones were async. MFC after: 2 weeks	2016-11-01 12:47:19 +00:00
Alexander Motin	74a148f46f	Add sysctls for zfs_immediate_write_sz and zvol_immediate_write_sz.	2016-10-29 23:25:12 +00:00
Andriy Gapon	97371ba2a9	zfsbootcfg: a simple tool to set next boot (one time) options for zfsboot (gpt)zfsboot will read one-time boot directives from a special ZFS pool area. The area was previously described as "Boot Block Header", but currently it is know as Pad2, marked as reserved and is zeroed out on pool creation. The new code interprets data in this area, if any, using the same format as boot.config. The area is immediately wiped out. Failure to parse the directives results in a reboot right after the cleanup. Otherwise the boot sequence proceeds as usual. zfsbootcfg writes zfsboot arguments specified on its command line to the Pad2 area of a disk identified by vfs.zfs.boot.primary_pool and vfs.zfs.boot.primary_vdev kenv variables that are set by loader during boot. Please see the manual page for more. Thanks to all who reviewed, contributed and made suggestions! There are many potential improvements to the feature, please see the review for details. Reviewed by: wblock (docs) Discussed with: jhb, tsoome MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D7612	2016-10-29 14:09:32 +00:00
Alexander Motin	471cf6ce7d	Add vdev_reopening support to vdev_geom. It allows to avoid extra GEOM providers flapping without significant need. Since GEOM got resize support, we don't need to reopen provider to get new size. If provider was orphaned and no longer valid, ZFS should already know that, and in such case reopen should be done in full as expected. MFC after: 2 weeks	2016-10-28 17:05:14 +00:00
Alexander Motin	f106f43aa2	Matching GUIDs, handle possible race on vdev detach. In case of vdev detach, causing top level mirror vdev destruction, leaf vdev changes its GUID to one of the destroyed mirror, that creates race condition when GUID in vdev label may not match one in the pool config. This change replicates logic nuance of vdev_validate() by adding special exception, matching the vdev GUID against the top level vdev GUID. Since this exception is not completely reliable (may give false positives if we fail to erase label on detached vdev), use it only as last resort. Quick way to reproduce this scenario now is detach vdev from a pool with enabled autoextend. During vdev detach autoextend logic tries to reopen remaining vdev, that always fails now since in-memory configuration is already updated, while on-disk labels are not yet. MFC after: 2 weeks	2016-10-28 16:21:31 +00:00
Alexander Motin	4be4cba048	Improve few debugging log messages.	2016-10-28 15:30:10 +00:00
Andriy Gapon	539fc86f2e	3746 ZRLs are racy illumos/illumos-gate@260af64db7 `260af64db7` https://www.illumos.org/issues/3746 From the original change log: It was possible for a reference to be added even with the lock held, and for references added just after a lock release to be lost. This bug was also independently found and reported in wesunsolve.net issues 6985013 6995524. In zrl_add(), always use an atomic operation to update the refcount. The mutex in the ZRL only guarantees that wakeups occur for waiters on the lock. It offers no protection against concurrent updates of the refcount. The only refcount transition that is safe to perform without an atomic operation is from ZRL_LOCKED back to 0, since this can only be performed by the thread which has the ZRL locked. Authored by: Will Andrews <will@freebsd.org> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Reviewed by: Pavel Zakharov <pavel.zakha@gmail.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Justin T. Gibbs <gibbs@scsiguy.com> Approved by: Matt Ahrens <mahrens@delphix.com> Author: Youzhong Yang <yyang@mathworks.com> PR: 204037 MFC after: 1 week	2016-10-27 07:38:07 +00:00
Alexander Motin	f0cbbdecbc	Fix panic after ZVOL renamed to name invalid for DEVFS. MFC after: 2 weeks	2016-10-24 12:24:24 +00:00
Alexander Motin	9be66df1e1	Add vfs.zfs.zil_log_limit sysctl. It is at least partially broken now, but that is another question.	2016-10-16 18:49:15 +00:00
Alexander Motin	a059d8ccbc	Optimize ZIL itx memory allocation on FreeBSD. These allocations can reach up to 128KB, while FreeBSD kernel allocator can cache allocations only up to 64KB. To avoid expensive allocations for each large ZIL write use caching zio_buf_alloc() allocator instead. To make it possible de-inline few instances of zil_itx_destroy().	2016-10-16 10:43:12 +00:00
Alexander Motin	1899e205d1	MFV r307314: 6988 spa_sync() spends half its time in dmu_objset_do_userquota_updates Using a benchmark which creates 2 million files in one TXG, I observe that the thread running spa_sync() is on CPU almost the entire time we are syncing, and therefore can be a performance bottleneck. About 50% of the time in spa_sync() is in dmu_objset_do_userquota_updates(). The problem is that dmu_objset_do_userquota_updates() calls zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was modified (or created). In this benchmark, all the files are owned by the same user/group, so all 2 million calls to zap_increment_int() are modifying the same entry in the zap. The same issue exists for the DMU_GROUPUSED_OBJECT. We should keep an in-memory map from user to space delta while we are syncing, and when we finish, iterate over the in-memory map and modify the ZAP once per entry. This reduces the number of calls to zap_increment_int() from "number of objects modified" to "number of owners/groups of modified files". This reduced the time spent in spa_sync() in the file create benchmark by ~33%, from 11 seconds to 7 seconds. Closes #107 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Jinshan Xiong <jinshan.xiong@intel.com> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@5fc46359c5	2016-10-14 12:03:04 +00:00
Alexander Motin	b3a8b04807	MFV r307313: 5120 zfs should allow large block/gzip/raidz boot pool (loader project) Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Toomas Soome <tsoome@me.com> openzfs/openzfs@c8811bd3e2 FreeBSD still does not support booting from gzip-compressed datasets, so keep one chunk of this commit out.	2016-10-14 12:01:33 +00:00
Konstantin Belousov	5975e53d40	Fix a race in vm_page_busy_sleep(9). Suppose that we have an exclusively busy page, and a thread which can accept shared-busy page. In this case, typical code waiting for the page xbusy state to pass is again: VM_OBJECT_WLOCK(object); ... if (vm_page_xbusied(m)) { vm_page_lock(m); VM_OBJECT_WUNLOCK(object); <---1 vm_page_busy_sleep(p, "vmopax"); goto again; } Suppose that the xbusy state owner locked the object, unbusied the page and unlocked the object after we are at the line [1], but before we executed the load of the busy_lock word in vm_page_busy_sleep(). If it happens that there is still no waiters recorded for the busy state, the xbusy owner did not acquired the page lock, so it proceeded. More, suppose that some other thread happen to share-busy the page after xbusy state was relinquished but before the m->busy_lock is read in vm_page_busy_sleep(). Again, that thread only needs vm_object lock to proceed. Then, vm_page_busy_sleep() reads busy_lock value equal to the VPB_SHARERS_WORD(1). In this case, all tests in vm_page_busy_sleep(9) pass and we are going to sleep, despite the page being share-busied. Update check for m->busy_lock == VPB_UNBUSIED in vm_page_busy_sleep(9) to also accept shared-busy state if we only wait for the xbusy state to pass. Merge sequential if()s with the same 'then' clause in vm_page_busy_sleep(). Note that the current code does not share-busy pages from parallel threads, the only way to have more that one sbusy owner is right now is to recurse. Reported and tested by: pho (previous version) Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8196	2016-10-13 14:41:05 +00:00
Konstantin Belousov	f71d08566c	Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. Reported and tested by: jkim Discussed with: mjg Sponsored by: The FreeBSD Foundation	2016-10-07 11:38:28 +00:00
Andriy Gapon	6f98c83306	implement zfs_vptocnp() using z_parent property This should allow vn_fullpath() to work even when vfs name cache is disabled for zfs, which is the case when zfs properties like casesensitivity and normalization are set non-default values. The new code should be 100% reliable for directories and "mostly" reliable for files, that is, when hardlinks across directories are not used. Reported by: Frederic Chardon <chardon.frederic@gmail.com> Reviewed by: kib (vfs contract) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8146	2016-10-07 06:29:24 +00:00
Andriy Gapon	9ba3abc30e	zfs: fix a wrong assertion for extended attributes For the extended attributes the order between z_teardown_lock and the vnode lock is different. The bug was triggered only with DIAGNOSTIC turned on. This fix is developed in cooperation with avos. PR: 213112 Reported by: avos Tested by: avos MFC after: 1 week	2016-10-04 08:09:25 +00:00
Alexander Motin	863ef2ca62	Add #ifdef _KERNEL around send_holes_without_birth_time sysctl. Reported by: avg@	2016-09-29 17:48:53 +00:00
Alexander Motin	226a11f81e	MFV r306423: 7402 Create tunable to ignore hole_birth feature Until we can resolve the numerous hole_birth bugs that have cropped up recently, and come up with a way going forwards to protect users from corruption, we should disable the hole_birth feature. Using a tunable allows those who are confident that their data is correct to continue to take advantage of the feature. Closes #188 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>	2016-09-29 00:00:37 +00:00
Alexander Motin	bb97118138	MFV r306422: 7254 ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedobjs dsl_dataset_space is looking at the ds_bp's fill count while dmu_objset_write_ready() is concurrently modifying it. This fix adds an rrwlock to protect the ds_bp. Closes #180 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>	2016-09-28 23:54:47 +00:00
Mark Johnston	9e579a58c3	Move implementations of uread() and uwrite() to the illumos compat layer. MFC after: 1 week	2016-09-24 21:40:14 +00:00
Andriy Gapon	d26312a4e4	fix vnode lock assertion for extended attributes directory Background. In ZFS a file with extended attributes has a special directory associated with it where each extended attribute is a file. The attribute's name is a file name and its value is a file content. When the ownership of a file with extended attributes is changed, ZFS also changes ownership of the special directory. This is where the bug was hit. The bug was introduced in r209158. Nota bene. ZFS vnode locks are typically acquired before z_teardown_lock (i.e., before ZFS_ENTER). But this is not the case for the vnodes that represent the extended attribute directory and files. Those are always locked after ZFS_ENTER. This is confusing and fragile. PR: 212702 Reported by: Christian Fuss to FreeNAS Tested by: mav MFC after: 1 week	2016-09-24 08:13:15 +00:00
Allan Jude	c2b475d0ee	MFV r268120: 4936 lz4 could theoretically overflow a pointer with a certain input illumos/illumos-gate@58d0718061 Reviewed by: delphij MFC after: 2 weeks Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D7850	2016-09-11 17:48:06 +00:00
Alexander Motin	4605bf63c4	MFV r305562: 7259 DS_FIELD_LARGE_BLOCKS is unused The DS_FIELD_LARGE_BLOCKS macro has been unused since the integration of this patch: commit ca0cc3918a1789fa839194af2a9245f801a06b1a Author: Matthew Ahrens <mahrens@delphix.com> Date: Fri Jul 24 09:53:55 2015 -0700 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> This patch simply removes this macro from dsl_dataset.h. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-07 20:09:24 +00:00
Alexander Motin	de1fdddeda	MFV r305560: 7278 tuning zfs_arc_max does not impact arc_c_min When changing zfs_arc_max (e.g. as zdb does), it may be set to less than the default arc_c_min. arc_c_min should decrease to not be more than arc_c_max, but it doesn't; therefore tuning of arc_c_max is ineffective. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@608764bead	2016-09-07 20:05:10 +00:00
Andriy Gapon	1a82707cd7	fix zfs pool creation accidentally broken by r305331 The upstream change introduced a new load state, SPA_LOAD_CREATE, and vdev_geom code needs to be aware of it. Tested by: cy MFC after: 1 week X-MFC with: r305331	2016-09-06 06:09:12 +00:00
Alexander Motin	9b9258a12a	Missed FreeBSD-specific piece of r305338.	2016-09-03 11:17:33 +00:00
Alexander Motin	d7e781bda3	MFC r305337: 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object Using a benchmark which has 32 threads creating 2 million files in the same directory, on a machine with 16 CPU cores, I observed poor performance. I noticed that dmu_tx_hold_zap() was using about 30% of all CPU, and doing dnode_hold() 7 times on the same object (the ZAP object that is being held). dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the dnode_t that we already have in hand, rather than repeatedly calling dnode_hold(). To do this, we need to pass the dnode_t down through all the intermediate calls that dmu_tx_hold_zap() makes, making these routines take the dnode_t* rather than an objset_t* and a uint64_t object number. In particular, the following routines will need to have analogous *_by_dnode() variants created: dmu_buf_hold_noread() dmu_buf_hold() zap_lookup() zap_lookup_norm() zap_count_write() zap_lockdir() zap_count_write() This can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around the object allocation issue. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds. Sponsored by: Intel Corp. Closes #109 Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@d3e523d489	2016-09-03 11:00:29 +00:00
Alexander Motin	4ad4b70e77	MFV r305336: 7247 zfs receive of deduplicated stream fails This resolves two 'zfs recv' issues. First, when receiving into an existing filesystem, a snapshot created during the receive process is not added to the guid->dataset map for the stream, resulting in failed lookups for deduped streams when a WRITE_BYREF record refers to a snapshot received earlier in the stream. Second, the newly created snapshot was also not set properly, referencing the snapshot before the new receiving dataset rather than the existing filesystem. Closes #159 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Author: Chris Williamson <chris.williamson@delphix.com> openzfs/openzfs@b09697c8c1	2016-09-03 10:59:05 +00:00
Alexander Motin	070da3f779	MFV r305335: 7003 zap_lockdir() should tag hold zap_lockdir() / zap_unlockdir() should take a "void *tag" argument which tags the hold on the zap. This will help diagnose programming errors which misuse the hold on the ZAP. Sponsored by: Intel Corp. Closes #108 Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@0780b3eab5	2016-09-03 10:58:14 +00:00
Alexander Motin	d3ec2cdb4a	MFV r304157: 7230 add assertions to dmu_send_impl() to verify that stream includes BEGIN and END records illumos/illumos-gate@12b90ee2d3 https://github.com/illumos/illumos-gate/commit/12b90ee2d3b10689fc45f4930d2392f5f e1d9cfa https://www.illumos.org/issues/7230 A test failure occurred where a send stream had only a BEGIN record. This should not be possible if the send returns without error. Prevented this from happening in the future by adding an assertion to dmu_send_impl() to verify that if the function returns 0 (success) both a BEGIN and END record are present. Did this by adding flags to dmu_sendarg_t (indicating whether BEGIN o r END records sent), having dump_record() set flags appropriately, adding VERIFY statement to dmu_send_impl(). Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matt Krantz <matt.krantz@delphix.com>	2016-09-03 10:10:58 +00:00
Alexander Motin	7aafc9d4c8	MFV r304156: 7235 remove unused func dsl_dataset_set_blkptr illumos/illumos-gate@bd56f80007 https://github.com/illumos/illumos-gate/commit/bd56f80007857b960e0981ed0797ad8ec 844a96b https://www.illumos.org/issues/7235 The function dsl_dataset_set_blkptr() is unused. We should remove it. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-03 10:09:23 +00:00
Alexander Motin	c9fa25c110	MFV r304155: 7090 zfs should improve allocation order and throttle allocations illumos/illumos-gate@0f7643c737 https://github.com/illumos/illumos-gate/commit/0f7643c7376dd69a08acbfc9d1d7d548b 10c846a https://www.illumos.org/issues/7090 When write I/Os are issued, they are issued in block order but the ZIO pipelin e will drive them asynchronously through the allocation stage which can result i n blocks being allocated out-of-order. It would be nice to preserve as much of the logical order as possible. In addition, the allocations are equally scattered across all top-level VDEVs but not all top-level VDEVs are created equally. The pipeline should be able t o detect devices that are more capable of handling allocations and should allocate more blocks to those devices. This allows for dynamic allocation distribution when devices are imbalanced as fuller devices will tend to be slower than empty devices. The change includes a new pool-wide allocation queue which would throttle and order allocations in the ZIO pipeline. The queue would be ordered by issued time and offset and would provide an initial amount of allocation of work to each top-level vdev. The allocation logic utilizes a reservation system to reserve allocations that will be performed by the allocator. Once an allocatio n is successfully completed it's scheduled on a given top-level vdev. Each top- level vdev maintains a maximum number of allocations that it can handle (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth) are distributed across the top-level vdevs metaslab groups and round robin across all eligible metaslab groups to distribute the work. As top-levels complete their work, they receive additional work from the pool-wide allocation queue until the allocation queue is emptied. Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: George Wilson <george.wilson@delphix.com>	2016-09-03 10:04:37 +00:00
Alexander Motin	0b51a59fc7	MFV r303078: 7086 ztest attempts dva_get_dsize_sync on an embedded blockpointer illumos/illumos-gate@926549256b https://github.com/illumos/illumos-gate/commit/926549256b71acd595f69b236779ff6b7 8fa08ef https://www.illumos.org/issues/7086 In dbuf_dirty(), we need to grab the dn_struct_rwlock before looking at the db_blkptr, to prevent it from being changed by syncing context. Otherwise we may see that ztest got a segfault from this stack: libzpool.so.1`dva_get_dsize_sync+0x98(872f000, `b32b240`, fed7811b, 0, b4cda20, 0) libzpool.so.1`bp_get_dsize+0x60(872f000, `b32b240`, 0, 97cb780, 9d4c1a8, 0) libzpool.so.1`dbuf_dirty+0x9b3(ce0a100, 97cb780, 9, fecd2530) libzpool.so.1`dmu_buf_will_dirty+0xc3(ce0a100, 97cb780, ea293d6c, 1) libzpool.so.1`zap_lockdir+0x1a0(8aaa3c0, 1, 0, 97cb780, 1, 1) libzpool.so.1`zap_remove_norm+0x30(8aaa3c0, 1, 0, 8728b10, 0, 97cb780) libzpool.so.1`zap_remove+0x29(8aaa3c0, 1, 0, 8728b10, 97cb780, a) ztest_replay_remove+0x225(ea294588, 8728ae8, 0, 38010000, 0, 0) ztest_remove+0x9f(ea294588, ea293f50, 4, 3) ztest_object_init+0x78(ea294588, ea293f50, 4e0, 1) ztest_dmu_object_alloc_free+0x71(ea294588, 13) ztest_dmu_objset_create_destroy+0x224(80cef08, 13, 0, 805d36c, 9017ad44, 0) ztest_execute+0x89(a, 807c720, 13, 0) ztest_thread+0xea(13, 0, 0, 0) libc.so.1`_thrp_setup+0x88(f0983240) libc.so.1`_lwp_start(f0983240, 0, 0, 0, 0, 0) Looking into it a bit, we see that this is an embedded blockpointer, so BP_GET_NDVAS should have returned 0: b32b240::blkptr EMBEDDED [L0 ZAP_OTHER] et=0 LZ4 size=200L/4aP birth=80L Instead, it looks like another thread is modifying this blockpointer: b32b240::ugrep \| ::whatis f47a0e0c is in [ stack tid=0x19f ] ebd6ec40 is in [ stack tid=0x226 ] ea293bd0 is in [ stack tid=0x244 ] ea293be4 is in [ stack tid=0x244 ] Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-03 08:43:43 +00:00
Alexander Motin	84c3781ac9	MFV r303077: 7072 zfs fails to expand if lun added when os is in shutdown state illumos/illumos-gate@c39a2aae1e `c39a2aae1e` https://www.illumos.org/issues/7072 upstream: 38733 zfs fails to expand if lun added when os is in shutdown state DLPX-36910 spares and caches should not display expandable space DLPX-39262 vdev_disk_open spam zfs_dbgmsg buffer Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: George Wilson <george.wilson@delphix.com>	2016-09-03 08:42:12 +00:00
Alexander Motin	efa0867fb0	MFV r302991: 6950 ARC should cache compressed data illumos/illumos-gate@dcbf3bd6a1 `dcbf3bd6a1` https://www.illumos.org/issues/6950 When reading compressed data from disk, the ARC should keep the compressed block cached and only decompress it when consumers access the block. The uncompressed data should be short-lived allowing the ARC to cache a much larger amount of data. The DMU would also maintain a smaller cache of uncompressed blocks to minimize the impact of decompressing frequently accessed blocks. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Don Brady <don.brady@intel.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: George Wilson <george.wilson@delphix.com>	2016-09-03 08:30:51 +00:00
Alexander Motin	c543b519be	MFV r304158: 7136 ESC_VDEV_REMOVE_AUX ought to always include vdev information 7115 6922 generates ESC_ZFS_VDEV_REMOVE_AUX a bit too often illumos/illumos-gate@b72b6bb10a https://github.com/illumos/illumos-gate/commit/b72b6bb10ad55121a1b352c6f68ebdc8e 20c9086 https://www.illumos.org/issues/7136 6922 added ESC_ZFS_VDEV_REMOVE_AUX and ESC_ZFS_VDEV_REMOVE_DEV sysevents whenever an aux device gets removed from a pool. However, those sysevents will be created without the vdev_guid and vdev_path fields. It would be better to always populate those fields. https://www.illumos.org/issues/7115 The addition of spa_event_notify in vdev removal code (see #6922) causes event s to be generated even if the spare failed to be removed with EBUSY. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alan Somers <asomers@gmail.com>	2016-09-01 18:37:11 +00:00
Alexander Motin	25584d12e7	MFV r302993: 7104 increase indirect block size illumos/illumos-gate@4b5c8e93ca https://github.com/illumos/illumos-gate/commit/4b5c8e93cab28d3c65ba9d407fd8f46e3 be1db1c https://www.illumos.org/issues/7104 The current default indirect block size is 16KB. We can improve performance by increasing it to 128KB. This is especially helpful for any workload that needs to read most of the metadata, e.g. scrub/resilver, file deletion, filesystem deletion, and zfs send. We also need to fix a few space estimation errors to make the tests pass. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-01 18:33:39 +00:00
Alexander Motin	dd7f7cb7ac	MFV r302992: 7071 lzc_snapshot does not fill in errlist on ENOENT illumos/illumos-gate@25f7d993ad https://github.com/illumos/illumos-gate/commit/25f7d993adbfb3452ac4625b379167074 6d35ae3 https://www.illumos.org/issues/7071 upstream DLPX-40482 lzc_snapshot does not fill in errlist on ENOENT Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-01 18:25:49 +00:00
Alexander Motin	3d1e0e0830	MFV r302662: 6447 handful of nvpair cleanups illumos/illumos-gate@759e89be35 https://github.com/illumos/illumos-gate/commit/759e89be359f2af635e4122d147df56bc e948773 https://www.illumos.org/issues/6447 I got a patch from someone who uses nvpair code outside of illumos. It fixes a couple of gcc warnings/bugs for him. 1. silence uninitialized use warnings 2. add parentheses around assignment used as truth value 3. fix printf format specifier (ll is for integers only) 4. strstr, strspn, strcspn, and strcmp are declared in string.h, not strings.h. 5. avoid scanning integer into boolean variable Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Steve Dougherty <sdougherty@barracuda.com>	2016-09-01 15:17:39 +00:00
Alexander Motin	3421688c2d	MFV r302661: 7082 bptree_iterate() passes wrong args to zfs_dbgmsg() illumos/illumos-gate@10e67aa0db https://github.com/illumos/illumos-gate/commit/10e67aa0db0823d5464aafdd681f3c966 155c68e https://www.illumos.org/issues/7082 upstream DLPX-40542 bptree_iterate() passes wrong args to zfs_dbgmsg() Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-01 15:10:40 +00:00
Alexander Motin	41b9077ef6	MFV r302660: 6314 buffer overflow in dsl_dataset_name illumos/illumos-gate@9adfa60d48 https://github.com/illumos/illumos-gate/commit/9adfa60d484ce2435f5af77cc99dcd4e6 92b6660 https://www.illumos.org/issues/6314 Callers of dsl_dataset_name pass a buffer of size ZFS_MAXNAMELEN, but dsl_dataset_name copies the datasets' name PLUS the snapshot name to it, resulting in a max of 2 * ZFS_MAXNAMELEN + '@'. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-01 15:08:27 +00:00
Alexander Motin	e12a269749	MFV r302659: 6931 lib/libzfs: cleanup gcc warnings illumos/illumos-gate@88f61dee20 `88f61dee20` https://www.illumos.org/issues/6931 need cleanup: CERRWARN += -_gcc=-Wno-switch CERRWARN += -_gcc=-Wno-parentheses CERRWARN += -_gcc=-Wno-unused-function Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Igor Kozhukhov <ikozhukhov@gmail.com>	2016-09-01 14:57:06 +00:00
Alexander Motin	a95a9fe945	MFV r302651: 7054 dmu_tx_hold_t should use refcount_t to track space illumos/illumos-gate@0c779ad424 https://github.com/illumos/illumos-gate/commit/0c779ad424a92a84d1e07d47cab7f8009 189202b https://www.illumos.org/issues/7054 upstream: ee0003de7d3e598499be7ac3fe6b61efcc47cb7f DLPX-40399 dmu_tx_hold_t should use refcount_t to track space Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-01 14:38:25 +00:00
Alexander Motin	96bf48b8cb	MFV r302648: 7019 zfsdev_ioctl skips secpolicy when FKIOCTL is set Note that the bulk of the upstream change is not applicable to FreeBSD and the affected files are not even in the vendor area. illumos/illumos-gate@45b1747515 `45b1747515` https://www.illumos.org/issues/7019 Currently zfsdev_ioctl, when confronted by a request with the FKIOCTL flag set, skips all processing of secpolicy functions. This means that ZFS is not doing any kind of verification of the credentials or access rights of the caller and assuming that (as it is an in-kernel client) all such checks have already been done. This turns out to be quite a dangerous assumption, especially with respect to sdev. In general I don't think it's particularly reasonable to offload this enforcement of access rights onto other kernel subsystems when ZFS has some particular local semantics in this area (delegated datasets etc) and does not provide any kind of API to allow other subsystems to avoid code duplication when doing it. ZFS should apply its normal access policy to requests from within the kernel, and callers should take care to give it the correct credentials and call it from the correct context in order to get the results they need. You can observe the currently unfortunate consequences of this bug in any non- global zone that has access to /dev/zvol or any subset of it via sdev profiles. In particular, a zone used to contain a KVM or similar which has a single zvol passed through to it using a <device match= block in its zone XML. Even though sdev makes something of an attempt to control for whether the caller should have access to nodes in /dev/zvol, it doesn't do this correctly, or really at all in the lookup call path. So, if we have a zone that's been given access to any part of /dev/zvol, it can simply look up the full path to any other zvol on the entire system, and the node will appear and be able to be used. Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Alex Wilson <alex.wilson@joyent.com>	2016-09-01 14:24:54 +00:00
Alexander Motin	13876b47d7	MFV r302647: 6922 Emit ESC_ZFS_VDEV_REMOVE_AUX after removing an aux device illumos/illumos-gate@63364b0ee2 https://github.com/illumos/illumos-gate/commit/63364b0ee2604783e7a55f84258888677 68eafa4 https://www.illumos.org/issues/6922 ZFS does not do a config_sync after removing an aux (spare, log, or cache) device. AFAICT this isn't being done because it is slow and was deemed unnecessary. However, it should be such a rare operation that speed doesn't matter, and not doing it results in two problems: 1) It is theoretically possible to remove an aux device from one pool and attach it to another, then lose power. When power is restored, both pools woul d think that they own the aux device. 2) Removal of the aux device doesn't send any useful sysevents to userland. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Alan Somers <asomers@gmail.com>	2016-09-01 14:17:30 +00:00
Alexander Motin	1c7d88abed	MFV r302646: 6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch illumos/illumos-gate@ea4a67f462 https://github.com/illumos/illumos-gate/commit/ea4a67f462de0a39a9adea8197bcdef84 9de5371 https://www.illumos.org/issues/6980 doing zfs send -i snap1 snap2 >testfile results in internal error: Invalid argument Abort (core dumped) Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-01 14:06:30 +00:00
Alexander Motin	4536fd9bed	MFV r302643: 6902 speed up listing of snapshots if requesting name only and sorting by name This was our change from the beginning, so just reduce the upstream diff.	2016-09-01 13:29:53 +00:00
Alexander Motin	5fd28943d6	MFV r302642: 6876 Stack corruption after importing a pool with a too-long name illumos/illumos-gate@c971037baa `c971037baa` https://www.illumos.org/issues/6876 Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking for trouble. We should check every dataset on import, using a 1024 byte buffer and checking each time to see if the dataset's new name is longer than 256 bytes. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Paul Dagnelie <pcd@delphix.com>	2016-09-01 13:04:36 +00:00
Alexander Motin	9007a8679a	Fix kernel panic when inheriting properties without default. There are two writable hidden properties "iscsioptions" and "stmf_sbd_lu", that have no default string value. Attempt to unset them or replicate caused kernel panic. This simple bandaid seems fixes the problem nicely. MFC after: 2 weeks	2016-08-31 11:55:31 +00:00
Mark Johnston	59ceeddecf	MFV r301526: 7035 string-related subroutines should validate input earlier Reviewed by: Alex Wilson <alex.wilson@joyent.com> Reviewed by: Bryan Cantrill <bryan@joyent.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Patrick Mooney <pmooney@pfmooney.com> illumos/illumos-gate@771e39c3b1 MFC after: 2 weeks	2016-08-16 02:25:19 +00:00
Mark Johnston	f66200ee22	MFV r301525: 7033 ustack helper should fault on bad return values Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Bryan Cantrill <bryan@joyent.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Alex Wilson <alex.wilson@joyent.com> illumos/illumos-gate@a2f72b65eb MFC after: 2 weeks	2016-08-16 02:20:02 +00:00
Mark Johnston	4aea8f31b1	MFV r301524: 7034 negative record sizes should be rejected Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Bryan Cantrill <bryan@joyent.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Alex Wilson <alex.wilson@joyent.com> illumos/illumos-gate@0b8049bfb0 MFC after: 2 weeks	2016-08-16 02:18:34 +00:00
Mark Johnston	b7125fa9cd	MFV r296989: 6734 dtrace_canstore_statvar() fails for some valid static variables Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Bryan Cantrill <bryan@joyent.com> illumos/illumos-gate@d65f2bb4e5 MFC after: 2 weeks	2016-08-16 02:16:54 +00:00
Andriy Gapon	96762fe314	fix a zfs cross-device rename crash introduced in r303763 The problem was that 'zfsvfs' variable was not initialized if the error was detected, but in the exit path the variable was dereferenced before the error code was checked. Reported by: np MFC after: 3 days X-MFC with: r303763	2016-08-09 06:11:24 +00:00
Andriy Gapon	4fb51b52ef	fix .zfs-related cases in zfs_lookup that were broken by r303763 The problem is that the special .zfs nodes are not represented by znodes but by special gfs-based nodes. r303763 changed interface of zfs_dirlook such that started operating on znodes rather than on vnodes and, thus, the function became unsuitable for handling .zfs entities. The solution is to move the handling of the special cases to zfs_lookup, the only consumer of zfs_dirlook. I already had this solution applied in D7421, but for different reasons. Reported by: asomers MFC after: 3 days X-MFC with: r303763	2016-08-06 11:02:07 +00:00
Andriy Gapon	f79bc17233	zfs: honour and make use of vfs vnode locking protocol ZFS POSIX Layer is originally written for Solaris VFS which is very different from FreeBSD VFS. Most importantly many things that FreeBSD VFS manages on behalf of all filesystems are implemented in ZPL in a different way. Thus, ZPL contains code that is redundant on FreeBSD or duplicates VFS functionality or, in the worst cases, badly interacts / interferes with VFS. The most prominent problem is a deadlock caused by the lock order reversal of vnode locks that may happen with concurrent zfs_rename() and lookup(). The deadlock is a result of zfs_rename() not observing the vnode locking contract expected by VFS. This commit removes all ZPL internal locking that protects parent-child relationships of filesystem nodes. These relationships are protected by vnode locks and the code is changed to take advantage of that fact and to properly interact with VFS. Removal of the internal locking allowed all ZPL dmu_tx_assign calls to use TXG_WAIT mode. Another victim, disputable perhaps, is ZFS support for filesystems with mixed case sensitivity. That support is not provided by the OS anyway, so in ZFS it was a buch of dead code. To do: - replace ZFS_ENTER mechanism with VFS managed / visible mechanism - replace zfs_zget with zfs_vget[f] as much as possible - get rid of not really useful now zfs_freebsd_* adapters - more cleanups of unneeded / unused code - fix / replace .zfs support PR: 209158 Reported by: many Tested by: many (thank you all!) MFC after: 5 days Sponsored by: HybridCluster / ClusterHQ Differential Revision: https://reviews.freebsd.org/D6533	2016-08-05 06:23:06 +00:00
Ruslan Bukin	98f50c44e3	Update RISC-V port to Privileged Architecture Version 1.9. Sponsored by: DARPA, AFRL Sponsored by: HEIF5	2016-08-02 14:50:14 +00:00
Enji Cooper	4fcae4df7e	Conditionalize code which defines sysctls per _KERNEL #ifdef guard This resolves several issues when compiling libzpool (userspace library), i.e. -Wimplicit-function-declaration and -Wmissing-declarations issues. MFC after: 2 weeks Reported by: clang Tested with: clang 3.8.1, gcc 4.2.1, gcc 5.3.0 Sponsored by: EMC / Isilon Storage Division	2016-07-31 06:34:49 +00:00
Mark Johnston	57185c52de	Restore an ifdef that should not have been removed in r303535. X-MFC-With: r303535	2016-07-30 07:05:32 +00:00
Mark Johnston	6d1ffb50fc	Include fasttrap handling for DATAMODEL_ILP32 when compiling for amd64. MFC after: 1 month	2016-07-30 03:11:53 +00:00
Andriy Gapon	70e3da3892	MFV r302645: 6878 Add scrub completion info to "zpool history" illumos/illumos-gate@1825bc56e5 `1825bc56e5` https://www.illumos.org/issues/6878 Summary of changes: * Replace generic "scan done" message with "scan aborted, restarting", "scan cancelled", or "scan done" * Log number of errors using spa_get_errlog_size * Refactor scan restarting check into static function Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Nav Ravindranath <nav@delphix.com> MFC after: 2 weeks	2016-07-14 11:53:39 +00:00
Andriy Gapon	39a6b17491	MFV r302650: 6940 Cannot unlink directories when over quota illumos/illumos-gate@99189164df `99189164df` https://www.illumos.org/issues/6940 Similar to #6334, but this time with empty directories: $ zfs create tank/quota $ zfs set quota=10M tank/quota $ zfs snapshot tank/quota@snap1 $ zfs set mountpoint=/mnt/tank/quota tank/quota $ mkdir /mnt/tank/quota/dir # create an empty directory $ mkfile 11M /mnt/tank/quota/11M /mnt/tank/quota/11M: initialized 9830400 of 11534336 bytes: Disc quota exceeded $ rmdir /mnt/tank/quota/dir # now unlink the empty directory rmdir: directory "/mnt/tank/quota/dir": Disc quota exceeded From user perspective, I would expect that ZFS is always able to remove files and directories even when the quota is exceeded. Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Simon Klinkert <simon.klinkert@gmail.com> MFC after: 2 weeks	2016-07-14 11:51:01 +00:00
Andriy Gapon	fe0cc75230	MFV r302644: 6513 partially filled holes lose birth time illumos/illumos-gate@8df0bcf0df `8df0bcf0df` https://www.illumos.org/issues/6513 If a ZFS object contains a hole at level one, and then a data block is created at level 0 underneath that l1 block, l0 holes will be created. However, these l0 holes do not have the birth time property set; as a result, incremental sends will not send those holes. Fix is to modify the dbuf_read code to fill in birth time data. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Paul Dagnelie <pcd@delphix.com> MFC after: 3 weeks	2016-07-14 11:48:42 +00:00
Andriy Gapon	e7ed92bbbc	MFV r302641: 6844 dnode_next_offset can detect fictional holes illumos/illumos-gate@11ceac77ea `11ceac77ea` https://www.illumos.org/issues/6844 dnode_next_offset is used in a variety of places to iterate over the holes or allocated blocks in a dnode. It operates under the premise that it can iterate over the blockpointers of a dnode in open context while holding only the dn_struct_rwlock as reader. Unfortunately, this premise does not hold. When we create the zio for a dbuf, we pass in the actual block pointer in the indirect block above that dbuf. When we later zero the bp in zio_write_compress, we are directly modifying the bp. The state of the bp is now inconsistent from the perspective of dnode_next_offset: the bp will appear to be a hole until zio_dva_allocate finally finishes filling it in. In the meantime, dnode_next_offset can detect a hole in the dnode when none exists. I was able to experimentally demonstrate this behavior with the following setup: 1. Create a file with 1 million dbufs. 2. Create a thread that randomly dirties L2 blocks by writing to the first L0 block under them. 3. Observe dnode_next_offset, waiting for it to skip over a hole in the middle of a file. 4. Do dnode_next_offset in a loop until we skip over such a non-existent hole. The fix is to ensure that it is valid to iterate over the indirect blocks in a dnode while holding the dn_struct_rwlock by passing the zio a copy of the BP and updating the actual BP in dbuf_write_ready while holding the lock. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Alex Reece <alex@delphix.com> MFC after: 3 weeks	2016-07-14 11:42:53 +00:00
Andriy Gapon	875e6e5b04	MFV r302640: 6874 rollback and receive need to reset ZPL state to what's on disk illumos/illumos-gate@1fdcbd00c9 `1fdcbd00c9` https://www.illumos.org/issues/6874 When we do a clone swap (caused by "zfs rollback" or "zfs receive"), the ZPL doesn't completely reload the state from the DMU; some values remain cached in the zfsvfs_t. steps to reproduce: ``` #!/bin/bash -x zfs destroy -R test/fs zfs destroy -R test/recvd zfs create test/fs zfs snapshot test/fs@a zfs set userquota@$USER=1m test/fs zfs snapshot test/fs@b zfs send test/fs@a \| zfs recv test/recvd zfs send -i @a test/fs@b \| zfs recv test/recvd zfs userspace test/recvd 1. should show 1m quota dd if=/dev/urandom of=/test/recvd/file bs=1k count=1024 sync dd if=/dev/urandom of=/test/recvd/file2 bs=1k count=1024 2. should fail with ENOSPC sync zfs unmount test/recvd zfs mount test/recvd zfs userspace test/recvd 3. if bug above, now shows 1m quota dd if=/dev/urandom of=/test/recvd/file3 bs=1k count=1024 4. if bug above, now fails with ENOSPC ``` Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 3 weeks	2016-07-14 11:39:36 +00:00
Andriy Gapon	ac3623e090	re-apply r299908: zfsctl_snapdir_lookup: clear VV_ROOT of snapshot's root The change has been undone in r301275 on the assumption that it was no longer required. But that was incorrect, because in this case (and only in this case) the snapshot root vnode is looked up before z_parent is fixed up. MFC after: 5 days	2016-07-13 15:16:51 +00:00
Mark Johnston	ca1ef36cf4	Avoid truncating the return value of DTrace predicates. Predicates are DIF objects whose return value is compared with zero to determine whether the corresponding probe body is to be executed. The return value itself is the contents of a 64-bit DIF register, but it was being truncated to an int before the comparison. This meant that a predicate such as /0x100000000/ would evaluate to false. Reported by: rwatson MFC after: 3 days	2016-07-09 22:41:21 +00:00
Steven Hartland	ae8420ed72	Fix ZFS ARC min / max tunable Due to ARC initial configuration not being done and kmem information not being available we need to blindly set zfs_arc_max and zfs_arc_min when configured via the tunable. This fixes vfs.zfs.arc_(min\|max) configuration via loader.conf broken by r302265. Approved by: re(gjb) MFC after: 1 week	2016-07-06 23:49:19 +00:00
Alexander Motin	e36599916f	Revert r299454 and r299448. Those changes were found confusing FreeBSD libc ACL code, that doesn't differentiate ACL for directories and files, and report ACLs for all directories created after those patches as non-trivial. On the other side these changes were considered wrong from POSIX and NFSv4 points of view. Until further investigation done upstream, revert those changes locally in preparation for FreeBSD 11.0 release. Approved by: re (hrs)	2016-06-30 14:55:49 +00:00
Steven Hartland	f535b2d7d8	Allow ZFS ARC min / max to be tuned at runtime Prior to this change ZFS ARC min / max could only be changed using boot time tunables, this allows the values to be tuned at runtime using the sysctls: * vfs.zfs.arc_max * vfs.zfs.arc_min When adjusting ZFS ARC minimum the memory used will only reduce to the new minimum given memory pressure. Reviewed by: allanjude Approved by: re (gjb) MFC after: 2 weeks Relnotes: yes Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D5907	2016-06-29 07:55:45 +00:00
Andriy Gapon	0f7dcde977	fix deadlock-prone code in getzfsvfs() getzfsvfs() called vfs_busy() in the waiting mode while having a hold on a pool (via a call to dmu_objset_hold). In other words, dp_config_rwlock was held in the shared mode while a thread could be sleeping in vfs_busy(). The pool's txg sync thread needs to take dp_config_rwlock in the exclusive mode for some actions, e.g., for executing sync tasks. If the sync thread gets blocked, then any thread waiting for its sync task to get executed is also blocked. Which, in turn, could mean that vfs_busy() will keep waiting indefinitely. The solution is to use vfs_ref() in the locked section and to call vfs_busy() only after dropping other locks. Note that a reference on a struct mount object does not prevent an associated zfsvfs_t object from being destroyed. So, we have to be careful to operate only on the struct mount object until we successfully vfs_busy it. Approved by: re (gjb) MFC after: 2 weeks	2016-06-23 07:01:54 +00:00
Alan Somers	54edbcfb69	Fix uninitialized variable from r300881 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Initialize needs_update in vdev_geom_set_physpath PR: 210409 Reported by: kp Reviewed by: kp Approved by: re (hrs) MFC after: 4 weeks X-MFC-With: 300881 Sponsored by: Spectra Logic Corp	2016-06-21 15:27:16 +00:00
Konstantin Belousov	cacbedfc46	Fix gcc build. Reported andt tested by: swills Sponsored by: The FreeBSD Foundation Approved by: re (gjb)	2016-06-18 20:20:00 +00:00
Konstantin Belousov	4a0d95f810	Use vnlru_free(9) to implement dnlc_reduce_cache(). This apparently puts ARC back under the limits after the vnode pressure rework in r291244, in particular due to the kmem exhaustion. Based on patch by: mckusick Reviewed by: avg, mckusick Tested by: allanjude, madpilot Sponsored by: The FreeBSD Foundation Approved by: re (gjb)	2016-06-17 17:34:28 +00:00
Andriy Gapon	55be2f79e5	l2arc: reset b_tmp_cdata to NULL in the case of unset b_daddr The change is in arc_buf_l2_cdata_free(). Without this we can trip the assertion in arc_hdr_realloc() if INVARIANTS option is enabled. Approved by: re (kib) MFC after: 1 week	2016-06-13 18:39:13 +00:00
Andriy Gapon	aa14503e8e	zfs_vptocnp: check for an invalid znode ... which can arise after the receive or rollback and failed zfs_rezget(). Approved by: re (kib) MFC after: 1 week	2016-06-13 10:53:34 +00:00
Andriy Gapon	a68789426a	zfs: set VROOT / VV_ROOT consistently and in a single place This is a followup to r300131. A filesystem's root vnode can be reached not only through VSF_ROOT, but by other means as well. For example, via a dot-dot lookup. Also, a root vnode can get reclaimed and then re-created. For these reasons it was insufficient to clear VV_ROOT flag from a root vnode of a snapshot mounted under .zfs in zfsctl_snapdir_lookup(). So, now we set the flag in zfs_znode_sa_init() only if a vnode represent a root of a filesystem or a standalone snapshot. That is, the flag is not set for snapshots mounted under .zfs. MFC after: 2 weeks	2016-06-03 14:37:18 +00:00
Andriy Gapon	d1cf30f4f1	zfs_root: fix a potential root vnode reference leak It could happen in an unlikely case that we fail to lock the root vnode with requested flags (which appear to never include LK_NOWAIT). MFC after: 1 week	2016-06-03 14:22:12 +00:00
Alan Somers	b1a6b8dcd2	Improve the English in a comment sys/cddl/contrib/opensolaris/uts/common/sys/acl.h: Improve the english in a comment. No functional changes Submitted by: gibbs MFC after: 4 weeks Sponsored by: Spectra Logic Corp	2016-06-01 22:21:42 +00:00
Allan Jude	0144ad3e78	Connect the SHA-512t256 and Skein hashing algorithms to ZFS Support for the new hashing algorithms in ZFS was introduced in r289422 However it was disconnected because FreeBSD lacked implementations of SHA-512 (truncated to 256 bits), and Skein. These implementations were introduced in r300921 and r300966 respectively This commit connects them to ZFS and enabled these new checksum algorithms This new algorithms are not supported by the boot blocks, so do not use them on your root dataset if you boot from ZFS. Relnotes: yes Sponsored by: ScaleEngine Inc.	2016-05-31 04:12:14 +00:00
Bryan Drewery	fdd9048a63	Avoid more literal-suffix errors with C++11	2016-05-29 00:40:29 +00:00
Alan Somers	7a0c41d5d7	zfsd(8), the ZFS fault management daemon Add zfsd, which deals with hard drive faults in ZFS pools. It manages hotspares and replements in drive slots that publish physical paths. cddl/usr.sbin/zfsd Add zfsd(8) and its unit tests cddl/usr.sbin/Makefile Add zfsd to the build lib/libdevdctl A C++ library that helps devd clients process events lib/Makefile share/mk/bsd.libnames.mk share/mk/src.libnames.mk Add libdevdctl to the build. It's a private library, unusable by out-of-tree software. etc/defaults/rc.conf By default, set zfsd_enable to NO etc/mtree/BSD.include.dist Add a directory for libdevdctl's include files etc/mtree/BSD.tests.dist Add a directory for zfsd's unit tests etc/mtree/BSD.var.dist Add /var/db/zfsd/cases, where zfsd stores case files while it's shut down. etc/rc.d/Makefile etc/rc.d/zfsd Add zfsd's rc script sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c Fix the resource.fs.zfs.statechange message. It had a number of problems: It was only being emitted on a transition to the HEALTHY state. That made it impossible for zfsd to take actions based on drives getting sicker. It compared the new state to vdev_prevstate, which is the state that the vdev had the last time it was opened. That doesn't make sense, because a vdev can change state multiple times without being reopened. vdev_set_state contains logic that will change the device's new state based on various conditions. However, the statechange event was being posted _before_ that logic took effect. Now it's being posted after. Submitted by: gibbs, asomers, mav, allanjude Reviewed by: mav, delphij Relnotes: yes Sponsored by: Spectra Logic Corp, iX Systems Differential Revision: https://reviews.freebsd.org/D6564	2016-05-28 17:43:40 +00:00
Enji Cooper	dfdbdb0c82	Fix up r300870 The sys/types.h fix I proposed was only tested with zfs(4), not with libzpool, which is where the build failure actually existed Remove vm/vm_pageout.h from arc.c and zfs_vnops.c because they're both unneeded MFC after: 1 week X-MFC with: r300865, r300870 In collaboration with: kib Submitted by: alc Sponsored by: EMC / Isilon Storage Division	2016-05-27 22:56:00 +00:00
Alan Somers	151746b244	Avoid issuing spa config updates for physical path when not necessary ZFS's configuration needs to be updated whenever the physical path for a device changes, but not when a new device is introduced. This is because new devices necessarily cause config updates, but only if they are actually accepted into the pool. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When setting the vdev's physical path, only request a config update if the physical path has changed. Don't request it when opening a device for the first time, because the config sync will happen anyway upstack. sys/geom/geom_dev.c Split g_dev_set_physpath and g_dev_set_media out of g_dev_attrchanged Submitted by: will, asomers MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D6428	2016-05-27 22:32:44 +00:00
Enji Cooper	765daefd68	Unbreak the zfs(4) build vm/vm_pageout.h grew a dependency on the bool typedef in r300865 arc.c didn't include sys/types.h, which included the definition for the typedef Other items (ofed, drm2) might need to be chased for this commit. X-MFC with: r300865 MFC after: 1 week Pointyhat to: alc Sponsored by: EMC / Isilon Storage Division	2016-05-27 20:33:38 +00:00
Ruslan Bukin	fed1ca4b71	Add initial DTrace support for RISC-V. Sponsored by: DARPA, AFRL Sponsored by: HEIF5	2016-05-24 16:41:37 +00:00
Andriy Gapon	fabe7e4ecc	add vop_print methods to vnode operatios of various zfsctl node types This should help with diagnostics of zfsctl problems. MFC after: 2 weeks	2016-05-18 13:21:29 +00:00
Andriy Gapon	e34c8d727b	move zfsctl_freebsd_root_lookup right next to zfsctl_root_lookup That makes it easier to reason about the code. MFC after: 5 weeks	2016-05-18 08:29:39 +00:00
Andriy Gapon	a4bbed22d2	zfsctl_common_fid: remove redundant assignment "Reinterpret cast" to zfid_short_t and assignment of zf_len do the job already. MFC after: 1 week	2016-05-18 08:26:09 +00:00
Andriy Gapon	e6d4eefe2a	zfsctl: tighten an assertion and remove an unused definition There are only two entries under .zfs and 'shares' has an ID of a special persistent object in its filesystem. MFC after: 1 week	2016-05-18 08:23:39 +00:00
Andriy Gapon	439e9b6804	zfs_root: no need to set the root flag here That was both redundant as zfs_znode_sa_init() already does the job and insufficient as the root vnode can be reached via other means. MFC after: 1 weeks	2016-05-18 08:19:41 +00:00
Andriy Gapon	74a3df2b1f	zfsctl_freebsd_root_lookup: gfs_vop_lookup may return a doomed vnode gfs code is (almsot) completely agnostic of FreeBSD VFS locking, so it does not handle doomed but not yet dead vnodes and may return them. Check for those vnodes here and retry a lookup. Note that ZFS and gfs have additional protections that ensure that a parent vnode of the current vnode is never doomed. The fixed problem is an occasional failure to lookup a 'snapshot' or 'shares' directories under .zfs. Note that for the above reason all uses of zfsctl_root_lookup() are better be replaced with VOP_LOOKUP. MFC after: 5 weeks	2016-05-18 08:02:49 +00:00
Alan Somers	5f7b3969e9	Speed up vdev_geom_open_by_guids Speedup is hard to measure because the only time vdev_geom_open_by_guids gets called on many drives at the same time is during boot. But with vdev_geom_open hacked to always call vdev_geom_open_by_guids, operations like "zpool create" speed up by 65%. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c * Read all of a vdev's labels in parallel instead of sequentially. * In vdev_geom_read_config, don't read the entire label, including the uberblock. That's a waste of RAM. Just read the vdev config nvlist. Reduces the IO and RAM involved with tasting from 1MB to 448KB. Reviewed by: avg MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D6153	2016-05-17 15:17:23 +00:00
Andriy Gapon	857a214d03	zfs_ioc_rename: fix a reversed condition FreeBSD zfs_ioc_rename() has an option, not present upstream, that allows to rename snapshots without unmounting them first. I am not sure what is a rationale for that option, but its actual behavior was the opposite of the intended behavior. That is, by default the snapshots were not unmounted. The option was introduced as part of a large update from upstream in r248498. One of the consequences was a havoc under .zfs/snapshot after the rename. The snapshots got new names but were mounted on top of directories with old names, so readdir would list the new names, but lookup would still find the old mounts. PR: 209093 Reported by: Frédéric VANNIÈRE <f.vanniere@planet-work.com> MFC after: 5 days	2016-05-17 07:56:05 +00:00
Andriy Gapon	afe674f089	do not destroy 'snapdir' when it becomes inactive That was just wrong. In fact, we can safely keep this static entry when it's inactive. Now the destructive action is moved to the reclaim method and the function is renamed from zfsctl_snapdir_inactive(0 to zfsctl_snapdir_reclaim(). Also, we can use gfs_vop_reclaim() instead of gfs_dir_inactive() + kmem_free(). Lastly, we can just assert that the node does not any children when it is reclaimed, even on the force unmount. That's because zfs_umount() does an extra vflush() pass which should destroy all snapshot-mountpoint vnodes that are the snapdir's children. MFC after: 5 weeks	2016-05-16 15:48:56 +00:00
Andriy Gapon	9c3e205296	try to recycle "snap" vnodes as soon as possible Those vnodes should not linger. "Stale" nodes may get out of synchronization with actual snapshots. For example if we destroy a snapshot and create a new one with the same name. Or when we rename a snapshot. While there fix the argument type for zfsctl_snapshot_reclaim(). Also, its original argument can be passed to gfs_vop_reclaim() directly. Bug 209093 could be related although I have not specifically verified that. Referencing just in case. PR: 209093 MFC after: 5 weeks	2016-05-16 15:37:41 +00:00
Andriy Gapon	0ab1aa90fa	fix locking in zfsctl_root_lookup Dropping the root vnode's lock after VFS_ROOT() didn't really help the fact that we acquired the lock while holding its child's, .zfs, lock while performing the operaiton. So, directly use zfs_zget() to get the root vnode. While there simplify the code in zfsctl_freebsd_root_lookup. We know that .zfs is always exclusively locked. We know that there is already a reference on *vpp, so no need for an extra one. Account for the fact that .. lookup may ask for a different lock type, not necessarily LK_EXCLUSIVE. And handle a possible failure to acquire the lock given the lock flags. MFC after: 5 weeks	2016-05-16 15:28:39 +00:00

1 2 3 4 5 ...

1371 Commits