freebsd-nq

Author	SHA1	Message	Date
Debabrata Banerjee	4149bf498a	Correct signed operation Could return the wrong pages value AKAMAI: zfs: CR 3695072 Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Issue #6035	2017-05-02 15:50:26 -04:00
Debabrata Banerjee	44813aefad	Don't run the reaper if we didn't shrink the cache Calling it when nothing is evictable will cause extra kswapd cpu. Also if we didn't shrink it's unlikely to have memory to reap because we likely just called it microseconds ago. The exception is if we are in direct reclaim. You can see how hard this is being hit in kswapd with a light test workload: 34.95% [zfs] [k] arc_kmem_reap_now 5.40% [spl] [k] spl_kmem_cache_reap_now 3.79% [kernel] [k] _raw_spin_lock 2.86% [spl] [k] __spl_kmem_cache_generic_shrinker.isra.7 2.70% [kernel] [k] shrink_slab.part.37 1.93% [kernel] [k] isolate_lru_pages.isra.43 1.55% [kernel] [k] __wake_up_bit 1.20% [kernel] [k] super_cache_count 1.20% [kernel] [k] __radix_tree_lookup With ZFS just mounted but only ext4/pagecache memory pressure arc_kmem_reap_now still consumes excessive CPU: 12.69% [kernel] [k] isolate_lru_pages.isra.43 10.76% [kernel] [k] free_pcppages_bulk 7.98% [kernel] [k] drop_buffers 7.31% [kernel] [k] shrink_page_list 6.44% [zfs] [k] arc_kmem_reap_now 4.19% [kernel] [k] free_hot_cold_page 4.00% [kernel] [k] __slab_free 3.95% [kernel] [k] __isolate_lru_page 3.09% [kernel] [k] __radix_tree_lookup Same pagecache only workload as above with this patch series: 11.58% [kernel] [k] isolate_lru_pages.isra.43 11.20% [kernel] [k] drop_buffers 9.67% [kernel] [k] free_pcppages_bulk 8.44% [kernel] [k] shrink_page_list 4.86% [kernel] [k] __isolate_lru_page 4.43% [kernel] [k] free_hot_cold_page 4.00% [kernel] [k] __slab_free 3.44% [kernel] [k] __radix_tree_lookup (arc_kmem_reap_now has 0 samples in perf) AKAMAI: zfs: CR 3695042 Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Issue #6035	2017-05-02 15:50:13 -04:00
Debabrata Banerjee	1a31dcf53c	Only wakeup waiters if we've actually done work AKAMAI: zfs: CR 3695072 Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Issue #6035	2017-05-02 15:50:02 -04:00
Debabrata Banerjee	2e91c2fb1a	Do not stop kernel shrinker on lock contention Lock contention, by itself, shouldn't indicate a stop condition to the kernel's slab shrinker. Doing so can cause stalls when the kernel is trying to free large parts of the cache such as is done by drop_caches Also, perhaps arc_reclaim_lock should be a spinlock, and this code eliminated. AKAMAI: zfs: CR 3593801 Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Issue #6035	2017-05-02 15:49:48 -04:00
Debabrata Banerjee	b855550c33	Stop double reclaiming or not reclaiming at all Move arcstat_need_free increment from all direct calls to when arc_reclaim_lock is busy and we exit wihout doing anything. Data will be reclaimed in reclaim thread. The previous location meant that we both reclaim the memory in this thread, and also schedule the same amount of memory for reclaim in arc_reclaim, effectively doubling the requested reclaim. AKAMAI: zfs: CR 3695072 Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Issue #6035	2017-05-02 15:49:36 -04:00
Debabrata Banerjee	30fffb9021	Make arc_need_free updates atomic Ensures proper accounting of bytes we requested to free AKAMAI: zfs: CR 3695072 Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Issue #6035	2017-05-02 15:48:49 -04:00
Debabrata Banerjee	9b50146dc4	Don't report ghost buffers as evictable mem Ghost meta/data buffers are not actually allocated AKAMAI: zfs: CR 3695072 Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Issue #6035	2017-05-02 15:47:23 -04:00
jxiong	2b91b5119c	minor improvement to abd_free_pages() It doesn't need to have a loop to free page in a single scatterlist entry because it should be single or compound page. The pages can be freed in one invocation to __free_pages() for both cases. Reviewed-by: Gvozden Neskovic <neskovic@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com> Closes #6057	2017-05-02 10:06:18 -07:00
jxiong	24fa20340d	Guarantee PAGESIZE alignment for large zio buffers In current implementation, only zio buffers in 16KB and bigger are guaranteed PAGESIZE alignment. This breaks Lustre since it assumes that 'arc_buf_t::b_data' must be page aligned when zio buffers are greater than or equal to PAGESIZE. This patch will make the zio buffers to be PAGESIZE aligned when the sizes are not less than PAGESIZE. This change may cause a little bit memory waste but that should be fine because after ABD is introduced, zio buffers are used to hold data temporarily and live in memory for a short while. Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com> Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com> Closes #6084	2017-05-02 10:04:30 -07:00
Brian Behlendorf	7dae2c81e7	Linux 4.12 compat: super_setup_bdi_name() All filesystems were converted to dynamically allocated BDIs. The destruction of backing_dev_info structures is handled as part of super block destruction. Refactor the code to abstract away the details of creating and destroying a BDI. Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6089	2017-05-02 09:46:18 -07:00
Yuri Pankov	153b228554	OpenZFS 7786 - zfs`vdev_online() needs better notification about state changes Authored by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Albert Lee <trisk@forkgnu.org> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: bunder2015 <omfgbunder@gmail.com> OpenZFS-issue: https://www.illumos.org/issues/7786 OpenZFS-commit: http://github.com/openzfs/openzfs/commit/db8498f Closes #6074	2017-05-01 16:24:37 -04:00
Brian Behlendorf	e99932f7de	Limit zfs_dirty_data_max_max to 4G Reinstate default 4G zfs_dirty_data_max_max limit. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6072 Closes #6081	2017-05-01 13:01:39 -07:00
Chunwei Chen	692e55b8fe	Reinstate zvol_taskq to fix aio on zvol Commit `37f9dac` removed the zvol_taskq for processing zvol requests. This was removed as part of switching to make_request_fn and was motivated by a concern at the time over dispatch latency. However, this also made all bio request synchronous, and caused serious performance issues as the bio submitter would wait for every bio it submitted, effectively making the IO depth 1. This patch reinstate zvol_taskq, and to make sure overlapped I/Os are ordered properly, we take range lock in zvol_request, and pass it along with bio to the I/O functions zvol_{write,discard,read}. In order to facilitate benchmarks a zvol_request_sync module option was added to switch between sync and async request handling. For the moment, the default behavior is synchronous but this is likely to change pending additional testing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #5824	2017-04-26 13:54:40 -07:00
Dan Kimmel	a7004725d0	OpenZFS 7252 - compressed zfs send / receive OpenZFS 7252 - compressed zfs send / receive OpenZFS 7628 - create long versions of ZFS send / receive options Authored by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: David Quigley <dpquigl@davequigley.com> Reviewed by: Thomas Caputi <tcaputi@datto.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed by: David Quigley <dpquigl@davequigley.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Ported-by: bunder2015 <omfgbunder@gmail.com> Ported-by: Don Brady <don.brady@intel.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: - Most of 7252 was already picked up during ABD work. This commit represents the gap from the final commit to openzfs. - Fixed split_large_blocks check in do_dump() - An alternate version of the write_compressible() function was implemented for Linux which does not depend on fio. The behavior of fio differs significantly based on the exact version. - mkholes was replaced with truncate for Linux. OpenZFS-issue: https://www.illumos.org/issues/7252 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5602294 Closes #6067	2017-04-26 12:31:43 -07:00
wli5	7a25f0891e	Change U16 to U32 due to atomic_inc_32_nv After run a long time with QAT compression, the variable "inst_num" is overflow by "atomic_inc_32_nv", which causes its neighbor variable overwritten. Change its definition from U16 to U32. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Weigang Li <weigang.li@intel.com> Closes #6051	2017-04-25 17:41:58 -07:00
Matthew Ahrens	a004338372	OpenZFS 8025 - dbuf_read() creates unnecessary zio_root() for bonus buf Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> dbuf_read() creates a zio_root() to track and wait for all the zio's that may happen as part of this call. However, if the blkptr_t for this buffer is NULL or a hole, we will not create any more zio's, so this zio_root() is unnecessary. This is always the case when calling dbuf_read() on a bonus buffer, because it has no blkptr (it's part of the containing dnode). For workloads that read a lot of bonus buffers (e.g. file creation and removal), creating and destroying these unnecessary zio's can decrease performance by around 3%. The fix is to only create/destroy the zio_root() in dbuf_read() if the blkptr is not NULL and not a hole. Porting Notes: - The error handling for when dbuf_read_impl() fails which was originally added in commit `5f6d0b6f5` has been preserved. OpenZFS-issue: https://www.illumos.org/issues/8025 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8ec5c7c Closes #6048	2017-04-24 10:44:19 -07:00
dbavatar	6e03ec4fa2	Fix lseek result when dnode is dirty Fixup commit `66aca24`. We should have equivalent return values as generic_file_llseek() and advance to end of file. Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Tested-by: bunder2015 <omfgbunder@gmail.com> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Closes #6050 Closes #6053	2017-04-24 09:38:31 -07:00
Olaf Faaland	0091d66f4e	Correct lock ASSERTs in vdev_label_read/write The existing assertions in vdev_label_read() and vdev_label_write(), testing which config locks are held, are incorrect. The assertions test for locks which exceed what is required for safety. Both vdev_label_{read,write}() are changed to assert SCL_STATE is held as RW_READER or RW_WRITER. This is safe because: Changes to the vdev tree occur under SCL_ALL as RW_WRITER, via spa_vdev_enter() and spa_vdev_exit(). Changes to vdev state occur under SCL_STATE_ALL as RW_WRITER, via spa_vdev_state_enter() and spa_vdev_state_exit(). Therefore, the new assertions guarantee that the vdev cannot change out from under a zio, and I/O to a specified leaf vdev's label is safe. Furthermore, this is consistent with the SPA locking discussion in spa_misc.c, "For any zio operation that takes an explicit vdev_t argument ... zio_read_phys(), or zio_write_phys() ... SCL_STATE as reader suffices." Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #5983	2017-04-21 14:26:43 -07:00
DHE	06226b5936	Increase zfs_vdev_async_write_min_active to 2 Resilver operations frequently cause only a small amount of dirty data to be written to disk at a time, resulting in the IO scheduler to only issue 1 write at a time to the resilvering disk. When it is rotational media the drive will often travel past the next sector to be written before receiving a write command from ZFS, significantly delaying the write of the next sector. Raise zfs_vdev_async_write_min_active so that drives are kept fed during resilvering. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: DHE <git@dehacked.net> Issue #4825 Closes #5926	2017-04-14 14:03:44 -07:00
Matthew Ahrens	f6d4ce8e34	OpenZFS 8061 - sa_find_idx_tab can be declared more type-safely Authored by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> sa_find_idx_tab() is declared as taking and returning "void *" parameters. These can be declared to be the specific types. OpenZFS-issue: https://www.illumos.org/issues/8061 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4e64aff Closes #6017	2017-04-14 11:11:28 -07:00
Andriy Gapon	87a275d97a	OpenZFS 6101 - attempt to lzc_create() a filesystem under a volume results in a panic Authored by: Andriy Gapon <avg@FreeBSD.org> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> When querying ZPL properties verify that the objset is of type DMU_OST_ZFS. OpenZFS-issue: https://www.illumos.org/issues/6101 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ce2243a Closes #6015	2017-04-14 11:11:28 -07:00
Andriy Gapon	31b6bc74b9	OpenZFS 8026 - retire zfs_throttle_delay and zfs_throttle_resolution Authored by: Andriy Gapon <avg@FreeBSD.org> Approved by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/8026 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9b33e07 Closes #6014	2017-04-14 11:11:20 -07:00
Debabrata Banerjee	66aca24730	SEEK_HOLE should not block on txg_wait_synced() Force flushing of txg's can be painfully slow when competing for disk IO, since this is a process meant to execute asynchronously. Optimize this path via allowing data/hole seeking if the file is clean, but if dirty fall back to old logic. This is a compromise to disabling the feature entirely. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Closes #4306 Closes #5962	2017-04-13 10:51:20 -07:00
Brian Behlendorf	e550644f0c	OpenZFS 5120 - zfs should allow large block/gzip/raidz boot pool (loader project) Authored by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Don Brady <don.brady@intel.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: - grub-2.02-beta2-422-gcad5cc0 includes support for large blocks. - Commit `8aab121` allowed GZIP[1-9]. - Grub allows pools with multiple top-level vdevs. OpenZFS-issue: https://www.illumos.org/issues/5120 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c8811bd Closes #6007	2017-04-13 09:40:00 -07:00
Brian Behlendorf	00481e7dad	OpenZFS 7503 - zfs-test should tail ::zfs_dbgmsg on test failure Authored by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: - Enable internal log for DEBUG builds and in zfs-tests.sh. - callbacks/zfs_dbgmsg.ksh - Dump interal log via kstat. - callbacks/zfs_dmesg.ksh - Dump dmesg log. - default.cfg - 'Test Suite Specific Commands' dropped. OpenZFS-issue: https://www.illumos.org/issues/7503 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/55a1300 Closes #6002	2017-04-12 13:36:48 -07:00
Giuseppe Di Natale	17b43f96f9	Skip rate limiting events in zfs_ereport_post In zfs_ereport_post, if an event is a rate limiting event, immediately return before any processing is done. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #5998	2017-04-11 18:37:45 -07:00
LOLi	047187c1bd	Fix size inflation in spa_get_worst_case_asize() When we try assign a new transaction to a TXG we must know beforehand if there is sufficient free space on disk. This is to decide, in dmu_tx_assign(), if we should reject the TX with ENOSPC. We rely on spa_get_worst_case_asize() to inflate the size of our logical writes by a factor of spa_asize_inflation which is calculated as: (VDEV_RAIDZ_MAXPARITY + 1) * SPA_DVAS_PER_BP * 2 == 24 The problem with the current implementation is that we don't take into account what happens with very small writes on VDEVs with large physical block sizes. Consider the case of writes to a dataset with recordsize=512, copies=3 on a VDEV with ashift=13 (usually SSD with 8K block size): every logical IO will end up allocating 3 * 8K = 24K on disk, so 512 bytes multiplied by 48, which is double the size we account for. If we allow this kind of writes to be assigned a TX it is possible, when the pool is almost full, to trigger an allocation failure (ENOSPC) in the ZIO pipeline, which will in turn result in the whole pool being suspended. The bug is fixed by using, in spa_get_worst_case_asize(), the MAX() value chosen between the logical io size from zfs_write() and the maximum physical block size used among our VDEVs. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #5941	2017-04-10 15:28:21 -07:00
Matthew Ahrens	8542ef852a	OpenZFS 8005 - poor performance of 1MB writes on certain RAID-Z configurations Authored by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Don Brady <don.brady@intel.com> Ported-by: Matt Ahrens <mahrens@delphix.com> RAID-Z requires that space be allocated in multiples of P+1 sectors, because this is the minimum size block that can have the required amount of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2 sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector is a unit of 2^ashift bytes, typically 512B or 4KB. To satisfy this constraint, the allocation size is rounded up to the proper multiple, resulting in up to 3 "pad sectors" at the end of some blocks. The contents of these pad sectors are not used, so we do not need to read or write these sectors. However, some storage hardware performs much worse (around 1/2 as fast) on mostly-contiguous writes when there are small gaps of non-overwritten data between the writes. Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that include pad sectors. If writing a pad sector will fill the gap between two (required) writes, we will issue the optional zio, thus doubling performance. The gap-filling performance improvement was introduced in July 2009. Writing the optional zio is done by the io aggregation code in vdev_queue.c. The problem is that it is also subject to the limit on the size of aggregate writes, zfs_vdev_aggregation_limit, which is by default 128KB. For a given block, if the amount of data plus padding written to a leaf device exceeds zfs_vdev_aggregation_limit, the optional zio will not be written, resulting in a ~2x performance degradation. The problem occurs only for certain values of ashift, compressed block size, and RAID-Z configuration (number of parity and data disks). It cannot occur with the default recordsize=128KB. If compression is enabled, all configurations with recordsize=1MB or larger will be impacted to some degree. The problem notably occurs with recordsize=1MB, compression=off, with 10 disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore this problem has been known as "the 1MB 10-wide RAIDZ2 (or 3) problem". The problem also occurs with the following configurations: With recordsize=512KB or 256KB, compression=off, the problem occurs only in rarely-used configurations: * 4-wide RAIDZ1 with recordsize=512KB and ashift=12 (4KB sectors) * 4-wide RAIDZ2 (either recordsize, either ashift) * 5-wide RAIDZ2 with recordsize=512KB (either ashift) * 6-wide RAIDZ2 with recordsize=512KB (either ashift) With recordsize=1MB, compression=off, ashift=9 (512B sectors) * RAIDZ1 with 4 or 8 disks * RAIDZ2 with 4, 8, or 10 disks * RAIDZ3 with 6, 8, 9, or 10 disks With recordsize=1MB, compression=off, ashift=12 (4KB sectors) * RAIDZ1 with 7 or 8 disks * RAIDZ2 with 4, 5, or 10 disks * RAIDZ3 with 6, 9, or 10 disks With recordsize=2MB and larger (which can only be selected by changing kernel tunables), many configurations are affected, including with higher numbers of disks (up to 18 disks with recordsize=2MB). Increase zfs_vdev_aggregation_limit to allow the optional zio to be aggregated, thus eliminating the problem. Setting it to 256KB fixes all commonly-used configurations. The solution is to aggregate optional zio's regardless of the aggregation size limit. Analysis sponsored by Intel Corp. OpenZFS-issue: https://www.illumos.org/issues/8005 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/321 Closes #5931	2017-04-10 15:21:45 -07:00
Giuseppe Di Natale	42db43e982	OpenZFS 2932 - support crash dumps to raidz, etc. pools Authored by: Bill Pijewski <wdp@joyent.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@nexenta.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/2932 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/810e43b Closes #5984 Closes #5216	2017-04-10 10:24:17 -07:00
George Wilson	3b7f360c96	OpenZFS 8023 - Panic destroying a metaslab deferred range tree Authored by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> We don't want to dirty any data when we're in the final txgs of the pool export logic. This change introduces checks to make sure that no data is dirtied after a certain point. It also addresses the culprit of this specific bug – the space map cannot be upgraded when we're in final stages of pool export. If we encounter a space map that wants to be upgraded in this phase, then we simply ignore the request as it will get retried the next time we set the fragmentation metric on that metaslab. OpenZFS-issue: https://www.illumos.org/issues/8023 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2ef00f5 Closes #5991	2017-04-09 16:12:35 -07:00
Andriy Gapon	c0c8cc7b43	OpenZFS 8027 - tighten up dsl_pool_dirty_delta Authored by: Andriy Gapon <avg@FreeBSD.org> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/8027 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/642668d Closes #5988	2017-04-09 16:04:35 -07:00
Don Brady	177c91d06e	Fix regression in zfs_ereport_start() On 32-bit platforms spa_state is 32 bits without cast, and thus caused a NULL pointer dereference when treated as 64bit in var arg. Accidentally introduced by `bcdb96a`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com> Signed-off-by: Don Brady <don.brady@intel.com> Closes #5966 Closes #5965	2017-04-05 14:24:26 -07:00
Steven Hartland	2e215fecbe	OpenZFS 7885 - zpool list can report 16.0e for expandsz Authored by: Steven Hartland <steven.hartland@multiplay.co.uk> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> When a member of a RAIDZ has been replaced with a device smaller than the original, then the top level vdev can report its expand size as 16.0E. The reduced child asize causes the RAIDZ to have a vdev_asize lower than its vdev_max_asize which then results in an underflow during the calculation of the parents expand size. Fix this by updating the vdev_asize if it shrinks, which is already protected by a check against vdev_min_asize so should always be safe. Also for RAIDZ vdevs, ensure that the sum of their child vdev_min_asize is always greater than the parents vdev_min_size. Reviewed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7885 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/bb0dbaa Closes #5963	2017-04-05 09:33:20 -07:00
N Clark	bcdb96a3e1	Additional Information for Zedlets * Add ZPOOL pool state to zfs_post_common to allow differentiation between export and destroy by zedlets. * Add pool name as standard export This ensures pool name is exported to zedlets. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com> Closes #5942	2017-04-03 14:23:02 -07:00
Gvozden Neskovic	84c07adadb	Remove dependency on linear ABD Wherever possible it's best to avoid depending on a linear ABD. Update the code accordingly in the following areas. - vdev_raidz - zio, zio_checksum - zfs_fm - change abd_alloc_for_io() to use abd_alloc() Reviewed-by: David Quigley <david.quigley@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Closes #5668	2017-03-29 12:24:51 -07:00
LOLi	ff61d1a495	Check ashift validity in 'zpool add' `df83110` added the ability to specify a custom "ashift" value from the command line in 'zpool add' and 'zpool attach'. This commit adds additional checks to the provided ashift to prevent invalid values from being used, which could result in disastrous consequences for the whole pool. Additionally provide ASHIFT_MAX and ASHIFT_MIN definitions in spa.h. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #5878	2017-03-28 17:21:11 -07:00
Chunwei Chen	12aec7dcd9	Fix wrong offset args in vdev_cache_write The offset arguments is wrong when changing to abd_copy_off in `a6255b7` Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #5932 Closes #5936	2017-03-28 11:06:22 -07:00
Brian Behlendorf	8d70398740	Retry zfs_znode_alloc() in zfs_mknode() For historical reasons zfs_mknode() was written such that it could never fail. This poses a problem for Linux since zfs_znode_alloc() could potentually failure due to low memory. Handle this gracefully by retrying zfs_znode_alloc() until it succeeds, direct reclaim will eventually be able to allocate memory. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5535 Closes #5908	2017-03-23 18:26:50 -07:00
George Wilson	55922e73b4	OpenZFS 3821 - Race in rollback, zil close, and zil flush Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Approved by: Richard Lowe <richlowe@richlowe.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/3821 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/43297f9 Closes #5905	2017-03-23 18:20:58 -07:00
wli5	6a9d635998	GZIP compression offloading with QAT accelerator This patch implement the hardware accelerator method in GZIP compression in ZFS. When the ZFS pool is enabled GZIP compression, the compression API will be automatically transferred to the hardware accelerator to free up CPU resource and speed up the compression time. * To enable Intel QAT hardware acceleration in ZOL you need to have QAT hardware and the driver installed: * QAT hardware DH8950: http://ark.intel.com/products/79483/Intel-QuickAssist-Adapter-8950 * QAT driver: https://01.org/intel-quickassist-technology * Start QAT driver in your system: service qat_service start * Enable QAT in ZFS, e.g.: ./configure --with-qat=<qat-driver-path>/QAT1.6 make * Set GZIP compression in ZFS dataset: zfs set compression = gzip <dataset> * Get QAT hardware statistics by: cat /proc/spl/kstat/zfs/qat * To disable QAT in ZFS: insmod zfs.ko zfs_qat_disable=1 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com> Signed-off-by: Weigang Li <weigang.li@intel.com> Closes #5846	2017-03-22 17:58:47 -07:00
Matthew Ahrens	64fc776208	OpenZFS 7968 - multi-threaded spa_sync() Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Matthew Ahrens <mahrens@delphix.com> spa_sync() iterates over all the dirty dnodes and processes each of them by calling dnode_sync(). If there are many dirty dnodes (e.g. because we created or removed a lot of files), the single thread of spa_sync() calling dnode_sync() can become a bottleneck. Additionally, if many dnodes are dirtied concurrently in open context (e.g. due to concurrent file creation), the os_lock will experience lock contention via dnode_setdirty(). The solution is to track dirty dnodes on a multilist_t, and for spa_sync() to use separate threads to process each of the sublists in the multilist. OpenZFS-issue: https://www.illumos.org/issues/7968 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4a2a54c Closes #5752	2017-03-20 18:36:00 -07:00
Olaf Faaland	a3478c0747	Linux 4.11 compat: iops.getattr and friends In torvalds/linux@a528d35, there are changes to the getattr family of functions, struct kstat, and the interface of inode_operations .getattr. The inode_operations .getattr and simple_getattr() interface changed to: int (getattr) (const struct path , struct dentry , struct kstat , u32 request_mask, unsigned int query_flags) The request_mask argument indicates which field(s) the caller intends to use. Fields the caller has not specified via request_mask may be set in the returned struct anyway, but their values may be approximate. The query_flags argument indicates whether the filesystem must update the attributes from the backing store. Currently both fields are ignored. It is possible that getattr-related functions within zfs could be optimized based on the request_mask. struct kstat includes new fields: u32 result_mask; /* What fields the user got / u64 attributes; / See STATX_ATTR_* flags / struct timespec btime; / File creation time */ Fields attribute and btime are cleared; the result_mask reflects this. These appear to be optional based on simple_getattr() and vfs_getattr() within the kernel, which take the same approach. Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #5875	2017-03-20 17:51:16 -07:00
Matthew Ahrens	9522bd2429	OpenZFS 7801 - add more by-dnode routines (lint) Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/7801 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f25efb3 Closes #5894	2017-03-20 12:33:17 -07:00
Matthew Ahrens	8614ddf9b4	OpenZFS 6874 - rollback and receive need to reset ZPL state to what's on disk Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> When we do a clone swap (caused by "zfs rollback" or "zfs receive"), the ZPL doesn't completely reload the state from the DMU; some values remain cached in the zfsvfs_t. OpenZFS-issue: https://www.illumos.org/issues/6874 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1fdcbd0 Closes #5888	2017-03-13 17:42:42 -07:00
Brian Behlendorf	1c2555ef92	Restructure mount option handling Restructure the handling of mount options to be consistent with upstream OpenZFS. This required making the following changes. - The zfs_mntopts_t was renamed vfs_t and adjusted to provide the minimal needed functionality. This includes a pointer back to the associated zfsvfs_t. Plus it made it possible to revert zfs_register_callbacks() and zfsvfs_create() back to their original prototypes. - A zfs_mnt_t structure was added for the sole purpose of providing a structure to pass the osname and raw mount pointer to zfs_domount() without having to copy them. - Mount option parsing was moved down from the zpl_* wrapper functions in to the zfs_* functions. This allowed for the code to be simplied and it's where similar functionality appears on other platforms. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2017-03-10 09:51:41 -08:00
Brian Behlendorf	f298b24ddf	Rename zfs_* functions Several functions were renamed when ZFS was originally ported to Linux. Revert the code to the original names to minimize the delta with upstream OpenZFS. zfs_sb_teardown -> zfsvfs_teardown zfs_sb_create -> zfsvfs_create zfs_sb_setup -> zfsvfs_setup zfs_sb_free -> zfsvfs_free get_zfs_sb -> getzfsvfs zfs_sb_hold -> zfsvfs_hold zfs_sb_rele -> zfsvfs_rele zfs_sb_prune_aliases -> zfs_prune_aliases (Linux-only) zfs_sb_prune -> zfs_prune (Linux only) Align the zfs_vnops.h and zfs_vfsops.h with upstream as much as possible. Several prototypes were removed and those that remain were reordered. Move the EXPORT_SYMBOL lines to the end of the source files for consistency with the other source files. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2017-03-10 09:51:35 -08:00
Brian Behlendorf	0037b49e83	Rename zfs_sb_t -> zfsvfs_t The use of zfs_sb_t instead of zfsvfs_t results in unnecessary conflicts with the upstream source. Change all instances of zfs_sb_t to zfsvfs_t including updating the variables names. Whenever possible the code was updated to be consistent with hope it appears in the upstream OpenZFS source. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2017-03-10 09:51:33 -08:00
Brian Behlendorf	ef1bdf363c	Fix ZVOL BLKFLSBUF ioctl The BLKFLSBUF ioctl is expected to do two things: - flush dirty pages to stable storage, and - invalidate clean pages Unfortunately, the existing implementation of BLKFLSBUF in zvol_ioctl() only flushes pages which are part of the current TXG to disk. There may be additional dirty pages in the page cache which haven't yet been submitted to the DMU and therefore aren't part of any TXG. Furthermore because zvol_ioctl() returns 0 the generic blkdev_flushbuf() does not invalidate the page cache. Resolve the issue by moving bdev_flush() in to zvol_ioctl() and explicitly waiting for a full TXG sync. Then invalidate the page cache. The associated ARC buffers need not be evicted since they cannot be bypassed using O_DIRECT. Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5871 Closes #5879	2017-03-09 17:43:36 -08:00
Giuseppe Di Natale	589bb918ef	Suppress cppcheck nullPointer error in zfs_write Newer versions of cppcheck find the potential NULL pointer bug in zfs_write(). The function is difficult to refactor without extensive work, so suppress the potential NULL pointer error which cannot occur for now. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #5882	2017-03-09 17:40:21 -08:00
Chunwei Chen	9b77d1c958	Fix nfs snapdir automount The current implementation for allowing nfs to access snapdir is very buggy. It uses a special fh for snapdirs, such that the next time nfsd does fh_to_dentry, it actually returns the root inode inside the snapshot. So nfsd never knows it cross a mountpoint. The problem is that nfsd will not hold a reference on the vfsmount of the snapshot. This cause auto unmounter to unmount the snapshot even though nfs is still holding dentries in it. To fix this, we return the inode for the snapdirs themselves. However, we also trigger automount upon fh_to_dentry, and return ESTALE so nfsd will revalidate and see the mountpoint and do crossmnt. Because nfsd will now be aware that these are different filesystems users must add crossmnt to their export options to access snapshot directories. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #3794 Closes #4716 Closes #5810 Closes #5833	2017-03-08 09:26:33 -08:00
Andriy Gapon	423e7b6261	OpenZFS 7867 - ARC space accounting leak Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Tim Chase <tim@chase2k.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/7867 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aa1f740d Closes #5874	2017-03-07 18:14:32 -08:00
Gvozden Neskovic	650383f283	[icp] fpu and asm cleanup for linux Properly annotate functions and data section so that objtool does not complain when CONFIG_STACK_VALIDATION and CONFIG_FRAME_POINTER are enabled. Pass KERNELCPPFLAGS to assembler. Use kfpu_begin()/kfpu_end() to protect SIMD regions in Linux kernel. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Closes #5872 Closes #5041	2017-03-07 12:59:31 -08:00
Brian Behlendorf	3ec3bc2167	OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Background information: This assertion about tx_space_* verifies that we are not dirtying more stuff than we thought we would. We “need” to know how much we will dirty so that we can check if we should fail this transaction with ENOSPC/EDQUOT, in dmu_tx_assign(). While the transaction is open (i.e. between dmu_tx_assign() and dmu_tx_commit() — typically less than a millisecond), we call dbuf_dirty() on the exact blocks that will be modified. Once this happens, the temporary accounting in tx_space_* is unnecessary, because we know exactly what blocks are newly dirtied; we call dnode_willuse_space() to track this more exact accounting. The fundamental problem causing this bug is that dmu_tx_hold_() relies on the current state in the DMU (e.g. dn_nlevels) to predict how much will be dirtied by this transaction, but this state can change before we actually perform the transaction (i.e. call dbuf_dirty()). This bug will be fixed by removing the assertion that the tx_space_ accounting is perfectly accurate (i.e. we never dirty more than was predicted by dmu_tx_hold_()). By removing the requirement that this accounting be perfectly accurate, we can also vastly simplify it, e.g. removing most of the logic in dmu_tx_count_(). The new tx space accounting will be very approximate, and may be more or less than what is actually dirtied. It will still be used to determine if this transaction will put us over quota. Transactions that are marked by dmu_tx_mark_netfree() will be excepted from this check. We won’t make an attempt to determine how much space will be freed by the transaction — this was rarely accurate enough to determine if a transaction should be permitted when we are over quota, which is why dmu_tx_mark_netfree() was introduced in 2014. We also won’t attempt to give “credit” when overwriting existing blocks, if those blocks may be freed. This allows us to remove the do_free_accounting logic in dbuf_dirty(), and associated routines. This logic attempted to predict what will be on disk when this txg syncs, to know if the overwritten block will be freed (i.e. exists, and has no snapshots). OpenZFS-issue: https://www.illumos.org/issues/7793 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3704e0a Upstream bugs: DLPX-32883a Closes #5804 Porting notes: - DNODE_SIZE replaced with DNODE_MIN_SIZE in dmu_tx_count_dnode(), Using the default dnode size would be slightly better. - DEBUG_DMU_TX wrappers and configure option removed. - Resolved _by_dnode() conflicts these changes have not yet been applied to OpenZFS.	2017-03-07 09:51:59 -08:00
Brian Behlendorf	e2fcb56275	OpenZFS 7843 - get_clones_stat() is suboptimal for lots of clones Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/7843 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4d519e7 Closes #5868	2017-03-07 09:47:40 -08:00
Chunwei Chen	7a789346af	Fix loop device becomes read-only Commit `933ec99` removes read and write from f_op because the vfs layer will select iter_write or aio_write automatically. However, for Linux <= 4.0, loop_set_fd will actually check f_op->write and set read-only if not exists. This patch add them back and use the generic do_sync_{read,write} for aio_{read,write} and new_sync_{read,write} for {read,write}_iter. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #5776 Closes #5855	2017-03-06 09:20:20 -08:00
Olaf Faaland	4859fe796c	Linux 4.11 compat: avoid refcount_t name conflict Linux 4.11 introduces a new type, refcount_t, which conflicts with the type of the same name defined within ZFS. Rename the ZFS type zfs_refcount_t. Within the ZFS code, use a macro to cause references to refcount_t to be changed to zfs_refcount_t at compile time. This reduces conflicts when later landing OpenZFS patches. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #5823 Closes #5842	2017-02-28 16:10:18 -08:00
Matthew Ahrens	66eead53c9	Clean up by-dnode code in dmu_tx.c `0eef1bde31` introduced some changes which we slightly improved the style of when porting to illumos. There is also one minor error-handling fix, in zap_add() the "zap" may become NULL in case of an error re-opening the ZAP. Originally suggested at: https://github.com/openzfs/openzfs/pull/276 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #5805	2017-02-24 13:34:26 -08:00
Isaac Huang	f7e76821c5	ABD style cleanups The commit `a6255b7fce` removed a few assertions which help catch errors and improve code readability. It also duplicated two conditionals, which was unnecessary and made the code confusing to read. This patch cleans it up. Reviewed-by: David Quigley <david.quigley@intel.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Isaac Huang <he.huang@intel.com> Closes #5802	2017-02-24 12:05:42 -08:00
Daniel Hoffman	9e2c3bb4b9	OpenZFS 7812 - Remove gender specific language Authored by: Daniel Hoffman <dj.hoffman@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> This change removes all gendered language that did not refer specifically to an individual person or pet. The convention taken was to use variations on "they" when referring to users and/or human beings, while using "it" when referring to code, functions, and/or libraries. Additionally, we took the liberty to fix up any whitespace issues that were found in any files that were already being modified. OpenZFS-issue: https://www.illumos.org/issues/7812 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ad626db Closes #5822	2017-02-24 11:07:04 -08:00
Andriy Gapon	0efd97912a	OpenZFS 7199 - dsl_dataset_rollback_sync may try to free already free blocks 7200 no blocks must be born in a txg after a snaphot is created Authored by: Andriy Gapon <andriy.gapon@clusterhq.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7199 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/bfaed0b Closes #5817	2017-02-24 11:05:33 -08:00
Isaac Huang	6d82f98c3d	Fix incorrect spare vdev state after replacing After a hot spare replaces an OFFLINE vdev, the new parent spare vdev state is set incorrectly to OFFLINE. The correct state should be DEGRADED. The incorrect OFFLINE state will prevent top-level vdev from reading the spare vdev, thus causing unnecessary reconstruction. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Isaac Huang <he.huang@intel.com> Closes #5766 Closes #5770	2017-02-23 10:32:15 -08:00
Matthew Ahrens	c30e58c462	zfs_arc_num_sublists_per_state should be common to all multilists The global tunable zfs_arc_num_sublists_per_state is used by the ARC and the dbuf cache, and other users are planned. We should change this tunable to be common to all multilists. This tuning may be overridden on a per-multilist basis. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #5764	2017-02-15 15:49:33 -08:00
Tim Chase	544596c59e	Fix zfs_compressed_arc_enabled parameter description A likely cut/paste error caused the description to be applied to zfs_arc_average_blocksize. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #5788	2017-02-13 10:59:05 -08:00
Chunwei Chen	d6df043c53	Fix off by one in zpl_lookup Doing the following command would return success with zfs creating an orphan object. touch $(for i in $(seq 256); do printf "n"; done) The funny thing is that this will only work once for each directory, because after upgraded to fzap, zfs_lookup would fail properly since it has additional length check. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5768	2017-02-11 12:42:17 -08:00
Matthew Ahrens	d7958b4cda	OpenZFS 7104 - increase indirect block size Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7104 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4b5c8e9 Closes #5679	2017-02-09 10:27:02 -08:00
Matthew Ahrens	df7eeccc75	panic in bpobj_space(): null pointer dereference This is a race condition in the deadlist code. A thread executing an administrative command that uses dsl_deadlist_space_range() holds the lock of the whole deadlist_t to protect the access of all its entries that the deadlist contains in an avl tree. Sync threads trying to insert a new entry in the deadlist (through dsl_deadlist_insert() -> dle_enqueue()) do not hold the deadlist lock at that moment. If the dle_bpobj is the empty bpobj (our sentinel value), we close and reopen it. Between these two operations, it is possible for the dsl_deadlist_space_range() thread to dereference that bpobj which is NULL during that window. Threads should hold the a deadlist's dl_lock when they manipulate its internal data so scenarios like the one above are avoided. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #5762	2017-02-09 10:19:12 -08:00
Brian Behlendorf	ea7e86d8db	Fix iput() calls within a tx As explicitly stated in section 2 of the 'Programming rules' comments at the top of zfs_vnops.c. If you must call iput() within a tx then use zfs_iput_async(). Move iput() calls after dmu_tx_commit() / dmu_tx_abort when possible. When not possible convert the iput() calls to zfs_iput_async(). Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5758	2017-02-08 17:28:22 -08:00
Matthew Ahrens	4a5d7f8267	Allow c99 code to compile Add the appropriate compiler flags to accept c99 code. This will help to minimize differences with upstream, and aid porting changes. One change was necessary in zvol.c because the DEFINE_IDA() macro does not work with the new compiler flags. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #5756	2017-02-08 09:27:48 -08:00
George Melikov	23d70cdef1	OpenZFS 6931 - lib/libzfs: cleanup gcc warnings Authored by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/6931 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/88f61de Closes #5741	2017-02-07 14:02:27 -08:00
David Quigley	bef78122e6	Add missing module_param for zfs_per_txg_dirty_frees_percent When the code was added this tunable was not exposed via module params. Also it was not documented. This patch changes the type from a uint32 to a ulong as done with other percentage tunables and also documents it in the zfs-module-parameters man page. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: David Quigley <david.quigley@intel.com> Closes #5750	2017-02-07 09:44:03 -08:00
George Melikov	298ec40b6d	OpenZFS 7448 - ZFS doesn't notice when disk vdevs have no write cache Authored by: Hans Rosenfeld <hans.rosenfeld@nexenta.com> Reviewed by: Dan Fields <dan.fields@nexenta.com> Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7448 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/295438b Closes #5737	2017-02-04 09:23:50 -08:00
George Melikov	0a252daed3	OpenZFS 7504 - kmem_reap hangs spa_sync and administrative tasks Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tim Chase <tim@chase2k.com> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7504 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/405a5a0 Closes #5736	2017-02-04 09:21:25 -08:00
George Melikov	2e0e443ac4	OpenZFS 7247 - zfs receive of deduplicated stream fails Authored by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7247 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2ad25b4 Closes #5689 Porting notes: - tests/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_013_pos.ksh renamed as zfs_receive_015_pos.ksh, zfs_receive_013_pos.ksh is now used for OpenZFS test. - libzfs_sendrecv.c: SMALLEST_POSSIBLE_MAX_DDT_MB is always used for all 32-bit builds.	2017-02-04 09:10:24 -08:00
George Melikov	9b7b9cd370	OpenZFS 1300 - filename normalization doesn't work for removes Authored by: Kevin Crowe <kevin.crowe@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/1300 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8f1750d Closes #5725 Porting notes: - zap_micro.c: all `MT_EXACT` are replaced by `0`	2017-02-02 14:13:41 -08:00
Chunwei Chen	c7af63d62a	Fix write(2) returns zero bug from `933ec99` For generic_write_checks with 2 args, we can exit when it returns zero because it means count is zero. However this is not the case for generic_write_checks with 4 args, where zero means no error. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #5720 Closes #5726	2017-02-02 09:43:42 -08:00
Giuseppe Di Natale	fc386db191	Remove lint ifdef checks in zdb and dbuf This is effectively dead code for the Linux implementation which can be removed to improve readability. We want to linter to check the real production/debug build as much as possible. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #5722	2017-02-01 16:47:04 -08:00
George Melikov	0f676dc228	OpenZFS 7072 - zfs fails to expand if lun added when os is in shutdown state Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7072 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c39a2aa Closes #5694 Porting notes: - vdev.c: 'vdev_get_stats' changes are moved to 'vdev_get_stats_ex'. - vdev_disk.c: ignored, Linux specific code is different.	2017-02-01 13:14:02 -08:00
David Quigley	2fe36b0bfb	Use fletcher_4 routines natively with `abd_iterate_func()` This patch adds the necessary infrastructure for ABD to make use of the vectorized fletcher 4 routines. - export ABD compatible interface from fletcher_4 - add ABD fletcher_4 tests for data and metadata ABD types. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Original-patch-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: David Quigley <david.quigley@intel.com> Closes #5589	2017-02-01 09:34:22 -08:00
George Melikov	539d33c791	OpenZFS 6569 - large file delete can starve out write ops Authored by: Alek Pinchuk <alek@nexenta.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> Tested-by: kernelOfTruth <kerneloftruth@gmail.com> OpenZFS-issue: https://www.illumos.org/issues/6569 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1bf4b6f2 Closes #5706	2017-01-31 14:44:03 -08:00
Giuseppe Di Natale	d69a321e56	OpenZFS 7545 - zdb should disable reference tracking Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> Porting Notes: Moved reference_tracking_enable and reference_history outside of ZFS_DEBUG. OpenZFS-issue: https://www.illumos.org/issues/7545 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4dd77f9 Closes #5701	2017-01-31 14:36:35 -08:00
Tim Chase	b81a3ddc32	Update deadman operation to better align with upstream OpenZFS The deadman in ZoL didn't behave quite as it did in upstream OpenZFS. In addition to the 2 purposes for which OpenZFS used the zfs_deadman_synctime_ms parameter, ZoL also used it to determine how frequently the deadman would fire once it has been triggered. This patch adds the zfs_deadman_checktime_ms parameter to control how frequently the subsequent checks are performed. The deadman is now disabled for suspended pools. As had been the case, unlike upstream OpenZFS, ZoL will not panic when a hung IO is detected. The module parameter documentation has been upated to include the new parameter and to better describe the operation of the deadmen. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #5695	2017-01-31 14:19:08 -08:00
George Melikov	005e27e3b3	OpenZFS 7019 - zfsdev_ioctl skips secpolicy when FKIOCTL is set Authored by: Alex Wilson <alex.wilson@joyent.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7019 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45b1747 Closes #5709	2017-01-31 10:24:23 -08:00
George Melikov	6325e48f95	OpenZFS 7136 - ESC_VDEV_REMOVE_AUX ought to always include vdev information Authored by: Alan Somers <asomers@gmail.com> 7115 6922 generates ESC_ZFS_VDEV_REMOVE_AUX a bit too often Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7136 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b72b6bb Closes #5691 Porting notes: - Functionally this patch behaves the same as the OpenZFS version but it was adapted because because ZoL doesn't have the same illumos sysevent_t infrastructure and functionality.	2017-01-31 10:19:36 -08:00
George Melikov	41425f79da	OpenZFS 7490 - real checksum errors are silenced when zinject is on Authored by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7490 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/6cedfc3 Closes #5693	2017-01-30 17:12:58 -08:00
George Melikov	e2da829cc1	OpenZFS 6922 - Emit ESC_ZFS_VDEV_REMOVE_AUX after removing an aux device Authored by: Alan Somers <asomers@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/6922 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/63364b0 Closes #5690	2017-01-30 15:33:46 -08:00
George Melikov	fa603f8233	OpenZFS 7277 - zdb should be able to print zfs_dbgmsg's Porting notes: - 'zfs_dbgmsg_print()' reintroduced to userspace. Authored by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7277 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/29bdd2f Closes #5684	2017-01-28 12:16:43 -08:00
Brian Behlendorf	a32494d22a	Fix suspend Godfather I/Os io_reexecute bits After resuming a pool the godfather zio could have both the ZIO_REEXECUTE_NOW and ZIO_REEXECUTE_SUSPEND bits set. This can occur if some child zios set ZIO_REEXECUTE_NOW while other set ZIO_REEXECUTE_SUSPEND. The godfather zio can inherit both flags in zio_notify_parent(). The child zios which assigned the ZIO_REEXECUTE_SUSPEND flag will be removed from the godfather's child list and added to the spa->spa_suspend_zio_root child list. While child zios with the ZIO_REEXECUTE_NOW bit set remain being monitored by the godfather zio. When the godfather zio executes zio_done() the presence of the ZIO_REEXECUTE_SUSPEND bit results in all io_reexecute being cleared. These child zios will then not be re-executed and instead will be destroyed and lost. The most straight forward way to address this situation is to only clear the ZIO_REEXECUTE_SUSPEND bit and leave the ZIO_REEXECUTE_NOW bit set. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: yuxiang <guo.yong33@zte.com.cn>	2017-01-28 12:13:34 -08:00
George Melikov	721ed0ee86	OpenZFS 7580 - ztest failure in dbuf_read_impl Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7580 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3105d95 Closes #5678	2017-01-28 12:11:09 -08:00
George Melikov	a08abc1bb3	OpenZFS 7301 - zpool export -f should be able to interrupt file freeing Authored by: Alek Pinchuk <alek@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Gordon Ross <gordon.ross@nexenta.com> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7301 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/eb72182 Closes #5680	2017-01-27 11:46:39 -08:00
George Melikov	cc9bb3e58e	OpenZFS 7254 - ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedobjs Authored by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7254 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c166b69 Closes #5670	2017-01-27 11:43:42 -08:00
Chunwei Chen	933ec99951	Retire .write/.read file operations The .write/.read file operations callbacks can be retired since support for .read_iter/.write_iter and .aio_read/.aio_write has been added. The vfs_write()/vfs_read() entry functions will select the correct interface for the kernel. This is desirable because all VFS write/read operations now rely on common code. This change also add the generic write checks to make sure that ulimits are enforced correctly on write. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #5587 Closes #5673	2017-01-27 10:43:39 -08:00
Brian Behlendorf	986dd8aacc	OpenZFS 5561 - support root pools on EFI/GPT partitioned disks Reviewed by: Jean McCormack <jean.mccormack@nexenta.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/5561 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1a902ef Closes #5672	2017-01-27 10:40:02 -08:00
Tim Chase	258553d3d7	OpenZFS 7613 - ms_freetree[4] is only used in syncing context metaslab_t:ms_freetree[TXG_SIZE] is only used in syncing context. We should replace it with two trees: the freeing tree (ranges that we are freeing this syncing txg) and the freed tree (ranges which have been freed this txg). Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7613 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a8698da2 Closes #5598	2017-01-26 15:27:19 -08:00
George Melikov	9c9531cb6f	OpenZFS 7500 - Simplify dbuf_free_range by removing dn_unlisted_l0_blkid Authored by: Stephen Blinick <stephen.blinick@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7500 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/653af1b Closes #5639	2017-01-26 15:15:48 -08:00
George Melikov	39efbde7c5	OpenZFS 6676 - Race between unique_insert() and unique_remove() causes ZFS fsid change Authored by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Dan Vatca <dan.vatca@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/6676 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/40510e8 Closes #5667	2017-01-26 14:43:28 -08:00
George Melikov	aeacdefedc	OpenZFS 7386 - zfs get does not work properly with bookmarks Authored by: Marcel Telka <marcel@telka.sk> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7386 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/edb901a Closes #5666	2017-01-26 14:42:15 -08:00
George Melikov	1149ba6478	OpenZFS 7606 - dmu_objset_find_dp() takes a long time while importing pool When importing a pool with a large number of filesystems within the same parent filesystem, we see that dmu_objset_find_dp() takes a long time. It is called from 3 places: spa_check_logs(), spa_ld_claim_log_blocks(), and spa_load_verify(). There are several ways to improve performance here: 1. We don't really need to do spa_check_logs() or spa_ld_claim_log_blocks() if the pool was closed cleanly. 2. spa_load_verify() uses dmu_objset_find_dp() to check that no datasets have too long of names. 3. dmu_objset_find_dp() is slow because it's doing zap_value_search() (which is O(N sibling datasets)) to determine the name of each dsl_dir when it's opened. In this case we actually know the name when we are opening it, so we can provide it and avoid the lookup. This change implements fix #3 from the above list; i.e. make dmu_objset_find_dp() provide the name of the dataset so that we don't have to search for it. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prashanth Sreenivasa <prashksp@gmail.com> Reviewed-by: David Quigley <david.quigley@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7606 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cac6bab Closes #5662	2017-01-26 12:46:02 -08:00
George Melikov	ec923db25c	OpenZFS 7180 - potential race between zfs_suspend_fs+zfs_resume_fs and zfs_ioc_rename Authored by: Andriy Gapon <andriy.gapon@clusterhq.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7180 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/690041b Closes #5627	2017-01-23 10:53:46 -08:00
George Melikov	cffd6e1167	OpenZFS 3746 - ZRLs are racy Authored by: Will Andrews <will@freebsd.org> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Reviewed by: Pavel Zakharov <pavel.zakha@gmail.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Justin T. Gibbs <gibbs@scsiguy.com> Approved by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/3746 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/260af64 Closes #5625	2017-01-23 10:35:58 -08:00
George Melikov	911c41af2d	OpenZFS 7304 - zfs filesystem/snapshot counts should be read-only Authored by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7304 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/007a6c1 Closes #5624	2017-01-23 10:17:35 -08:00

1 2 3 4 5 ...

1549 Commits