Commit Graph

1336 Commits

Author SHA1 Message Date
George Melikov
34a6b42844 OpenZFS 7659 - Missing thread_exit() in dmu_send.c
Two threads send_traverse_thread() and receive_writer_thread() should
end with thread_exit();

Mostly a cosmetic issue under IllumOS.

Authored by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>

OpenZFS-issue: https://www.illumos.org/issues/7659
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a569268
Closes #5603
2017-01-18 15:10:35 -08:00
George Melikov
7330fc57b7 OpenZFS 7235 - remove unused func dsl_dataset_set_blkptr
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>

OpenZFS-issue: https://www.illumos.org/issues/7235
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/bd56f80
Closes #5604
2017-01-17 15:22:56 -08:00
George Melikov
61ca48ff38 OpenZFS 7256 - low probability race in zfs_get_data
Authored by: Andriy Gapon <andriy.gapon@clusterhq.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>

OpenZFS-issue: https://www.illumos.org/issues/7256
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/6ed18a8
Closes #5601
2017-01-17 15:18:59 -08:00
George Melikov
e88551d52f OpenZFS 7071 - lzc_snapshot does not fill in errlist on ENOENT
Authored by: Igor Kozhukhov ikozhukhov@gmail.com
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>

OpenZFS-issue: https://www.illumos.org/issues/7071
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/25f7d99
Closes #5597
2017-01-17 14:52:17 -08:00
George Melikov
cf7d1484bf OpenZFS 7082 - bptree_iterate() passes wrong args to zfs_dbgmsg()
Authored by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>

OpenZFS-issue: https://www.illumos.org/issues/7082
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/10e67aa
Closes #5596
2017-01-17 14:49:24 -08:00
Brian Behlendorf
832805d951 OpenZFS 6586 - Whitespace inconsistencies in the spa feature dependency arrays in zfeature_common.c
Porting Notes:
- Preserved 'static const spa_feature_t hole_birth_deps[]'.

Authored by: ilovezfs <ilovezfs@icloud.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/6586
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/22b6687
Closes #5592
2017-01-17 14:46:28 -08:00
LOLi
08f0510d87 Fix unallocated object detection for large_dnode datasets
Fix dmu_object_next() to correctly handle unallocated objects on
large_dnode datasets.

We implement this by scanning the dnode block until we find the correct
offset to be used in dnode_next_offset(). This is necessary because we
can't assume *objectp is a hole even if dmu_object_info() returns
ENOENT.

This fixes a couple of issues with zfs receive on large_dnode datasets.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5027 
Closes #5532
2017-01-13 15:47:34 -08:00
Brian Behlendorf
5043684ae5 OpenZFS 7603 - xuio_stat_wbuf_* should be declared (void)
Porting Notes:
- include/sys/dmu.h prototypes were already updated in 0bc8fd7

Authored by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/7603
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/99aa8b5
Closes #5586
2017-01-13 15:33:14 -08:00
Brian Behlendorf
9775e98844 OpenZFS 7181 - race between zfs_mount and zfs_ioc_rollback
Authored by: Andriy Gapon <andriy.gapon@clusterhq.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/7181
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/90f2c09
Closes #5585
2017-01-13 15:29:32 -08:00
bzzz77
0eef1bde31 Add *_by-dnode routines
Add *_by_dnode() routines for accessing objects given their
dnode_t *, this is more efficient than accessing the object by 
(objset_t *, uint64_t object).  This change converts some but
not all of the existing consumers.  As performance-sensitive
code paths are discovered they should be converted to use
these routines.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Closes #5534 
Issue #4802
2017-01-13 14:58:41 -08:00
Don Brady
38640550f2 OpenZFS 7743 - per-vdev-zaps init path for upgrade
Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Joe Stein <jas14@cs.brown.edu>
Ported-by: Don Brady <don.brady@intel.com>

When loading a pool that had been created before the existance of
per-vdev zaps, on a system that knows about per-vdev zaps, the
per-vdev zaps will not be allocated and initialized.

This appears to be because the logic that would have done so, in
spa_sync_config_object(), is not reached under normal operation. It is
only reached if spa_config_dirty_list is non-empty.

The fix is to add another `AVZ_ACTION_` enum that will allow this code
to be reached when we detect that we're loading an old pool, even when
there are no dirty configs.

OpenZFS-issue: https://www.illumos.org/issues/7743
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e2d29d0
Closes #5582
2017-01-13 13:50:22 -08:00
George Melikov
0bc63d83f6 OpenZFS 6603 - zfeature_register() should verify ZFEATURE_FLAG_PER_DATASET implies SPA_FEATURE_EXTENSIBLE_DATASET
Authored by: ilovezfs <ilovezfs@icloud.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>

OpenZFS-issue: https://www.illumos.org/issues/6603
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0803e91
Closes #5573
2017-01-12 11:58:04 -08:00
Don Brady
4e21fd060a OpenZFS 7303 - dynamic metaslab selection
This change introduces a new weighting algorithm to improve
metaslab selection. The new weighting algorithm relies on the
SPACEMAP_HISTOGRAM feature. As a result, the metaslab weight
now encodes the type of weighting algorithm used (size-based
vs segment-based).

Porting Notes: The metaslab allocation tracing code is conditionally
removed on linux (dependent on mdb debugger).

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Pavel Zakharov pavel.zakharov@delphix.com
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Don Brady <don.brady@intel.com>

OpenZFS-issue: https://www.illumos.org/issues/7303
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d5190931bd
Closes #5404
2017-01-12 11:52:56 -08:00
George Melikov
e9aa730c49 OpenZFS 6328 - Fix cstyle errors in zfs codebase
Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>

OpenZFS-issue: https://www.illumos.org/issues/6328
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/9a686fb
Closes #5579
2017-01-12 09:42:11 -08:00
ka7
4e33ba4c38 Fix spelling
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes #5547 
Closes #5543
2017-01-03 11:31:18 -06:00
Tim Chase
a5e046eaac 4.10 compat - BIO flag changes and others
[bio] The req_op enum was changed to req_opf.  Update the "Linux 4.8 API"
autotools checks to use an int to determine whether the various REQ_OP
values are defined.  This should work properly on kernels >= 4.8.

[bio] bio_set_op_attrs() is now an inline function and can't be detected
with #ifdef.  Add a configure check to determine whether bio_set_op_attrs()
is defined.  Move the local definition of it from vdev_disk.c to
blkdev_compat.h for consistency with other related compability shims.

[bio] The read/write flags and their modifiers, including WRITE_FLUSH,
WRITE_FUA and WRITE_FLUSH_FUA have been removed from fs.h.  Add the new
bio_set_flush() compatibility wrapper to replace VDEV_WRITE_FLUSH_FUA
and set the flags appropriately for each supported kernel version.

[vfs] The generic_readlink() function has been made static.  If .readlink
in inode_operations is NULL, generic_readlink() is used.

[zol typo] Completely unrelated to 4.10 compat, fix a typo in the check
for REQ_OP_SECURE_ERASE so that the proper macro is defined:

    s/HAVE_REQ_OP_SECURE_DISCARD/HAVE_REQ_OP_SECURE_ERASE/

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #5499
2016-12-30 16:03:59 -06:00
LOLi
3500a14595 Don't persist temporary pool name on devices
Fix a regression accidentally introduced by e0ab3ab.

Additionally, add a new script zpool_import_014_pos.ksh to
the ZFS test suite to exercise 'zpool import -t' functionality.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5466 
Closes #5515
2016-12-22 10:39:00 -08:00
Chunwei Chen
da8f51e16a Use a dedicated taskq for vdev_file
The introduction of parallel zvol prefetch causes deadlock when using
vdev_file.

spa_async->(spa_namespace_lock)->txg_wait_synced->(wait for txg_sync)
txg_sync->zio_wait->(wait for vdev_file_io_fsync on system_taskq)
zvol_prefetch_minors_impl (on system_taskq)->spa_open_common->(wait for spa_namespace_lock)

We fix this by using dedicated taskq for vdev_file.  This same change
was originally made in commit bc25c93 but reverted in commit aa9af22
when dynamic taskqs were added.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #5506 
Closes #5495
2016-12-21 10:47:15 -08:00
LOLi
5f1346c299 Fix dsl_props_set_sync_impl to work with nested nvlist
When iterating over the input nvlist in dsl_props_set_sync_impl() when we don't
preserve the nvpair name before looking up ZPROP_VALUE, so when we later go to
process it nvpair_name() is always "value" and not the actual property name.

This fixes a couple of bugs in zfs_ioc_recv():
* Received properties were not restored correctly when failing to receive an
incremental send stream
* Received properties were not completely replaced by the new ones when
successfully receiving an incremental send stream

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5497
2016-12-20 18:46:59 -08:00
Brian Behlendorf
a3823f428d Fix file attributes
This branch contains the following fixes/improvements.

* Fix setting i_flags
* Fix wrong operator in xvattr.h
* Fix fchange macro in zpl_ioctl_setflags()
* Added configure check to use inode_set_flags()
* Added a test case for chattr for better test coverage

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5486 
Closes #5470 
Closes #5469
2016-12-19 13:01:10 -08:00
Chunwei Chen
6c01a4af2b Fix zmo leak when zfs_sb_create fails
zfs_sb_create would normally takes ownership of zmo, and it will be freed in
zfs_sb_free. However, when zfs_sb_create fails we need to explicit free it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5490 
Closes #5496
2016-12-19 09:46:29 -08:00
Chunwei Chen
a5248129b8 Use inode_set_flags when available
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-12-16 13:54:51 -08:00
Chunwei Chen
c360af5411 Fix fchange in zpl_ioctl_setflags
The fchange in zpl_ioctl_setflags was for detecting flag change. However it
was incorrect and would always fail to detect a flag change from set to unset,
causing users without CAP_LINUX_IMMUTABLE to be able to unset flags.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-12-16 12:46:46 -08:00
Gvozden Neskovic
0101796290 ABD: Adapt avx512bw raidz assembly
Adapt avx512bw implementation for use with abd buffers. Mul2 implementation
is rewritten to take advantage of the BW instruction set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Romain Dolbeau <romain.dolbeau@atos.net>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #5477
2016-12-15 17:31:33 -08:00
Chunwei Chen
7bb1325f95 Fix i_flags issue caused by 64c688d
Fix zfs_xvattr_set to set S_IMMUTABLE and S_APPEND flags correctly.

Reinstate zfs_set_inode_flags and use it when zfs_xvatter_set and also when
setting up inode in zfs_znode_alloc and zfs_rezget.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-12-14 14:48:09 -08:00
Chunwei Chen
f2d8bdc62e Add ida_destroy in zvol_fini to fix memleak
User of ida needs to call ida_destroy after using it. Otherwise
ida->free_bitmap and/or other stuff may leak.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5484
2016-12-14 09:41:39 -08:00
bunder2015
f974e25dc1 Fix typos in dbuf.c
This removes two large whitespaces in "modinfo zfs" as well as correcting
a couple typos.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #5475
2016-12-13 14:21:02 -08:00
Brian Behlendorf
02730c333c Use cstyle -cpP in make cstyle check
Enable picky cstyle checks and resolve the new warnings.  The vast
majority of the changes needed were to handle minor issues with
whitespace formatting.  This patch contains no functional changes.

Non-whitespace changes are as follows:

* 8 times ; to { } in for/while loop
* fix missing ; in cmd/zed/agents/zfs_diagnosis.c
* comment (confim -> confirm)
* change endline , to ; in cmd/zpool/zpool_main.c
* a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks
* /* CSTYLED */ markers
* change == 0 to !
* ulong to unsigned long in module/zfs/dsl_scan.c
* rearrangement of module_param lines in module/zfs/metaslab.c
* add { } block around statement after for_each_online_node

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5465
2016-12-12 10:46:26 -08:00
Chunwei Chen
a806cb6a89 Don't count '@' for dataset namelen if not a snapshot
Don't count '@' for dataset namelen if not a snapshot.  This
fixes making a pool unimportable when the  dataset namelen
is 255.

Add test file for zfs create name length 255.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5432 
Closes #5456
2016-12-09 11:52:08 -07:00
luozhengzheng
c077090a9b Fix coverity defects: CID 154617
CID 154617: Memory - illegal accesses (UNINIT)

The value here just needs to be initialized to make Coverity happy.
When dsize == 0, then value of daiter.iter_mapaddr is irrelevant. That
address won't be accessed, it's only used for some arithmetic. dsize
can be zero either if dabd is null, or if code column is longer than the
current data column.

Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5437
2016-12-08 14:48:09 -07:00
Brian Behlendorf
f95e647891 Speed up zvol import and export speed
Speed up import and export speed by:

* Add system delay taskq
* Parallel prefetch zvol dnodes during zvol_create_minors
* Parallel zvol_free during zvol_remove_minors
* Reduce list linear search using ida and hash

Reviewed-by: Boris Protopopov <boris.protopopov@actifio.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5433
2016-12-08 14:05:02 -07:00
Brian Behlendorf
27f2b90d3e Revert "Disable zio_dva_throttle_enabled by default"
Enable zio_dva_throttle_enabled=1 by default. Subsequent
testing has been unable to reproduce the suspected regression.

Tested-by: kernelOfTruth kerneloftruth@gmail.com
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf behlendorf1@llnl.gov
Reverts #5335
Closes #5289
Closes #5457
2016-12-08 13:57:42 -07:00
Gvozden Neskovic
e8a2014436 Cache ddt_get_dedup_dspace() value if there was no ddt changes
Save and reuse ddt dspace calculation when there have been no ddt changes.
This avoids unnecessary traversal of 168KiB of ddt histograms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #5425
2016-12-02 16:59:35 -07:00
Brian Behlendorf
baf67d15a5 Refactor txg history kstat
It was observed that even when the txg history is disabled by
setting `zfs_txg_history=0` the txg_sync thread still fetches
the vdev stats unnecessarily.

This patch refactors the code such that vdev_get_stats() is no
longer called when `zfs_txg_history=0`.  And it further reduces
the  differences between upstream and the ZoL txg_sync_thread()
function.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5412
2016-12-02 16:57:49 -07:00
Chunwei Chen
899662e344 zvol_remove_minors do parallel zvol_free
On some kernel version, blk_cleanup_queue and put_disk will wait for more then
10ms. So a pool with a lot of zvols will easily wait for more then 1 min if we
do zvol_free sequentially.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Requires-spl: refs/pull/588/head
2016-12-02 11:13:35 -08:00
Chunwei Chen
7ac557cef1 zpool_create_minors parallel prefetch
Do parallel prefetch all zvol dnodes before actually creating each individual.
This will greatly reduce the import time when having a lot of zvols and disk
is slow.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-12-02 11:13:35 -08:00
Brian Behlendorf
b0319c1faa OpenZFS 7143 - dbuf_read() creates unnecessary zio_root() for bonus buf
dbuf_read() creates a zio_root() to track and wait for all the
zio's that may happen as part of this call. However, if the blkptr_t
for this buffer is NULL or a hole, we will not create any more zio's,
so this zio_root() is unnecessary. This is always the case when calling
dbuf_read() on a bonus buffer, because it has no blkptr (it's part of
the containing dnode). For workloads that read a lot of bonus buffers
(e.g. file creation and removal), creating and destroying these
unnecessary zio's can decrease performance by around 3%.

The fix is to only create/destroy the zio_root() in dbuf_read() if
the blkptr is not NULL and not a hole.

Changes sponsored by Intel Corp.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by:  Alex Zhuravlev <alexey.zhuravlev@intel.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs/openzfs#137
Closes #4803 
Closes #5382
2016-12-01 16:50:11 -07:00
luozhengzheng
ba712624d6 Fix incorrect operator in abd_alloc_sametype()
This should be & and not | so is_metadata is set correctly.

Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5438
2016-12-01 16:45:16 -07:00
cao
e2c7d3785a Remove unused sa_update_from_cb()
It looks like this was functionality which was added in the
original SA implementation and then never needed.  It can
be safely removed now and easily added back if we find a
use for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5440
2016-12-01 16:39:06 -07:00
cao
6a8fd57fa7 Compile zio.h and zio_impl.h mutual include
zio.h includes zio_impl.h but zio_impl.h also includes zio.h, so the
header files to contain each other.  Get rid of the zio_impl.h include
in zio.h and update zio_inject.c to include zio.h instead of zio_impl.h.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5439
2016-12-01 16:36:25 -07:00
Chunwei Chen
d45e010dce zvol: reduce linear list search
Use kernel ida to generate minor number, and use hash table to find zvol with
name.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-12-01 14:53:11 -08:00
Chunwei Chen
57ddcda164 Use system_delay_taskq for long delay tasks
Use it for spa_deadman, zpl_posix_acl_free, snapentry_expire.
This free system_taskq from the above long delay tasks, and allow us to do
taskq_wait_outstanding on system_taskq without being blocked forever, making
system_taskq more generic and useful.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-12-01 14:52:48 -08:00
Brian Behlendorf
a3fd9d9e15 Convert zio_buf_alloc() consumers
In multiple cases zio_buf_alloc() was used instead of kmem_alloc()
or vmem_alloc().  This was often done because the allocations
could be large and it was easy to use zfs_buf_alloc() for them.

But this isn't ideal for allocations which are small or short
lived.  In these cases it is better to use kmem_alloc() or
vmem_alloc().  If possible we want to avoid the case where
we have slabs allocated for kmem caches which are rarely used.

Note for small allocations vmem_alloc() will be internally
converted to kmem_alloc().  Therefore as long as large
allocations are infrequent and short lived the penalty for
using vmem_alloc() is small.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5409
2016-11-30 16:18:20 -07:00
Chunwei Chen
9829574834 ABD optimized page allocation code
* Convert ABD to use the Linux Kernel scatterlist implementation
  instead of the hand rolled one from illumos.

* Scatter ABDs are preferentially populated with higher order
  compound pages from a single zone.  Allocation size is
  progressively decreased until it can be satisfied without
  performing reclaim or compaction.

* An alternate page allocator is provided for kernels older
  than 3.6 and for CONFIG_HIGHMEM systems.  This allocator
  is designed as a fallback for maximum compatibility.

* Extended abdstats to provide visibility in the the allocator.

* Add cached value for PAGESIZE in userspace.

Contributions-by:
Chunwei Chen <david.chen@osnexus.com>
Gvozden Neskovic <neskovic@gmail.com>
Jinshan Xiong <jinshan.xiong@intel.com>
Isaac Huang <he.huang@intel.com>
David Quigley <david.quigley@intel.com>
Brian Behlendorf <behlendorf1@llnl.gov>
2016-11-29 14:34:33 -08:00
Chunwei Chen
4f60152910 ABD kmap to kmap_atomic
Convert usage of kmap to kmap_atomic while correctly saving off
irq state.
2016-11-29 14:34:33 -08:00
Romain Dolbeau
88cc2352ea ABD raidz NEON support
Port NEON implementation of RAID-Z functions to ABD.

Signed-off-by: Roomain Dolbeau <romain.dolbeau@atos.net>
2016-11-29 14:34:33 -08:00
Gvozden Neskovic
65d71d4212 ABD raidz avx512f support
Implement shift based multiplication for 512f. Higher IPC over lookup based
methods yields up to 40% better performance on the current hardware.

Results on Xeon Phi(TM) CPU 7210:
implementation   gen_p           gen_pq          gen_pqr         rec_p           rec_q           rec_r           rec_pq          rec_pr          rec_qr          rec_pqr
original         142232671       24411492        12948205        283053705       22348167        4215911         9171609         2265548         2378370         1648495
scalar           295711162       49851491        33253815        293198109       88179448        61866752        27941684        25764416        17384442        12138153
sse2             410055998       199642658       117973654       406240463       152688682       121092250       84968180        79291076        47473657        20779719
ssse3            411641595       199669571       117937647       406211024       137638508       117050346       81263322        76120405        46281559        32696722
avx2             616485806       311515332       188595628       605455115       260602390       230554476       148198817       138800254       92273356        62937819
avx512f          832191523       408509425       253599522       810094481       404325734       317590971       218235687       197204920       133101937       94001219
fastest          avx512f         avx512f         avx512f         avx512f         avx512f         avx512f         avx512f         avx512f         avx512f         avx512f

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
2016-11-29 14:34:33 -08:00
Gvozden Neskovic
cbf484f8ad ABD Vectorized raidz
Enable vectorized raidz code on ABD buffers.  The avx512f,
avx512bw, neon and aarch64_neonx2 are disabled in this commit.
With the exception of avx512bw these implementations are
updated for ABD in the subsequent commits.

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
2016-11-29 14:34:33 -08:00
Gvozden Neskovic
a206522c4f ABD changes for vectorized RAIDZ
* userspace: aligned buffers. Minimum of 32B alignment is
  needed for AVX2. Kernel buffers are aligned 512B or more.
* add abd_get_offset_size() interface
* abd_iter_map(): fix calculation of iter_mapsize
* add abd_raidz_gen_iterate() and abd_raidz_rec_iterate()

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
2016-11-29 14:34:33 -08:00
Isaac Huang
b0be93e81a ABD page support to vdev_disk.c
Signed-off-by: Isaac Huang <he.huang@intel.com>
2016-11-29 14:34:32 -08:00