freebsd-dev/module/zfs
smh 9f500936c8 FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information.
The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.

The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.

Within the locality calculation additional knowledge about the underlying vdev
is considered such as; is the device backing the vdev a rotating media device.

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.

The following are results from a setup with 3 Way Mirror with 2 x HD's and
1 x SSD from a basic test running multiple parrallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s

In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.

The following additional sysctls where added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc

These changes where based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487

Reviewed by:	gibbs, mav, will
MFC after:	2 weeks
Sponsored by:	Multiplay

References:
  https://github.com/freebsd/freebsd@5c7a6f5d
  https://github.com/freebsd/freebsd@31b7f68d
  https://github.com/freebsd/freebsd@e186f564

Performance Testing:
  https://github.com/zfsonlinux/zfs/pull/4334#issuecomment-189057141

Porting notes:
- The tunables were adjusted to have ZoL-style names.
- The code was modified to use ZoL's vd_nonrot.
- Fixes were done to make cstyle.pl happy
- Merge conflicts were handled manually
- freebsd/freebsd@e186f564bc by my
  collegue Andriy Gapon has been included. It applied perfectly, but
  added a cstyle regression.
- This replaces 556011dbec entirely.
- A typo "IO'a" has been corrected to say "IO's"
- Descriptions of new tunables were added to man/man5/zfs-module-parameters.5.

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4334
2016-02-26 11:24:35 -08:00
..
arc.c Add l2arc_max_block_size tunable 2016-02-25 09:44:00 -08:00
blkptr.c Illumos 4757, 4913 2014-08-01 14:28:05 -07:00
bplist.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
bpobj.c Illumos 5810 - zdb should print details of bpobj 2015-05-11 15:10:24 -07:00
bptree.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
bqueue.c Allow 16M send/recv blocks 2016-01-08 20:23:23 -05:00
dbuf_stats.c Illumos 5497 - lock contention on arcs_mtx 2015-06-11 10:27:25 -07:00
dbuf.c Illumos 5045 - use atomic_{inc,dec}_* instead of atomic_add_* 2016-01-15 15:38:36 -08:00
ddt_zap.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
ddt.c Handle zap_lookup() failure in ddt_object_load() 2015-08-19 14:32:50 -07:00
dmu_diff.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
dmu_object.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
dmu_objset.c Illumos 6495 - Fix mutex leak in dmu_objset_find_dp 2016-01-28 12:45:39 -05:00
dmu_send.c Illumos 5809 - Blowaway full receive in v1 pool causes kernel panic 2016-02-08 09:37:55 -08:00
dmu_traverse.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
dmu_tx.c Illumos 4950 - files sometimes can't be removed from a full filesystem 2016-01-21 16:59:30 -08:00
dmu_zfetch.c Illumos 6281 - prefetching should apply to 1MB reads 2016-01-12 13:51:27 -08:00
dmu.c Illumos 4950 - files sometimes can't be removed from a full filesystem 2016-01-21 16:59:30 -08:00
dnode_sync.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
dnode.c Illumos 5987 - zfs prefetch code needs work 2016-01-12 09:02:33 -08:00
dsl_bookmark.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_dataset.c Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 2016-01-12 10:53:12 -08:00
dsl_deadlist.c Handle damaged blk_birth in dsl_deadlist_insert() 2015-12-15 16:12:31 -08:00
dsl_deleg.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_destroy.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
dsl_dir.c Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 2016-01-12 10:53:12 -08:00
dsl_pool.c Align thread priority with Linux defaults 2015-07-28 13:36:47 -07:00
dsl_prop.c Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 2016-01-12 10:53:12 -08:00
dsl_scan.c Illumos 6537 - Panic on zpool scrub with DEBUG kernel 2016-02-05 11:29:32 -08:00
dsl_synctask.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_userhold.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
fm.c Illumos 5045 - use atomic_{inc,dec}_* instead of atomic_add_* 2016-01-15 15:38:36 -08:00
gzip.c cstyle: Resolve C style issues 2013-12-18 16:46:35 -08:00
lz4.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
lzjb.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
Makefile.in Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
metaslab.c Remove fastwrite mutex 2016-01-15 15:38:35 -08:00
multilist.c Identify locks flagged by lockdep 2015-12-22 10:21:33 -08:00
range_tree.c Illumos 5163 - arc should reap range_seg_cache 2015-06-25 08:58:16 -07:00
refcount.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
rrwlock.c Illumos 5008 - lock contention (rrw_exit) while running a read only load 2015-07-06 09:34:13 -07:00
sa.c Prevent SA length overflow 2015-12-30 13:20:12 -08:00
sha256.c Add linux sha2 support 2010-08-31 13:41:59 -07:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_config.c Illumos 3749 - zfs event processing should work on R/O root filesystems 2016-01-12 14:42:32 -08:00
spa_errlog.c Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
spa_history.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
spa_misc.c Illumos 6367 - spa_config_tryenter incorrectly handles the multiple-lock case 2016-01-12 11:05:28 -08:00
spa_stats.c Illumos 5369 - arc flags should be an enum 2015-06-11 10:27:25 -07:00
spa.c Illumos 6527 - Possible access beyond end of string in zpool comment 2016-01-28 12:46:04 -05:00
space_map.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
space_reftree.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
trace.c Illumos 5497 - lock contention on arcs_mtx 2015-06-11 10:27:25 -07:00
txg.c Increase default user space stack size 2016-01-13 13:55:12 -08:00
uberblock.c Illumos 5347 - idle pool may run itself out of space 2015-07-14 10:35:21 -07:00
unique.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
vdev_cache.c Illumos 5045 - use atomic_{inc,dec}_* instead of atomic_add_* 2016-01-15 15:38:36 -08:00
vdev_disk.c Fix use-after-free in vdev_disk_physio_completion 2015-10-13 15:25:33 -07:00
vdev_file.c Disable LBA weighting on files and SSDs 2015-09-01 15:22:07 -07:00
vdev_label.c Illumos 6414 - vdev_config_sync could be simpler 2016-01-28 12:44:39 -05:00
vdev_mirror.c FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information. 2016-02-26 11:24:35 -08:00
vdev_missing.c Illumos #5244 - zio pipeline callers should explicitly invoke next stage 2015-04-30 15:07:47 -07:00
vdev_queue.c FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information. 2016-02-26 11:24:35 -08:00
vdev_raidz.c Illumos #5244 - zio pipeline callers should explicitly invoke next stage 2015-04-30 15:07:47 -07:00
vdev_root.c Illumos #3598 2013-10-31 14:58:04 -07:00
vdev.c Identify locks flagged by lockdep 2015-12-22 10:21:33 -08:00
zap_leaf.c Illumos 5314 - Remove "dbuf phys" db->db_data pointer aliases in ZFS 2015-04-28 16:25:20 -07:00
zap_micro.c Add zap_prefetch() interface 2015-12-04 09:39:20 -08:00
zap.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
zfeature_common.c Illumos 5959 - clean up per-dataset feature count code 2015-12-04 14:20:20 -08:00
zfeature.c Illumos 5959 - clean up per-dataset feature count code 2015-12-04 14:20:20 -08:00
zfs_acl.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
zfs_byteswap.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
zfs_ctldir.c Hold the zfs_snapentry_t before dispatch 2015-12-14 12:06:31 -08:00
zfs_debug.c Add dbgmsg kstat 2015-09-04 16:08:14 -07:00
zfs_dir.c Illumos 4950 - files sometimes can't be removed from a full filesystem 2016-01-21 16:59:30 -08:00
zfs_fm.c Remove wrong ASSERT in annotate_ecksum 2016-02-17 10:43:02 -08:00
zfs_fuid.c Illumos #3522 2013-10-30 14:51:27 -07:00
zfs_ioctl.c Illumos 6096 - ZFS_SMB_ACL_RENAME needs to cleanup better 2016-02-05 11:28:53 -08:00
zfs_log.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
zfs_onexit.c zfsdev_getminor() should check for invalid file handles 2015-06-22 17:02:13 -07:00
zfs_replay.c Linux AIO Support 2014-09-05 15:11:43 -07:00
zfs_rlock.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
zfs_sa.c Prevent SA length overflow 2015-12-30 13:20:12 -08:00
zfs_vfsops.c Fix zsb->z_hold_mtx deadlock 2016-01-15 15:33:45 -08:00
zfs_vnops.c llumos 6334 - Cannot unlink files when over quota 2016-01-26 15:27:08 -08:00
zfs_znode.c Illumos 4950 - files sometimes can't be removed from a full filesystem 2016-01-21 16:59:30 -08:00
zil.c Align thread priority with Linux defaults 2015-07-28 13:36:47 -07:00
zio_checksum.c Illumos 4757, 4913 2014-08-01 14:28:05 -07:00
zio_compress.c Illumos 5661 - ZFS: "compression = on" should use lz4 if feature is enabled 2015-07-10 12:11:45 -07:00
zio_inject.c Illumos 5045 - use atomic_{inc,dec}_* instead of atomic_add_* 2016-01-15 15:38:36 -08:00
zio.c Illumos 5438 - zfs_blkptr_verify should continue after zfs_panic_recover 2016-01-12 13:54:05 -08:00
zle.c Update core ZFS code from build 121 to build 141. 2010-05-28 13:45:14 -07:00
zpl_ctldir.c Linux 3.18 compat: Snapshot auto-mounting 2015-08-31 13:54:39 -07:00
zpl_export.c zfsctl: No need to sync ctldir inodes 2015-08-31 13:54:39 -07:00
zpl_file.c Linux 4.1 compat: loop device on ZFS 2015-08-24 10:17:06 -07:00
zpl_inode.c Handling negative dentries in a CI file system. 2016-02-02 13:59:00 -08:00
zpl_super.c Disable zpl_nr_cached_objects() callback 2015-09-25 12:45:42 -07:00
zpl_xattr.c Linux 4.5 compat: xattr list handler 2016-01-20 11:36:56 -08:00
zrlock.c Illumos 5812 - assertion failed in zrl_tryenter(): zr_owner==NULL 2015-04-30 14:43:40 -07:00
zvol.c Make zvol minor functionality more robust 2016-02-24 11:54:24 -08:00