freebsd-dev/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
Josh Paetzel 9a625bd31c MFV 316870
7448 ZFS doesn't notice when disk vdevs have no write cache

illumos/illumos-gate@295438ba32

https://www.illumos.org/issues/7448
       I built a SmartOS image with all the NVMe commits including 7372
       (support NVMe volatile write cache) and repeated my dd testing:
       > #!/bin/bash
       > for i in `seq 1 1000`; do
       > dd if=/dev/zero of=file00 bs=1M count=102400 oflag=sync &
       > dd if=/dev/zero of=file01 bs=1M count=102400 oflag=sync &
       > wait
       > rm file00 file01
       > done
       >
       Previously each dd command took ~145 seconds to finish, now it takes
       ~400 seconds.
       Eventually I figured out it is 7372 that causes unnecessary
       nvme_bd_sync() executions which wasted CPU cycles.
  If an NVMe device doesn't support a write cache, the nvme_bd_sync function will
  return ENOTSUP to indicate this to upper layers.
  It seems this returned value is ignored by ZFS, and as such this bug is not
  really specific to NVMe. In vdev_disk_io_start() ZFS sends the flush to the
  disk driver (blkdev) with a callback to vdev_disk_ioctl_done(). As nvme filled
  in the bd_sync_cache function pointer, blkdev will not return ENOTSUP, as the
  nvme driver in general does support cache flush. Instead it will issue an
  asynchronous flush to nvme and immediately return 0, and hence ZFS will not set
  vdev_nowritecache here. The nvme driver will at some point process the cache
  flush command, and if there is no write cache on the device it will return
  ENOTSUP, which will be delivered to the vdev_disk_ioctl_done() callback. This
  function does not check the error code and therefore never sets
  vdev_nowritecache.
  The right place to check the error code from the cache flush is in
  zio_vdev_io_assess(). This would catch both cases, synchronous and asynchronous
  cache flushes. This would also be independent of the implementation detail that
  some drivers can return ENOTSUP immediately.
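
The idea of checking the flush result at a single assessment point can be sketched with a toy model. This is only an illustration under stated assumptions, not the ZFS code: every name below (toy_vdev, toy_zio, toy_driver_flush_start, toy_driver_flush_done, toy_io_assess, toy_flush) is hypothetical, and the asynchronous completion is simulated by a plain function call. The driver accepts the flush and returns 0 right away, the ENOTSUP only appears at completion, and the assess step is what records that the device has no write cache.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for a vdev and a zio. */
struct toy_vdev {
	bool nowritecache;    /* set once a flush is known to be unsupported */
	bool has_write_cache; /* what the simulated device really has */
	int flushes_issued;   /* how many flushes reached the "driver" */
};

enum toy_io_type { TOY_IO_WRITE, TOY_IO_FLUSH };

struct toy_zio {
	enum toy_io_type type;
	int error;
	struct toy_vdev *vd;
};

/* The driver queues the flush and reports success immediately. */
static int
toy_driver_flush_start(struct toy_zio *zio)
{
	zio->vd->flushes_issued++;
	return (0);
}

/* Completion: only here does the real result become visible. */
static void
toy_driver_flush_done(struct toy_zio *zio)
{
	zio->error = zio->vd->has_write_cache ? 0 : ENOTSUP;
}

/*
 * Single assessment point: it sees the final error whether it came from
 * the start path or from the completion path.
 */
static void
toy_io_assess(struct toy_zio *zio)
{
	if (zio->type == TOY_IO_FLUSH && zio->vd != NULL &&
	    zio->error == ENOTSUP)
		zio->vd->nowritecache = true;
}

/* Issue one flush; skip it entirely once the device is known cacheless. */
int
toy_flush(struct toy_vdev *vd)
{
	struct toy_zio zio = { .type = TOY_IO_FLUSH, .error = 0, .vd = vd };
	int rc;

	assert(vd != NULL);
	if (vd->nowritecache)
		return (0);

	rc = toy_driver_flush_start(&zio);
	if (rc != 0)
		zio.error = rc;
	else
		toy_driver_flush_done(&zio);

	toy_io_assess(&zio);
	return (zio.error);
}
```

In this model the first toy_flush() on a device without a write cache returns ENOTSUP and sets nowritecache; later calls return 0 without reaching the driver, which is the behaviour the dd test above was missing.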

Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Obtained from:	Illumos
2017-04-21 00:17:54 +00:00
..
sys MFV 316868 2017-04-21 00:12:47 +00:00
arc.c Reduce ARC fragmentation threshold 2017-03-17 12:34:57 +00:00
blkptr.c
bplist.c
bpobj.c MFV r296518: 5027 zfs large block support (add copyright) 2016-03-08 17:51:09 +00:00
bptree.c MFV r302661: 7082 bptree_iterate() passes wrong args to zfs_dbgmsg() 2016-09-01 15:10:40 +00:00
bqueue.c
dbuf.c MFV 314243 2017-02-25 14:45:54 +00:00
ddt_zap.c
ddt.c MFV r289310: 2015-10-16 14:45:21 +00:00
dmu_diff.c MFV r302991: 6950 ARC should cache compressed data 2016-09-03 08:30:51 +00:00
dmu_object.c MFV 316868 2017-04-21 00:12:47 +00:00
dmu_objset.c MFV 316868 2017-04-21 00:12:47 +00:00
dmu_send.c MFV r306422: 7254 ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedobjs 2016-09-28 23:54:47 +00:00
dmu_traverse.c Add #ifdef _KERNEL around send_holes_without_birth_time sysctl. 2016-09-29 17:48:53 +00:00
dmu_tx.c MFC r305337: 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object 2016-09-03 11:00:29 +00:00
dmu_zfetch.c Add missed vfs.zfs.zfetch.max_idistance sysctl. 2016-12-10 21:19:27 +00:00
dmu.c MFV: 315989 2017-03-27 17:27:46 +00:00
dnode_sync.c MFV 316868 2017-04-21 00:12:47 +00:00
dnode.c MFV 314243 2017-02-25 14:45:54 +00:00
dsl_bookmark.c MFV r302660: 6314 buffer overflow in dsl_dataset_name 2016-09-01 15:08:27 +00:00
dsl_dataset.c MFV r314910: 7843 get_clones_stat() is suboptimal for lots of clones 2017-03-08 13:48:26 +00:00
dsl_deadlist.c MFV r296518: 5027 zfs large block support (add copyright) 2016-03-08 17:51:09 +00:00
dsl_deleg.c MFV r302660: 6314 buffer overflow in dsl_dataset_name 2016-09-01 15:08:27 +00:00
dsl_destroy.c MFV r306422: 7254 ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedobjs 2016-09-28 23:54:47 +00:00
dsl_dir.c MFV 314243 2017-02-25 14:45:54 +00:00
dsl_pool.c MFV 312436 2017-01-20 15:01:04 +00:00
dsl_prop.c MFV r302660: 6314 buffer overflow in dsl_dataset_name 2016-09-01 15:08:27 +00:00
dsl_scan.c MFV r306422: 7254 ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedobjs 2016-09-28 23:54:47 +00:00
dsl_synctask.c
dsl_userhold.c MFV r302660: 6314 buffer overflow in dsl_dataset_name 2016-09-01 15:08:27 +00:00
edonr_zfs.c MFV r289310: 2015-10-16 14:45:21 +00:00
gzip.c
lz4.c MFV r268120: 2016-09-11 17:48:06 +00:00
lzjb.c
metaslab.c MFV r315290, r315291: 7303 dynamic metaslab selection 2017-03-24 09:37:00 +00:00
multilist.c
range_tree.c
refcount.c MFV r304155: 7090 zfs should improve allocation order and throttle allocations 2016-09-03 10:04:37 +00:00
rrwlock.c
sa.c MFV 314243 2017-02-25 14:45:54 +00:00
sha256.c Connect the SHA-512t256 and Skein hashing algorithms to ZFS 2016-05-31 04:12:14 +00:00
skein_zfs.c Connect the SHA-512t256 and Skein hashing algorithms to ZFS 2016-05-31 04:12:14 +00:00
spa_config.c MFV r299440: 6736 ZFS per-vdev ZAPs 2016-05-11 12:54:00 +00:00
spa_errlog.c
spa_history.c MFV r302660: 6314 buffer overflow in dsl_dataset_name 2016-09-01 15:08:27 +00:00
spa_misc.c rename vfs.zfs.debug_flags to vfs.zfs.debugflags 2017-04-14 15:35:07 +00:00
spa.c MFV r315290, r315291: 7303 dynamic metaslab selection 2017-03-24 09:37:00 +00:00
space_map.c MFV r315290, r315291: 7303 dynamic metaslab selection 2017-03-24 09:37:00 +00:00
space_reftree.c MFV r289561: 6328 Fix cstyle errors in zfs codebase 2015-10-19 08:25:37 +00:00
THIRDPARTYLICENSE.lz4
THIRDPARTYLICENSE.lz4.descrip
trim_map.c
txg.c
uberblock.c
unique.c
vdev_cache.c MFV r304155: 7090 zfs should improve allocation order and throttle allocations 2016-09-03 10:04:37 +00:00
vdev_disk.c MFV 316870 2017-04-21 00:17:54 +00:00
vdev_file.c MFV r296505: 6531 Provide mechanism to artificially limit disk performance 2016-03-08 17:27:13 +00:00
vdev_geom.c Fix vdev_geom_attach_by_guids for partitioned disks 2017-04-13 14:51:34 +00:00
vdev_label.c zfsbootcfg: a simple tool to set next boot (one time) options for zfsboot 2016-10-29 14:09:32 +00:00
vdev_mirror.c MFV r304155: 7090 zfs should improve allocation order and throttle allocations 2016-09-03 10:04:37 +00:00
vdev_missing.c
vdev_queue.c zfs: add zio_buf_alloc_nowait and use it in vdev_queue_aggregate 2017-03-23 08:59:17 +00:00
vdev_raidz.c MFV r296518: 5027 zfs large block support (add copyright) 2016-03-08 17:51:09 +00:00
vdev_root.c
vdev.c Fix expandsz 16.0E vals and vdev_min_asize of RAIDZ children 2017-04-03 13:11:28 +00:00
zap_leaf.c MFV r289561: 6328 Fix cstyle errors in zfs codebase 2015-10-19 08:25:37 +00:00
zap_micro.c MFV 314243 2017-02-25 14:45:54 +00:00
zap.c MFV 314243 2017-02-25 14:45:54 +00:00
zfeature.c MFV r289561: 6328 Fix cstyle errors in zfs codebase 2015-10-19 08:25:37 +00:00
zfs_acl.c zfs: honour and make use of vfs vnode locking protocol 2016-08-05 06:23:06 +00:00
zfs_byteswap.c
zfs_ctldir.c zfs: provide a special vptocnp method for the .zfs vnode 2017-03-11 16:00:49 +00:00
zfs_debug.c
zfs_dir.c fix .zfs-related cases in zfs_lookup that were broken by r303763 2016-08-06 11:02:07 +00:00
zfs_fm.c
zfs_fuid.c
zfs_ioctl.c MFV r308987: 7180 potential race between zfs_suspend_fs+zfs_resume_fs 2016-11-24 10:21:22 +00:00
zfs_log.c After some ZIL changes 6 years ago zil_slog_limit got partially broken 2016-11-17 21:01:27 +00:00
zfs_onexit.c
zfs_replay.c MFV r289561: 6328 Fix cstyle errors in zfs codebase 2015-10-19 08:25:37 +00:00
zfs_rlock.c
zfs_sa.c zfs: honour and make use of vfs vnode locking protocol 2016-08-05 06:23:06 +00:00
zfs_vfsops.c reimplement zfsctl (.zfs) support 2017-02-21 17:47:08 +00:00
zfs_vnops.c - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter 2017-04-17 17:34:47 +00:00
zfs_znode.c fix unsafe modification of zfs_vnodeops when DIAGNOSTIC is enabled 2016-11-20 14:00:50 +00:00
zfs.conf
zil.c Execute last ZIO of log commit synchronously. 2017-03-02 07:55:47 +00:00
zio_checksum.c MFV r302991: 6950 ARC should cache compressed data 2016-09-03 08:30:51 +00:00
zio_compress.c
zio_inject.c MFV r296505: 6531 Provide mechanism to artificially limit disk performance 2016-03-08 17:27:13 +00:00
zio.c MFV 316870 2017-04-21 00:17:54 +00:00
zle.c
zrlock.c 3746 ZRLs are racy 2016-10-27 07:38:07 +00:00
zvol.c Do not invoke the resize event when previous provider's size was zero. 2017-03-01 18:03:32 +00:00