7448 ZFS doesn't notice when disk vdevs have no write cache
illumos/illumos-gate@295438ba32
295438ba32
https://www.illumos.org/issues/7448
I built a SmartOS image with all the NVMe commits including 7372
(support NVMe volatile write cache) and repeated my dd testing:
> #!/bin/bash
> for i in `seq 1 1000`; do
> dd if=/dev/zero of=file00 bs=1M count=102400 oflag=sync &
> dd if=/dev/zero of=file01 bs=1M count=102400 oflag=sync &
> wait
> rm file00 file01
> done
>
Previously each dd command took ~145 seconds to finish, now it takes
~400 seconds.
Eventually I figured out it is 7372 that causes unnecessary
nvme_bd_sync() executions which wasted CPU cycles.
If a NVMe device doesn't support a write cache, the nvme_bd_sync function will
return ENOTSUP to indicate this to upper layers.
It seems this returned value is ignored by ZFS, and as such this bug is not
really specific to NVMe. In vdev_disk_io_start() ZFS sends the flush to the
disk driver (blkdev) with a callback to vdev_disk_ioctl_done(). As nvme filled
in the bd_sync_cache function pointer, blkdev will not return ENOTSUP, as the
nvme driver in general does support cache flush. Instead it will issue an
asynchronous flush to nvme and immediately return 0, and hence ZFS will not set
vdev_nowritecache here. The nvme driver will at some point process the cache
flush command, and if there is no write cache on the device it will return
ENOTSUP, which will be delivered to the vdev_disk_ioctl_done() callback. This
function will not check the error code and not set nowritecache.
The right place to check the error code from the cache flush is in
zio_vdev_io_assess(). This would catch both cases, synchronous and asynchronous
cache flushes. This would also be independent of the implementation detail that
some drivers can return ENOTSUP immediately.
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Obtained from: Illumos
FreeBSD Source:
This is the top level of the FreeBSD source directory. This file
was last revised on:
FreeBSD
For copyright information, please see the file COPYRIGHT in this directory (additional copyright information also exists for some sources in this tree - please see the specific source directories for more information).
The Makefile in this directory supports a number of targets for building components (or all) of the FreeBSD source tree. See build(7) and http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html for more information, including setting make(1) variables.
The buildkernel
and installkernel
targets build and install
the kernel and the modules (see below). Please see the top of
the Makefile in this directory for more information on the
standard build targets and compile-time flags.
Building a kernel is a somewhat more involved process. See build(7), config(8), and http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig.html for more information.
Note: If you want to build and install the kernel with the
buildkernel
and installkernel
targets, you might need to build
world before. More information is available in the handbook.
The kernel configuration files reside in the sys/<arch>/conf
sub-directory. GENERIC is the default configuration used in release builds.
NOTES contains entries and documentation for all possible
devices, not just those commonly used.
Source Roadmap:
bin System/user commands.
cddl Various commands and libraries under the Common Development
and Distribution License.
contrib Packages contributed by 3rd parties.
crypto Cryptography stuff (see crypto/README).
etc Template files for /etc.
gnu Various commands and libraries under the GNU Public License.
Please see gnu/COPYING* for more information.
include System include files.
kerberos5 Kerberos5 (Heimdal) package.
lib System libraries.
libexec System daemons.
release Release building Makefile & associated tools.
rescue Build system for statically linked /rescue utilities.
sbin System commands.
secure Cryptographic libraries and commands.
share Shared resources.
sys Kernel sources.
tests Regression tests which can be run by Kyua. See tests/README
for additional information.
tools Utilities for regression testing and miscellaneous tasks.
usr.bin User commands.
usr.sbin System administration commands.
For information on synchronizing your source tree with one or more of the FreeBSD Project's development branches, please see:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/synching.html