Commit Graph

1631 Commits

Author SHA1 Message Date
Brian Behlendorf
2cbb06b561 Restructure per-filesystem reclaim
Originally when the ARC prune callback was introduced the idea was
to register a single callback for the ZPL.  The ARC could invoke this
call back if it needed the ZPL to drop dentries, inodes, or other
cache objects which might be pinning buffers in the ARC.  The ZPL
would iterate over all ZFS super blocks and perform the reclaim.

For the most part this design has worked well but due to limitations
in 2.6.35 and earlier kernels there were some problems.  This patch
is designed to address those issues.

1) iterate_supers_type() is not provided by all kernels which makes
it impossible to safely iterate over all zpl_fs_type filesystems in
a single callback.  The most straight forward and portable way to
resolve this is to register a callback per-filesystem during mount.
The arc_*_prune_callback() functions have always supported multiple
callbacks so this is functionally a very small change.

2) Commit 050d22b removed the non-portable shrink_dcache_memory()
and shrink_icache_memory() functions and didn't replace them with
equivalent functionality.  This meant that for Linux 3.1 and older
kernels the ARC had no mechanism to drop dentries and inodes from
the caches if needed.  This patch adds that missing functionality
by calling shrink_dcache_parent() to release dentries which may be
pinning inodes.  This will result in all unused cache entries being
dropped which is a bit heavy handed but it's the only interface
available for old kernels.

3) A zpl_drop_inode() callback is registered for kernels older than
2.6.35 which do not support the .evict_inode callback.  This ensures
that when the last reference on an inode is dropped it is immediately
removed from the cache.  If this isn't done than inode can end up on
the global unused LRU with no mechanism available to ZFS to drop them.
Since the ARC buffers are not dropped the hottest inodes can still
be recreated without performing disk IO.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Issue #3160
2015-03-20 10:35:20 -07:00
Brian Behlendorf
596a8935a1 Fix arc_meta_max accounting
The arc_meta_max value should be increased when space it consumed not when
it is returned.  This ensure's that arc_meta_max is always up to date.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Issue #3160
2015-03-20 10:35:20 -07:00
Richard Yao
5c3f61eb49 Increase Linux pipe buffer size on 'zfs receive'
I noticed when reviewing documentation that it is possible for user
space to use fctnl(fd, F_SETPIPE_SZ, (unsigned long) size) to change
the kernel pipe buffer size on Linux to increase the pipe size up to
the value specified in /proc/sys/fs/pipe-max-size. There are users using
mbuffer to improve zfs recv performance when piping over the network, so
it seems advantageous to integrate such functionality directly into the
zfs recv tool. This avoids the addition of two buffers and two copies
(one for the buffer mbuffer adds and another for the additional pipe),
so it should be more efficient.

This could have been made configurable and/or this could have changed
the value back to the original after we were done with the file
descriptor, but I do not see a strong case for doing either, so I
went with a simple implementation.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1161
2015-03-20 10:03:34 -07:00
Turbo Fredriksson
b1a3e93217 Move duplicate information about the 'zfs send -e' option.
The extra one was under the 'zfs receive' command (which isn't relevant).
Instead, it should have been further up (still in the 'zfs send' option).

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3194
2015-03-19 10:39:43 -07:00
Chunwei Chen
40749aa7a6 Use MUTEX_FSTRANS on l2arc_buflist_mtx
Use MUTEX_FSTRANS on l2arc_buflist_mtx to prevent the following deadlock
scenario:
1. arc_release() -> hash_lock -> l2arc_buflist_mtx
2. l2arc_write_buffers() -> l2arc_buflist_mtx -> (direct reclaim) ->
   arc_buf_remove_ref() -> hash_lock

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #3160
2015-03-18 09:29:38 -07:00
Hajo Möller
a1d3450e94 Fix warning about AM_INIT_AUTOMAKE arguments
As of automake 1.14.2, currently shipped with Ubuntu 14.04, automake
warns about AM_INIT_AUTOMAKE having more than one argument:

configure.ac:41: warning: AM_INIT_AUTOMAKE: two- and three-arguments forms are deprecated.  For more info, see:
configure.ac:41: http://www.gnu.org/software/automake/manual/automake.html#Modernize-AM_005fINIT_005fAUTOMAKE-invocation

This commit fixes the warnings by following above link's advice, so
AM_INIT gets called with the package's name and version. As both are
defined in the META file we're parsing it with `grep`, `cut` and `tr`.

NOTE: autoconf < 1.14 not supporting m4_esyscmd_s so m4_esyscmd was
used and modified `tr` to truncate newlines, too.

Signed-off-by: Hajo M<C3><B6>ller <dasjoe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3174
2015-03-18 09:18:57 -07:00
Bill McGonigle
e023409500 Linux 4.0 compat: bdi_setup_and_register() __must_check
Explicitly disable the unused by variable warnings by setting
__attribute__((unused)) for bdi_setup_and_register().  This is
required because the function is defined with the __must_check
attribute.

Signed-off-by: Bill McGonigle <bill-github.com-public1@bfccomputing.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3141
2015-03-16 10:56:26 -07:00
Justin T. Gibbs
4c7b7eedcd Illumos 5630 - stale bonus buffer in recycled dnode_t leads to data corruption
5630 stale bonus buffer in recycled dnode_t leads to data corruption
Author: Justin T. Gibbs <justing@spectralogic.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5630
  https://github.com/illumos/illumos-gate/commit/cd485b4

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #3172
2015-03-12 15:40:39 -07:00
Josef 'Jeff' Sipek
73ad4a9f3c Illumos 5047 - don't use atomic_*_nv if you discard the return value
5047 don't use atomic_*_nv if you discard the return value
Author: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5047
  https://github.com/illumos/illumos-gate/commit/640c167

Porting Notes:

Several hunks from the original patch where not specific to ZFS
and thus were dropped.

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #3172
2015-03-12 15:40:33 -07:00
Brian Behlendorf
7f3e466283 Mark zfs_inactive() with PF_FSTRANS
Allowing direct reclaim to re-enter the VFS in the zfs_inactive()
call path has historically been problematic for ZoL.  Therefore,
in order to avoid an entire class of current and future issues
caused by this PF_FSTRANS is set for all zfs_inactive() callers.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3163
2015-03-10 09:21:48 -07:00
Hajo Möller
6184b3a6a0 Actually source /etc/sysconfig/zfs instead of /etc/default/zfs
Signed-off-by: Hajo M<C3><B6>ller <dasjoe@users.noreply.github.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3162
2015-03-09 17:13:04 -07:00
Ned Bass
417104bdd3 Use cached feature info in spa_add_feature_stats()
Avoid issuing I/O to the pool when retrieving feature flags information.
Trying to read the ZAPs from disk means that zpool clear would hang if
the pool is suspended and recovery would require a reboot. To keep the
feature stats resident in memory, we hang a cached nvlist off of the
spa.  It is built up from disk the first time spa_add_feature_stats() is
called, and refreshed thereafter using the cached feature reference
counts. spa_add_feature_stats() gets called at pool import time so we
can be sure the cached nvlist will be available if the pool is later
suspended.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3082
2015-03-05 14:11:10 -08:00
Chris Dunlap
0e86d309cc Add ZED to zfs.redhat.in script
This commit updates the zfs.redhat.in script to start/stop ZED.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3153
2015-03-05 14:07:04 -08:00
Brian Behlendorf
a7b9d0c3a0 Replace zfs.redhat.in with zfs.lsb.in init script
This commit replaces the zfs.redhat.in init script with a slightly
modified version of the existing zfs.lsb.in init script.  This was
done to minimize the functional differences between platforms.
The lsb version of the script was choosen because it's heavily
tested and provides the most functionality.

Changes made for RHEL systems:
* Configuration: /etc/default/zfs -> /etc/sysconfig/zfs
* LSB functions: /lib/lsb/init-functions -> /etc/rc.d/init.d/functions
* Logging: log_begin_msg/log_end_msg -> action

Features in LSB which are now in RHEL:
* USE_DISK_BY_ID=0      - Use the by-id names
* VERBOSE_MOUNT=0       - Verbose mounts by default
* DO_OVERLAY_MOUNTS=0   - Overlay mounts by default
* MOUNT_EXTRA_OPTIONS=0 - Generic extra options

Existing RHEL features which were removed:
* Automatically mounting FSs on ZVOLs listed in /etc/fstab

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3153
2015-03-04 11:33:07 -08:00
Turbo Fredriksson
02bd676df1 Install arc_summary.py
Add the arc_summary Makefile to the build system so the script is
properly included in the distribution tarball and installed.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3147
2015-03-03 13:38:17 -08:00
Brian Behlendorf
989fd514b1 Change ASSERT(!"...") to cmn_err(CE_PANIC, ...)
There are a handful of ASSERT(!"...")'s throughout the code base for
cases which should be impossible.  This patch converts them to use
cmn_err(CE_PANIC, ...) to ensure they are always enabled and so that
additional debugging is logged if they were to occur.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1445
2015-03-03 13:22:21 -08:00
Brian Behlendorf
8c45def24a Linux 4.0 compat: bdi_setup_and_register()
The 'capabilities' argument which was passed to bdi_setup_and_register()
has been removed.  File systems should no longer pass BDI_CAP_MAP_COPY.
For our purposes this means there are now three different interfaces
which must be handled.  A zpl_bdi_setup_and_register() wrapper function
has been introduced to provide a single interface to the ZPL code.

* 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported.
* 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
* 4.0 - x.y, bdi_setup_and_register() takes 2 arguments.

I've also taken this opportunity to remove HAVE_BDI because kernels
older then 2.6.32 are no longer supported.  All kernels newer than
this will have one of the above interfaces.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #3128
2015-03-03 10:49:45 -08:00
Brian Behlendorf
4ec15b8dcf Use MUTEX_FSTRANS mutex type
There are regions in the ZFS code where it is desirable to be able
to be set PF_FSTRANS while a specific mutex is held.  The ZFS code
could be updated to set/clear this flag in all the correct places,
but this is undesirable for a few reasons.

1) It would require changes to a significant amount of the ZFS
   code.  This would complicate applying patches from upstream.

2) It would be easy to accidentally miss a critical region in
   the initial patch or to have an future change introduce a
   new one.

Both of these concerns can be addressed by using a new mutex type
which is responsible for managing PF_FSTRANS, support for which was
added to the SPL in commit zfsonlinux/spl@9099312 - Merge branch
'kmem-rework'.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #3050
Closes #3055
Closes #3062
Closes #3132
Closes #3142
Closes #2983
2015-03-03 10:46:40 -08:00
Isaac Huang
d14cfd83da Fix deadlock between zpool export and zfs list
Pool reference count is NOT checked in spa_export_common()
if the pool has been imported readonly=on, i.e. spa->spa_sync_on
is FALSE. Then zpool export and zfs list may deadlock:

1. Pool A is imported readonly.
2. zpool export A and zfs list are run concurrently.
3. zfs command gets reference on the spa, which holds a dbuf on
   on the MOS meta dnode.
4. zpool command grabs spa_namespace_lock, and tries to evict dbufs
   of the MOS meta dnode. The dbuf held by zfs command can't be
   evicted as its reference count is not 0.
5. zpool command blocks in dnode_special_close() waiting for the
   MOS meta dnode reference count to drop to 0, with
   spa_namespace_lock held.
6. zfs command tries to get the spa_namespace_lock with a reference
   on the spa held, which holds a dbuf on the MOS meta dnode.
7. Now zpool command and zfs command deadlock each other.

Also any further zfs/zpool command will block on spa_namespace_lock
forever.

The fix is to always check pool reference count in spa_export_common(),
no matter whether the pool was imported readonly or not.

Signed-off-by: Isaac Huang <he.huang@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2034
2015-03-02 11:50:06 -08:00
Brian Behlendorf
87a63dd702 Prevent "zpool destroy|export" when suspended
Cleanly destroying or exporting a pool requires that the pool
not be suspended.  Therefore, set the POOL_CHECK_SUSPENDED flag
for these ioctls so the utilities will output a descriptive
error message rather than block.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2878
2015-03-02 11:50:06 -08:00
Tim Chase
fdc5d98253 Avoid dladdr() in ztest
Under Linux, at least, dladdr() doesn't reliably work for functions which
aren't in a DSO.  Add the function name to ztest_info[].

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3130
2015-02-27 13:54:15 -08:00
Christer Ekholm
8bdcfb5396 Fix possible future overflow in zfs_nicenum
The function zfs_nicenum that converts number to human-readable output
uses a index to a string of letters. This patch limits the index to
the length of the string.

Signed-off-by: Christer Ekholm <che@chrekh.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3122
2015-02-24 11:39:10 -08:00
Brian Behlendorf
b4f3666a16 Retire spl_module_init()/spl_module_fini()
In the original implementation of the SPL wrappers were provided
for module initialization and cleanup.  This was done to abstract
away any compatibility code which might be needed for the SPL.

As it turned out the only significant compatibility issue was that
the default pwd during module load differed under Illumos and Linux.
Since this is such as minor thing and the wrappers complicate the
code they are being retired.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2985
2015-02-24 11:37:44 -08:00
Brian Behlendorf
1efdc45ea8 Fix O_APPEND open(2) flag
As described in flags section of open(2):

  O_APPEND:
    The  file  is  opened in append mode.  Before each write(2), the
    file offset is positioned at the end of the  file,  as  if  with
    lseek(2).   O_APPEND may lead to corrupted files on NFS filesys-
    tems if more than one process appends data to a  file  at  once.
    This is because NFS does not support appending to a file, so the
    client kernel has to simulate it, which can't be done without  a
    race condition.

This issue was originally overlooked because normally the generic
VFS code handles this for a filesystem.  However, because ZFS explictly
registers a zpl_write() function it's responsible for the seek.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3124
2015-02-24 11:21:54 -08:00
Steffen Müthing
1d966336f5 Fix error in dracut script if not using ZFS root
If we are not booting from ZFS, parse-zfs.sh fails because of
an unset variable.

Signed-off-by: Steffen M<C3><BC>thing <steffen.muething@iwr.uni-heidelberg.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3110
2015-02-17 16:15:16 -08:00
Steffen Müthing
1543b20a87 Add required files to initramfs
The dracut module installs the udev rules and the vdev_id utility for creating
the /dev/disk/by-vdev/ names, but omits some additional utilities and the
config file required by vdev_id.

Signed-off-by: Steffen M<C3><BC>thing <steffen.muething@iwr.uni-heidelberg.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3110
2015-02-17 16:12:50 -08:00
Dan Swartzendruber
1611bb7b4f Set zfs_autoimport_disable default value to 1
When loading the ZFS kernel modules they should not populate the
spa namespace using the cache file.  This behavior isn't consistent
with other Linux kernel modules and we need to move away from it.
Removing this makes the whole startup process predictable with four
basic steps which are driven by the init system.

1) modprobe
2) zpool import
3) zfs mount
4) zfs share

This change also helps lay the groundwork for eventually removing
the kobj_* compatibility code on the kernel side.  It may need to
be preserved in userspace because libzfs_init() depends on it.
This is why the conditional must be wrapped with an #ifdef _KERNEL.

Signed-off-by: Dan Swartzendruber <dswartz@druber.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2820
2015-02-17 16:09:41 -08:00
Brian Behlendorf
7d2868d5fc Skip bad DVAs during free by setting zfs_recover=1
When a bad DVA is encountered in metaslab_free_dva() the system
should treat it as fatal.  This indicates that somehow a damaged
DVA was written to disk and that should be impossible.

However, we have seen a handful of reports over the years of pools
somehow being damaged in this way.  Since this damage can render
otherwise intact pools unimportable, and the consequence of skipping
the bad DVA is only leaked free space, it makes sense to provide
a mechanism to ignore the bad DVA.  Setting the zfs_recover=1 module
option will cause the DVA to be ignored which may allow the pool to
be imported.

Since zfs_recover=0 by default any pool attempting to free a bad DVA
will treat it as a fatal error preserving the current behavior.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3099
Issue #3090
Issue #2720
2015-02-13 16:02:04 -08:00
Sören Tempel
cbedd7b034 Write directly to $initdir
Simplify install() by removing the need for a temp file.

Signed-off-by: Soeren Tempel <soeren+git@soeren-tempel.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3093
2015-02-13 15:58:53 -08:00
Sören Tempel
cfbaa3c830 Use test(1) in a proper way
Use the correct operators to check the expected data type.

Signed-off-by: Soeren Tempel <soeren+git@soeren-tempel.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3093
2015-02-13 15:58:18 -08:00
Tim Chase
e02b533e74 Enhancements to zpool dry run mode.
In dry run mode, zpool should display more of the proposed pool
configuration for "zpool add".  This commit adds support for displaying
cache devices.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1106
2015-02-11 14:05:17 -08:00
Brian Behlendorf
340dfbe193 Change VERIFY to ASSERT in mutex_destroy()
There have been multiple reports of 'zdb' tripping the VERIFY in
mutex_destroy() because pthread_mutex_destroy() returns EBUSY.

Exactly how this can happen still needs to be explained, but this
doesn't strictly need to be fatal for non-debug builds.  Therefore,
this patch converts the VERIFY to an ASSERT until the root cause
is determined and resolved.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2027
2015-02-11 13:58:40 -08:00
Andrey Vesnovaty
5f15fa2216 Fix readdir for .zfs/snapshot directory
dmu_snapshot_list_next stores the index of the next snapshot entry to the offp
argument, which zpl_snapdir_iterate then uses for the dir_emit. This
result in an off-by-one error. Therefore a temporary variable should be
used.

This was a regression introduced in commit zfsonlinux/zfs@0f37d0c.

Signed-off-by: Andrey Vesnovaty <andrey.vesnovaty@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2930
2015-02-10 16:34:30 -08:00
Brian Behlendorf
3941503c0a Retire zio_cons()/zio_dest()
The zio_cons() constructor and zio_dest() destructor don't exist
in the upstream Illumos code.  They were introduced as a workaround
to avoid issue #2523.  Since this issue has now been resolved this
code is being reverted to bring ZoL back in sync with Illumos.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue #3063
2015-02-10 16:09:49 -08:00
Brian Behlendorf
6442f3cfe3 Retire zio_bulk_flags
Long ago the zio_bulk_flags module parameter was introduced to
facilitate debugging and profiling the zio_buf_caches.  Today
this code works well and there's no compelling reason to keep
this functionality.  In fact it's preferable to revert this so
the code is more consistent with other ZFS implementations.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue #3063
2015-02-10 16:08:49 -08:00
Jörg Thalheim
534759fad3 Linux 3.19 compat: file_inode was added
struct access f->f_dentry->d_inode was replaced by accessor function
file_inode(f)

Signed-off-by: Joerg Thalheim <joerg@higgsboson.tk>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3084
2015-02-10 11:24:51 -08:00
Brian Behlendorf
77aef6f60e Use vmem_alloc() for nvlists
Several of the nvlist functions may perform allocations larger than
the 32k warning threshold.  Convert them to use vmem_alloc() so the
best allocator is used.

Commit efcd79a retired KM_NODEBUG which was used to suppress large
allocation warnings.  Concurrently the large allocation warning threshold
was increased from 8k to 32k.  The goal was to identify the remaining
locations, such as this one, where the allocation can be larger than
32k.  This patch is expected fine tuning resulting for the kmem-rework
changes, see commit 6e9710f.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3057
Closes #3079
Closes #3081
2015-02-10 11:00:08 -08:00
Brian Behlendorf
afe373260e Revert "Don't read space maps during import for readonly pools"
This reverts commit 7fc8c33ede which
accidentally introduced a ztest failure.

ztest: '/usr/sbin/zdb -bcc -d -U /var/tmp/zpool.cache ztest' exit code 2
child exited with code 3

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-02-09 16:56:59 -08:00
Tim Chase
e890dd85a7 Produce a full snapshot list for zfs send -p
In order to accelerate zfs receive operations in the face of many
property-containing snapshots, commit 0574855 changed the header nvlist
("fss") of a send stream to exclude snapshots which aren't part of the
stream.  This, however, would cause zfs receive -F to erroneously remove
snapshots; it would remove any snapshot which wasn't listed in the header
nvlist.

This patch restores the full list of snapshots in fss[<id>[snaps]] but
still suppresses the properties of non-sent snapshots and also removes a
consistency check in which an error is raised if a listed snapshot does
not have any properties in fss[<id>[snapprops]].

The 0574855 commit also introduced a bug in which zfs send -p of a
complete stream (zfs send -p pool/fs@snap) would exclude the snapshot
properties in fss[<id>[snapprops]].  This patch detects the last snapshot
in a series when no "from" snapshot has been specified and includes its
properties.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2907
2015-02-09 16:43:17 -08:00
Brian Behlendorf
7fc8c33ede Don't read space maps during import for readonly pools
Normally when importing a pool the space maps for all top level
vdevs are read from disk.  The space maps will be required latter
when an allocation is performed and free blocks need to be located.

However, if the pool is imported readonly then we are guaranteed
that no allocations can occur.  In this case the space maps need
not be loaded..   A similar argument can be made for the DTLs
(dirty time logs).

Because a pool import will fail if the space maps cannot be read.
The ability to safely ignore them makes it more likely that a
damaged pool can be imported readonly to recover its contents.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2831
2015-02-09 16:43:03 -08:00
Lukas Wunner
bf5efb5c66 Fix Dracut scripts to allow for blanks in pool and dataset names
The ability to use blanks is documented in zpool(8) and implemented
in module/zcommon/zfs_namecheck.c:valid_char().

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3083
2015-02-09 10:08:43 -08:00
Lukas Wunner
293d141ae4 Fix loop in Dracut shutdown script
The shell executes each command of a pipeline in a subshell,
thus $ret always had the same value after the while loop that
it had before the loop (http://mywiki.wooledge.org/BashFAQ/024),
signaling success even if some of the zpools could not be exported.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3083
2015-02-09 10:08:05 -08:00
Justin T. Gibbs
33b4de513e Illumos 5311 - traverse_dnode may report success when it should not
5311 traverse_dnode may report success when it should not
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Will Andrews <willa@spectralogic.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://github.com/illumos/illumos-gate/commit/2a89c2c
  https://www.illumos.org/issues/5311

Ported by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2970
2015-02-06 12:07:15 -08:00
Ned Bass
a62d1b02e3 Fix SA header size accounting
The functions sa_find_sizes() and sa_build_layout() fail to account
for the additional 2 bytes of SA header space when calculating whether
a variable size attribute might spill over. They may consequently
determine that an attribute will fit in the bonus buffer along with a
spill block pointer, when in reality the attribute would be partially
overwritten by the spill block pointer if spill over occurs. This also
causes an inconsistency between the SA header size and the number of
variable size attributes in the layout, tripping an assertion when
debugging is on. The following reproducer demonstrates the problem.

  ln -s $(perl -e 'print "z" x 20') file
  setfattr -h -n trusted.foo -v $(perl -e 'print "z" x 200') file

Even though sa_find_sizes() computes the index of the attribute where
spill-over will occur, sa_build_layouts() discards the result and
recomputes it itself. As it turns out, both functions get it wrong.
Since this computation is awkward and, as history has shown, easy to
screw up, let's just do it in one place. This patch fixes the bug in
sa_find_sizes() and updates sa_build_layout() to use the result
computed there.

Also improve the comments in sa_find_sizes().

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #3070
2015-02-06 09:26:46 -08:00
Brian Behlendorf
e2c4acde55 Skip evicting dbufs when walking the dbuf hash
When a dbuf is in the DB_EVICTING state it may no longer be on the
dn_dbufs list.  In which case it's unsafe to call DB_DNODE_ENTER.
Therefore, any dbuf which is found in this safe must be skipped.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2553
Closes #2495
2015-02-06 09:24:28 -08:00
Chunwei Chen
aa506dcb3d Fix build error when make deb
After 53698a4, the following error occurs when make deb.

  CCLD     zed
../../lib/libzfs/.libs/libzfs.so: undefined reference to `get_system_hostid'

Add libzpool.la to zed/Makefile.am to fix this

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3080
2015-02-06 09:16:32 -08:00
Chunwei Chen
53698a453d Read spl_hostid module parameter before gethostid()
If spl_hostid is set via module parameter, it's likely different from
gethostid(). Therefore, the userspace tool should read it first before
falling back to gethostid().

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3034
2015-02-04 16:44:53 -08:00
Tim Chase
aa2ef419e4 Spurious ENOMEM returns when reading dbufs kstat
Commit 7b2d78a046 fixed some improper uses
of snprintf(), however, in __dbuf_stats_hash_table_data() the return
value of snprintf is propagated to the caller.  This caused spurious
ENOMEM errors when reading the dbufs kstat.

This commit causes the actual number of characters written to be returned.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3072
2015-02-04 16:35:26 -08:00
avg
037763e44e fix l2arc compression buffers leak
Commit log from FreeBSD:

    We have observed that arc_release() can be called concurrently with a
    l2arc in-flight write.  Also, we have observed that arc_hdr_destroy()
    can be called from arc_write_done() for a zio with ZIO_FLAG_IO_REWRITE
    flag in similar circumstances.

    Previously the l2arc headers would be freed while leaking their
    associated compression buffers.  Now the buffers are placed on
    l2arc_free_on_write list for delayed freeing.  This is similar to
    what was already done to arc buffers that were supposed to be freed
    concurrently with in-flight writes of those buffers.

    In addition to fixing the discovered leaks this change also adds
    some protective code to assert that a compression buffer associated
    with a l2arc header is never leaked.

    A new kstat l2_cdata_free_on_write is added.  It keeps a count
    of delayed compression buffer frees which previously would have
    been leaks.

    Tested by:  Vitalij Satanivskij <satan@ukr.net> et al
    Requested by:       many
    MFC after:  2 weeks
    Sponsored by:       HybridCluster / ClusterHQ

References:
    https://illumos.org/issues/5222
    https://github.com/freebsd/freebsd/commit/b98f85d
    http://thread.gmane.org/gmane.os.freebsd.current/155757/focus=155781
    http://lists.open-zfs.org/pipermail/developer/2014-January/000455.html
    http://lists.open-zfs.org/pipermail/developer/2014-February/000523.html

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3029
2015-02-03 16:54:16 -08:00
Brian Behlendorf
19ea3d25df Use zio buffers in zil_itx_create()
The zil_itx_create() function uses the vmem_alloc() allocator for
its buffers because when logging a write that buffer may be as large
as 64K.  This is non-optimal because we may need to allocate many of
of these buffers and this interface has the potential to be slow.
Instead, use zio_data_buf_alloc() which is specifically designed to
be able to efficiently allocate a wide range of buffer sizes.

In addition, do some cleanup and use the zil_itx_destroy() function
to always free an itx structure.  This way we're always sure the
right allocation functions are used.  Notice that in the current
code kmem_free() and vmem_free() were both used.  This happened to
work because these wrappers map to the same internal SPL function.

This was identified as a potential problem when a low-end memory
constrained system began logging the following warnings.  There
was no deadlock here just repeated allocation failures resulting
in increased latency.

Possible memory allocation deadlock: size=65792 lflags=0x42d0
Pid: 20118, comm: kvm Tainted: P           O 3.2.0-0.bpo.4-amd64
Call Trace:
 [<ffffffffa040b834>] ? spl_kmem_alloc_impl+0x115/0x127 [spl]
 [<ffffffffa040b84f>] ? spl_kmem_alloc_debug+0x9/0x36 [spl]
 [<ffffffffa05d8a0b>] ? zil_itx_create+0x2d/0x59 [zfs]
 [<ffffffffa05c71e6>] ? zfs_log_write+0x13a/0x2f0 [zfs]
 [<ffffffffa05d41bc>] ? zfs_write+0x85b/0x9bb [zfs]
 [<ffffffffa05e37ec>] ? zpl_aio_write+0xca/0x110 [zfs]
 [<ffffffff811088e5>] ? do_sync_readv_writev+0xa3/0xde
 [<ffffffff81108f41>] ? do_readv_writev+0xaf/0x125
 [<ffffffff81109055>] ? sys_pwritev+0x55/0x9a
 [<ffffffff813721d2>] ? system_call_fastpath+0x16/0x1b

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #3059
2015-02-02 11:20:41 -08:00