Commit Graph

1403 Commits

Author SHA1 Message Date
George Wilson
93cf20764a Illumos #4101, #4102, #4103, #4105, #4106
4101 metaslab_debug should allow for fine-grained control
4102 space_maps should store more information about themselves
4103 space map object blocksize should be increased
4105 removing a mirrored log device results in a leaked object
4106 asynchronously load metaslab
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Sebastien Roy <seb@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>

Prior to this patch, space_maps were preferred solely based on the
amount of free space left in each. Unfortunately, this heuristic didn't
contain any information about the make-up of that free space, which
meant we could keep preferring and loading a highly fragmented space map
that wouldn't actually have enough contiguous space to satisfy the
allocation; then unloading that space_map and repeating the process.

This change modifies the space_map's to store additional information
about the contiguous space in the space_map, so that we can use this
information to make a better decision about which space_map to load.
This requires reallocating all space_map objects to increase their
bonus buffer size sizes enough to fit the new metadata.

The above feature can be enabled via a new feature flag introduced by
this change: com.delphix:spacemap_histogram

In addition to the above, this patch allows the space_map block size to
be increase. Currently the block size is set to be 4K in size, which has
certain implications including the following:

    * 4K sector devices will not see any compression benefit
    * large space_maps require more metadata on-disk
    * large space_maps require more time to load (typically random reads)

Now the space_map block size can adjust as needed up to the maximum size
set via the space_map_max_blksz variable.

A bug was fixed which resulted in potentially leaking an object when
removing a mirrored log device. The previous logic for vdev_remove() did
not deal with removing top-level vdevs that are interior vdevs (i.e.
mirror) correctly. The problem would occur when removing a mirrored log
device, and result in the DTL space map object being leaked; because
top-level vdevs don't have DTL space map objects associated with them.

References:
  https://www.illumos.org/issues/4101
  https://www.illumos.org/issues/4102
  https://www.illumos.org/issues/4103
  https://www.illumos.org/issues/4105
  https://www.illumos.org/issues/4106
  https://github.com/illumos/illumos-gate/commit/0713e23

Porting notes:

A handful of kmem_alloc() calls were converted to kmem_zalloc(). Also,
the KM_PUSHPAGE and TQ_PUSHPAGE flags were used as necessary.

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2488
2014-07-22 09:39:16 -07:00
Prakash Surya
1be627f5c2 Move metaslab_group_alloc_update() call
This changes moves the called to metaslab_group_alloc_update() to the
metaslab_sync_reassess() function. The original placement of the call
within metaslab_sync_done() appears to have been a simple mistake,
introduced by ac72fac3ea.

This aligns us more closely to the upstream illumos code base.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-07-22 09:38:16 -07:00
Brian Behlendorf
1e8db77102 Fix zil_commit() NULL dereference
Update the current code to ensure inodes are never dirtied if they are
part of a read-only file system or snapshot.  If they do somehow get
dirtied an attempt will make made to write them to disk.  In the case
of snapshots, which don't have a ZIL, this will result in a NULL
dereference in zil_commit().

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2405
2014-07-17 15:15:07 -07:00
Richard Yao
a5778ea242 zdb: Introduce -V for verbatim import
When given a pool name via -e, zdb would attempt an import. If it
failed, then it would attempt a verbatim import. This behavior is
not always desirable so a -V switch is added to zdb to control the
behavior. When specified, a verbatim import is done. Otherwise,
the behavior is as it was previously, except no verbatim import
is done on failure.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2372
2014-07-17 11:40:32 -07:00
George Wilson
2fbc542ebd Illumos 4168, 4169, 4170: ztest, zdb and zhack fixes
4168 ztest assertion failure in dbuf_undirty
4169 verbatim import causes zdb to segfault
4170 zhack leaves pool in ACTIVE state
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
    https://www.illumos.org/issues/4168
    https://www.illumos.org/issues/4169
    https://www.illumos.org/issues/4170
    https://github.com/illumos/illumos-gate/commit/7fdd916

Porting notes:

Of particular interest when troubleshooting corrupted pools, the
commonly-used "zdb -e" operation may perform verbatim imports and
furthermore, it will soon have direct support for verbatim imports via
a new "-V" option.  The 4169 fix eliminates a common segfault case in
which spa_history_log_version() tries to access an un-opened dsl_pool_t.

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2451
Closes #2283
Closes #2467
2014-07-17 11:37:57 -07:00
Tim Chase
f4a4046bd6 Convert zfs_mg_noalloc_threshold to a module parameter and document
The parameter was added as illumos issue 4081 which was committed to
zfsonlinux in ac72fac3ea.  This patch
documents the parameter and allows for it to be set as a module parameter.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2483
2014-07-16 16:49:25 -07:00
Matthew Ahrens
d586964141 Illumos #3641 compressed block histograms with zdb
This patch is a zdb extension of the '-b' option, producing a histogram
of the physical compressed block sizes per DMU object type on disk. The
'-bbbb' option to zdb will uncover this new feature; here's an example
usage on a new pool and snippet of the output it generates:

    # zpool create tank /dev/vd{b,c,d}
    # dd bs=1k  if=/dev/urandom of=/tank/1kfile  count=1
    # dd bs=3k  if=/dev/urandom of=/tank/3kfile  count=1
    # dd bs=64k if=/dev/urandom of=/tank/64kfile count=1
    # zdb -bbbb tank
    ...
         3  68.0K   68.0K   68.0K   22.7K    1.00    34.26  ZFS plain file
    psize (in 512-byte sectors): number of blocks
                              2:      1 *
                              3:      0
                              4:      0
                              5:      0
                              6:      1 *
                              7:      0
    ...
                            127:      0
                            128:      1 *
    ...

The blocks are also broken down by their indirection level. Expanding on
the above example:

    # zfs set recordsize=1k tank
    # dd bs=1k if=/dev/urandom of=/tank/2x1kfile count=2
    # zdb -bbbb tank
    ...
         1    16K      1K      2K      2K   16.00     1.02      L1 ZFS plain file
    psize (in 512-byte sectors): number of blocks
                              2:      1 *
         5  70.0K   70.0K   70.0K   14.0K    1.00    35.71      L0 ZFS plain file
    psize (in 512-byte sectors): number of blocks
                              2:      3 ***
                              3:      0
                              4:      0
                              5:      0
                              6:      1 *
                              7:      0
    ...
                            127:      0
                            128:      1 *
         6  86.0K   71.0K   72.0K   12.0K    1.21    36.73  ZFS plain file
    psize (in 512-byte sectors): number of blocks
                              2:      4 ****
                              3:      0
                              4:      0
                              5:      0
                              6:      1 *
                              7:      0
    ...
                            127:      0
                            128:      1 *
    ...

There's now a single 1K L1 block which is the indirect block needed for
the '2x1kfile' file just created, as well as two more 1K L0 blocks from
the same file.

This can be used to get a distribution of the block sizes used within
the pool, on a per object type basis.

References:
  https://illumos.org/issues/3641
  https://github.com/illumos/illumos-gate/commit/490d05b

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@me.com>
Closes #2456
2014-07-16 11:52:46 -07:00
Andrew Barnes
61e99a73bc Preserve asize when last mirror child promoted to top-level vdev
If the smaller of 2 different sized child vdev's of a mirrored vdev is
detached, and the pool has the autoexpand property set to off, as the
remaining larger vdev is promoted to a top level vdev it fails to retain
the asize of the original top level mirror vdev and therefore partially
autoexpands.

This partially autoexpanded state leaves the new vdev too large to
re-mirror by adding the smaller vdev back in, and the pool fails to
utilize the space until next imported.

If the autoexpand property is set to on, the child vdev grows
in size after it has been promoted to a top level vdev as expected.

This commit causes the remaining child mirror to retain the asize of its
old parent mirror vdev if the autoexpand property is set to off,
this allows the smaller vdev to be re-added if required the vdev
can then be told to expand if required by the usual using zpool online -e.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Barnes <barnes333@gmail.com>
Signed-off-by: George Wilson <george.wilson@delphix.com>
Closes #1208
2014-07-02 14:04:29 -07:00
Garrison Jensen
b8fce77b08 Fix comment spelling errors.
Signed-off-by: Garrison Jensen <garrison.jensen@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2402
2014-07-01 14:20:23 -07:00
Tim Chase
52e68edc2d Document the optional "device" argument for "zpool split"
Most ZFS implementations seemed to have missed this bit of documentation.
The additional text is based on FreeBSD's man page.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2416
2014-07-01 14:16:43 -07:00
Tim Chase
09c0b8fe5e Return default value on numeric properties failing the "head check.
Updates 962d524212.

The referenced fix to get_numeric_property() caused numeric property
lookups to consider the type of the parent (head) dataset when checking
validity but there are some cases in the caller expects to see the
property's default value even when the lookup is invalid.

One case in which this is true is change_one() which is part of the
renaming infrastructure.  It may look up "zoned" on a snapshot of a volume
which is not valid but it expects to see the default value of false.

There may be other, yet unidentified cases in which zfs_prop_get_int()
is used on technically invalid properties but which expect the property's
default value to be returned.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Turbo Fredriksson <turbo@bayour.com>
Closes #2320
2014-07-01 14:14:31 -07:00
Dan McDonald
ee4712284c Illumos #4936 fix potential overflow in lz4
4936 lz4 could theoretically overflow a pointer with a certain input
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Keith Wesolowski <keith.wesolowski@joyent.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Ported by: Tim Chase <tim@chase2k.com>

References:
  https://illumos.org/issues/4936
  https://github.com/illumos/illumos-gate/commit/58d0718

Porting notes:

This fixes the widely-reported "20-year-old vulnerability" in
LZO/LZ4 implementations which inherited said bug from the reference
implementation.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2429
2014-07-01 14:10:47 -07:00
Tim Chase
4240dc332d Comment the lack of real_LZ4_uncompress()
Added several comments regarding the removal of real_LZ4_uncompress()
which exists in the reference implementation but has been removed here
since it's not used.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-07-01 14:09:56 -07:00
Brian Behlendorf
d4aae2a054 Improve differing sector size error
When adding or replacing a vdev with a different sector size the
error message should be more useful.  In addition to describing
the problem provide a hint that the '-o ashift' option can be
used to override the optimal default value.

Since using a non-optimal value may incur a significant performance
penalty we should issue this error.  But there a numerous reasons
why a administrator may wish to do this anyway.

Signed-off-by: Niklas Edmundsson <ZNikke@github>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2421
2014-06-27 11:44:36 -07:00
Turbo Fredriksson
628668a39f Add information about the -o option to zpool replace
Users need to be aware that when replacing devices in an existing
pool they may need to override automatically detected ashift value.
This will all depend on the exact hardware they are using.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2024
2014-06-27 08:31:07 -07:00
SenH
1567e0758b Fix man zpool property feature_guid
The property name gets mangled with the explanation due to the property
length.  Fixed by putting the explanation on the next line.

Before:
  unsupported@feature_Info rmation about unsupported features that are
  enabled on the pool. See zpool-features(5) for details.

After:
  unsupported@feature_guid
  Information about unsupported features that are enabled on the pool. See
  zpool-features(5) for details.

Signed-off-by: SenH <sen@senhaerens.be>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2419
2014-06-26 16:21:15 -07:00
Brian Behlendorf
07dabd234d Tag zfs-0.6.3
META file and release log updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-06-12 13:34:38 -07:00
Brian Behlendorf
5d2107d82b Fix zfs.spec.in defaults
Commit 2ee4e7da accidentally introduced two issues which only occur
when rebuilding the ZFS source rpm outside the ZFS build system.

1) The _dracutdir, _udevdir, and _udevruledir macros must be checked
   using the 'undefined' keyword.  This was just overlooked in the
   patch review and does not cause a failure when using 'make pkg'
   because the values are provided by the make target.

2) The default _udevruledir path included a typo.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2310
2014-06-12 13:34:33 -07:00
Turbo Fredriksson
2ee4e7da90 Accept udev and dracut paths specified by ./configure
There are two common locations where udev and dracut components are
commonly installed.  When building packages using the 'make rpm|deb'
targets check those common locations and pass them to rpmbuild.  For
non-standard configurations these values can be provided by the
the following configure options:

  --with-udevdir=DIR      install udev helpers [default=check]
  --with-udevruledir=DIR  install udev rules [[UDEVDIR/rules.d]]
  --with-dracutdir=DIR    install dracut helpers [default=check]

When rebuilding using the source packages the per-distribution
default values specified in the spec file will be used.  This is
the preferred way to build packages for a distribution but the
ability to override the defaults is provided as a convenience.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2310
Closes #1680
2014-06-11 16:32:57 -07:00
Brian Behlendorf
7f6884f419 Revert "Fix __zio_execute() asynchronous dispatch"
This reverts commit 91579709fc which
limited the asynchronous dispatch to kernel space.  We want to do
this for two reasons:

1) While we have slightly more headroom in user space excessively
   deep stacks have been observed while running ztest, see #2293.

2) Removing this conditional makes the pipeline behave consistently
   regardless of if it's executing in kernel space or user space.
   This way we're more likely to uncover subtle issues with ztest.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2384
2014-06-11 16:32:57 -07:00
Turbo Fredriksson
0f629346bb Set LANG to a reasonable default (C)
Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on
the translated date string in the changelog.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: zfsonlinux/spl#306
2014-06-10 16:46:21 -07:00
Turbo Fredriksson
21b446a79e Document the -X and -T options to 'zpool import'
These options have existed for a long time but have historically
been undocumented because they are not guaranteed to be safe.  They
should only be used as a last resort when attempting to recover a
damaged pool.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1130
2014-06-06 15:49:34 -07:00
Tim Chase
27b293be8a Expand the description of scan-related and other parameters.
Document that the scan-related parameters are, in fact, applicable only
to scrub and/or resilver operations as appropriate.

Expand a few of the prefetch-related descriptions.

Add clarification to other module parameters.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2361
2014-06-06 13:04:43 -07:00
Turbo Fredriksson
beb4be77b7 Man page updates for 'zfs share'
* Remove the references to share(1M), unshare(1M) and dfstab(4)
  since they are not applicable to Linux.
* Add the exact exportfs command line used when setting sharenfs=on.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue: #1641
2014-06-06 13:00:51 -07:00
Turbo Fredriksson
022f7bf68e Document the fact that ashift is vdev specific, not a pool global.
Users need to be aware that when adding devices to an existing pool
they may need to override automatically detected ashift value.
This will all depend on the exact hardware they are using.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #2024
2014-06-06 12:52:01 -07:00
Turbo Fredriksson
480f62655d Only automatically mount a clone when 'canmount == on'.
According to the man page, "When the noauto option is set, a dataset
can only be mounted and unmounted explicitly. The dataset is not
mounted automatically when the dataset is created or imported ...."

When cloning a dataset the canmount property was not being honored.
This patch adds the required check to achieve the behavior described
in the man page.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2241
2014-06-06 12:30:35 -07:00
Derek Dai
7a870db1b9 Do not export pool to prevent cache from been removed
Signed-off-by: Derek Dai <daiderek@gmail.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2353
2014-06-05 13:49:15 -07:00
Turbo Fredriksson
69c7bdb6e7 Accept kernel source dir(s) specified by ./configure
This adds ability to set the location of the kernel via defines
when building from the spec files.  This is useful when building
against a kernel installed in a non-standard location.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1874
2014-06-05 13:46:49 -07:00
Ben Allen
8b974ba036 Update spec file to enable systemd for RHEL7
Signed-off-by: Ben Allen <bsallen@alcf.anl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2355
2014-06-05 11:21:40 -07:00
Turbo Fredriksson
c9b5cc8c00 Move the libraries into separate packages
From day one the various ZFS libraries should have been placed in their
own sub-packages.  Primarily this allows for multiple major versions of
the libraries to be concurrently installed.  It also facilitates a
smaller build environment by minimizing the required dependencies.

The specific changes required to split the libraries from the utilities
are as follows:

* libzpool2, libnvpair1, libuutil1, and libzfs2 packages were added
  and contain the versioned shared libraries.  The Fedora packaging
  guidelines discourage providing static libraries so they are not
  included in the packages.

  http://fedoraproject.org/wiki/Packaging:Guidelines#Packaging_Static_Libraries

* The zfs-devel package was renamed libzfs2-devel and the new package
  obsoletes the old zfs-devel package.   This package includes all
  the required headers for the libzpool2, libnvpair1, libuutil1, and
  libzfs2 libraries and their respective unversioned shared libraries.

  This package should eventually be split in to individual lib*-devel
  packages but it will still take some work to cleanly separate them.
  Therefore the libzfs2-devel package provides the expected lib*-devel
  packages so the all proper dependencies can still be created.

  http://fedoraproject.org/wiki/Packaging:Guidelines#Devel_Packages

* Moved '/sbin/ldconfig' execution from the zfs packge to each of the
  new library packages as described by the packaging guidelines.

  http://fedoraproject.org/wiki/Packaging:Guidelines#Shared_Libraries

* The /usr/share/doc/ files were moved in to the libzfs2-devel package.

* Updated config/deb.am to be aware of the packaging changes.  This
  ensures that 'deb-utils' make target converts all the resulting
  packages generated by the 'rpm-utils' target.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #2329
Closes: #2341
Issue: #2145
2014-06-02 13:43:20 -07:00
Richard Yao
2024041b6c Remove superfluous statement
Clang's static analyzer reported that the value assigned to pcksum is
never used. That is because we initialize both zc and pcksum to {{ 0 }}
and then do `pcksum = zc;`. That is fairly pointless. However, it has
the effect of generating a false positive in Clang's static analyzer.
Since noise from false positives can obscure real issues, we fix it
anyway.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2330
2014-05-30 17:02:37 -07:00
Richard Yao
4def05f8a6 Fix memory leak in zpool_clear_label()
Clang's static analyzer reported a memory leak in zpool_clear_label().
Upon review, it turns out to be right. This should be a very short lived
leak because no daemons use this functionality, but that does not
preclude the possibility of third party daemons that do use it. Lets fix
it to be a good Samaritan.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2330
2014-05-30 17:00:37 -07:00
Chris Wedgwood
62a05896e8 Allow building without ACLs
Some kernel definitions were buried inside the #if... #endif logic for
ACLs.  When ACLs are not available these definitions get lost causing
the build to fail.

Signed-off-by: Chris Wedgwood <cw@f00f.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2349
2014-05-30 12:01:57 -07:00
Brian Behlendorf
866c162340 Fix DKMS package upgrade and packager
Running 'yum upgrade zfs-dkms' package could appear to work properly
and still leave you with no zfs modules installed.  This will occur
when only the zfs release, and not the version, are incremented.
This may be the case for a fast moving zfs-testing repository.

During the upgrade process DKMS will realize that zfs-x.y.z is already
installed and remove it.  DKMS then correctly builds the new modules
for zfs-x.y.z.  However, as a final step when the old zfs-x.y.z-r is
removed the %preun script runs and removes the newly build modules.
To handle this case the %preun script has been updated to only run
when the installed version exactly matches the full spec file version.

This change also updated ChangeLog section based on the DKMS
reference spec file.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-05-30 11:23:48 -07:00
Brian Behlendorf
79aada6105 Restrict release number to META version
When creating packages in a git repository the release number
can be automatically set by 'git describe'.  This normally works
well but if your repository has newer tags which match the form
NAME-VERSION* the release may be incorrectly calculated.  To
prevent this the match patten has been restricted to the contents
of the META file, NAME-VERSION.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-05-29 19:31:50 -07:00
Brian Behlendorf
6795a698f4 Use default slab types
We should not override the default memory type of the kmem cache.  This
was done previously to force certain objects which were slightly over
object size limit cut off in to KMC_KMEM caches for better performance.

The zfsonlinux/spl#356 patch slightly increases the default cut off
from 511 bytes 1024 bytes for x86_64.  This means there is long longer
a need to override the default for the caches.  And since the default
values are now being used the new spl_kmem_cache_slab_limit and
spl_kmem_cache_kmem_limit tunables will apply to all kmem caches.

The following is a list of caches that will be impacted:

                  | object size   | forced type   | default type
----------------- | ------------- | ------------- | --------------
dnode_t           | 936 bytes     | KMC_KMEM      | KMC_KMEM
zio_cache         | 1104 bytes    | *KMC_KMEM     | *KMC_VMEM
zio_link_cache    | 48 bytes      | KMC_KMEM      | KMC_KMEM
zio_vdev_cache    | 131088 bytes  | KMC_VMEM      | KMC_VMEM
zio_buf_512       | 512 bytes     | KMC_KMEM      | KMC_KMEM
zio_data_buf_512  | 512 bytes     | KMC_KMEM      | KMC_KMEM
zio_buf_1024      | 1024 bytes    | KMC_KMEM      | KMC_KMEM
zio_data_buf_1024 | 1024 bytes    | +KMC_VMEM     | +KMC_KMEM

* Cache memory type will change from KMC_KMEM to KMC_VMEM.
+ Cache memory type will change from KMC_VMEM to KMC_KMEM.

This patch removes another slight point of divergence between ZoL
and Illumos.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Closes #2337
2014-05-22 10:39:52 -07:00
Marcel Huber
58bd7ad060 Omit compiler warning by sticking to RAII
Resolve gcc 4.9.0 20140507 warnings about uninitialized 'ptr' when
using -Wmaybe-uninitialized.  The first two cases appears appear
to be legitimate but not the second two.  In general this is a
good practice so they are all initialized.

Signed-off-by: Marcel Huber <marcelhuberfoo@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2345
2014-05-22 09:44:15 -07:00
John Albietz
5f3c101b8f Added INTEL SSD 530 Series
INTEL SSD 530 Series... SSDSC2BW24

Signed-off-by: John Albietz <inthecloud247@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2184
2014-05-19 16:57:14 -07:00
HC
f9a1ac4d59 Honor zfs_nocacheflush for file vdevs
For consistency with disk vdevs honor the zfs_nocacheflush tunable.
This setting is available primarily for debugging and performance
analysis.

Signed-off-by: HC <mmttdebbcc@yahoo.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2336
2014-05-19 13:30:48 -07:00
Tim Chase
83021b47c2 Calculate header size correctly in sa_find_sizes()
In the case where a variable-sized SA overlaps the spill block pointer and
a new variable-sized SA is being added, the header size was improperly
calculated to include the to-be-moved SA.  This problem could be
reproduced when xattr=sa enabled as follows:

	ln -s $(perl -e 'print "x" x 120') blah
	setfattr -n security.selinux -v blahblah -h blah

The symlink is large enough to interfere with the spill block pointer and
has a typical SA registration as follows (shown in modified "zdb -dddd"
<SA attr layout obj> format):

	[ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ]

Adding the SA xattr will attempt to extend the registration to:

	[ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ZPL_DXATTR ]

but since the ZPL_SYMLINK SA interferes with the spill block pointer, it
must also be moved to the spill block which will have a registration of:

	[ ZPL_SYMLINK ZPL_DXATTR ]

This commit updates extra_hdrsize when this condition occurs, allowing
hdrsize to be subsequently decreased appropriately.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue #2214
Issue #2228
Issue #2316
Issue #2343
2014-05-19 11:55:50 -07:00
Tim Chase
3937ab20f3 Allow for lock-free reading zfsdev_state_list.
Restructure the zfsdev_state_list to allow for lock-free reading by
converting to a simple singly-linked list from which items are never
deleted and over which only forward iterations are performed.  It depends
on, among other things, the atomicity of accessing the zs_minor integer
and zs_next pointer.

This fixes a lock inversion in which the zfsdev_state_lock is used by
both the sync task (txg_sync) and indirectly by any user program which
uses /dev/zfs; the zfsdev_release method uses the same lock and then
blocks on the sync task.

The most typical failure scenerio occurs when the sync task is cleaning
up a user hold while various concurrent "zfs" commands are in progress.

Neither Illumos nor Solaris are affected by this issue because they use
DDI interface which provides lock-free reading of device state via the
ddi_get_soft_state() function.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2301
2014-05-19 11:45:11 -07:00
Richard Yao
1cbae971c5 Handle ZPOOL_STATUS_HOSTID_MISMATCH in zpool status
Verbatim imports can cause hostid mismatches, but things otherwise work. `zpool
status` does not handle this and will fail when assertions are enabled:

```
zpool: ../../cmd/zpool/zpool_main.c:4418: status_callback: Assertion `reason == ZPOOL_STATUS_OK' failed.

Program received signal SIGABRT, Aborted.
```

Lets instead add a case to display an informative message such as this:

```
  pool: rpool
 state: ONLINE
status: Mismatch between pool hostid and system hostid on imported pool.
        This pool was previously imported into a system with a different hostid,
        and then was verbatim imported into this system.
action: Export this pool on all systems on which it is imported.
        Then import it to correct the mismatch.
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
  scan: scrub repaired 0 in 0h8m with 0 errors on Thu Apr 17 19:43:57 2014
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          sda       ONLINE       0     0     0

errors: No known data errors
```

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2342
2014-05-19 10:16:29 -07:00
Chunwei Chen
bc25c9325b Use a dedicated taskq for vdev_file
Originally, vdev_file used system_taskq. This would cause a deadlock,
especially on system with few CPUs. The reason is that the prefetcher
threads, which are on system_taskq, will sometimes be blocked waiting
for I/O to finish. If the prefetcher threads consume all the tasks in
system_taskq, the I/O cannot be served and thus results in a deadlock.

We fix this by creating a dedicated vdev_file_taskq for vdev_file I/O.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2270
2014-05-14 16:20:21 -07:00
Brian Behlendorf
2c33b91275 Handle vdev_lookup_top() failure in dva_get_dsize_sync()
The dva_get_dsize_sync() function incorrectly assumes that the call
to vdev_lookup_top() cannot fail.  However, the NULL dereference at
clearly shows that under certain circumstances it is possible.  Note
that offset 0x570 (1376) maps as expected to vd->vdev_deflate_ratio.

  BUG: unable to handle kernel NULL pointer dereference at 00000570

  crash> struct -o vdev
  struct vdev {
       [0] uint64_t vdev_id;
       ... ...
    [1376] uint64_t vdev_deflate_ratio;

Given that this can happen this patch add the required error handling.
In the case where vdev_lookup_top() fails assume that no deflation
will occur for the DVA and use the asize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Alexey Zhuravlev <alexey.zhuravlev@intel.com>
Closes #1707
Closes #1987
Closes #1891

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-05-06 10:41:48 -07:00
Tim Chase
962d524212 Check the dataset type more rigorously when fetching properties.
When fetching property values of snapshots, a check against the head
dataset type must be performed.  Previously, this additional check was
performed only when fetching "version", "normalize", "utf8only" or "case".

This caused the ZPL properties "acltype", "exec", "devices", "nbmand",
"setuid" and "xattr" to be erroneously displayed with meaningless values
for snapshots of volumes.  It also did not allow for the display of
"volsize" of a snapshot of a volume.

This patch adds the headcheck flag paramater to zfs_prop_valid_for_type()
and zprop_valid_for_type() to indicate the check is being done
against a head dataset's type in order that properties valid only for
snapshots are handled correctly.  This allows the the head check in
get_numeric_property() to be performed when fetching a property for
a snapshot.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2265
2014-05-06 10:41:46 -07:00
Brian Behlendorf
1ce0457348 Fix style
A minor style issue was accidentally introduced by aa7d06a.
This change resolves that style problem.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-05-06 10:41:17 -07:00
George Wilson
aa7d06a98a Illumos #4101 finer-grained control of metaslab_debug
Today the metaslab_debug logic performs two tasks:

- load all metaslabs on import/open
- don't unload metaslabs at the end of spa_sync

This change provides knobs for each of these independently.

References:
  https://illumos.org/issues/4101
  https://github.com/illumos/illumos-gate/commit/0713e23

Notes:

1) This is a small piece of the metaslab improvement patch from
Illumos. It was worth bringing over before the rest, since it's
low risk and it can be useful on fragmented pools (e.g. Lustre
MDTs). metaslab_debug_unload would give the performance benefit
of the old metaslab_debug option without causing unwanted delay
during pool import.

Ported-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2227
2014-05-06 09:46:04 -07:00
Brian Behlendorf
cc79a5c263 Treat spill block dbufs as meta data
When the system attributes (SAs) for an object exceed what can
can be stored in the bonus area of a dnode a spill block is
allocated.  These spill blocks are currently considered data
blocks.  However, they should be accounted for as meta data
because they are effectively an extension of the dnode.

While this may seem like a minor accounting issue it has broader
implications.  The key thing to be aware of is that each spill
block will hold a reference on its parent dnode.  The dnode in
turn holds a reference on its dbuf in the dnode object.  This
means that a single 512 byte data buffer for a spill block can
pin over 16k of meta data.  This is analogous to the small file
situation described in 2b13331 where a relatively small number
of data buffer can cause the ARC to exceed the meta limit.

However, unlike the small file case a spill block can legitimately
be considered meta data.  By changing the spill block to meta data
they will now be dropped from the cache when the meta limit is
reached.  This then allows the dnodes and dbufs which the spill
block was pinning to be released.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Closes #2294
2014-05-05 13:56:59 -07:00
Brian Behlendorf
51268f31a8 Remove SELinux enforcing check from init scripts
The default SELinux policy for RHEL and Fedora has been updated
to include ZFS in the list of filesystems which support xattrs.
Therefore, there's no longer a need to detect this in the init
scripts.

References:
  https://bugzilla.redhat.com/show_bug.cgi?id=811532
  https://bugzilla.redhat.com/show_bug.cgi?id=816543

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2166
2014-05-02 11:37:46 -07:00
Richard Yao
7809eb8b65 ztest: Switch to LWP rwlock interface
ztest is intended to subject the ZFS code in userland to stress that it
should be able to withstand. Any failures that occur when running it are
failures that likely would occur inside the kernel. However, being in
userland, it is much easier to debug them. In practice, this prevents
a large number of problems from reaching production code.

A design decision was made by the original authors of ztest to make a
distinction between userland locking primitives and kernel locking
primitives. The ztest code itself calls userland locking primitives
while the kernel code being run in userland will call emulated kernel
locking primitives that wrap the userland locking primitives.

When ztest was first ported to Linux, a decision was made to use the
emulated kernel interfaces everywhere. In effect, the userland
rw_rdlock()/rw_wrlock() became the kernel rw_enter() and and the userland
rw_unlock() became the kernel rw_exit(). This caused a regression
because of an assertion in rw_enter() to catch recursive locking. That
is permitted in userland, but not in the kernel. Consequently, the ztest
code itself does recursive read locking. The use of the emulated kernel
interfaces consequently caused the following failure:

ztest: ../../lib/libzpool/kernel.c:384: Assertion `rwlp->rw_owner !=
zk_thread_current() (0x1c87150 != 0x1c87150)' failed.

That occurs because ztest_dmu_objset_create_destroy() will take a read
lock and call ztest_dmu_object_alloc_free(). That will call ztest_io(),
which will take a readlock only when asked to do ZTEST_IO_REWRITE. This
triggered the assertion.

The pthreads rwlock interface was based on the LWP rwlock interface
implemented in Illumos libc. Luckily enough, the subset used by ztest is
almost identical, so we can solve this problem by switching to the LWP
thread rwlock interface in ztest. This eliminates a point of divergence
with Illumos and should make code sharing slightly easier.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1970
2014-05-01 15:53:58 -07:00