Commit Graph

132 Commits

Author SHA1 Message Date
Prakash Surya
0b39b9f96f Swap DTRACE_PROBE* with Linux tracepoints
This patch leverages Linux tracepoints from within the ZFS on Linux
code base. It also refactors the debug code to bring it back in sync
with Illumos.

The information exported via tracepoints can be used for a variety of
reasons (e.g. debugging, tuning, general exploration/understanding,
etc). It is advantageous to use Linux tracepoints as the mechanism to
export this kind of information (as opposed to something else) for a
number of reasons:

    * A number of external tools can make use of our tracepoints
      "automatically" (e.g. perf, systemtap)
    * Tracepoints are designed to be extremely cheap when disabled
    * It's one of the "accepted" ways to export this kind of
      information; many other kernel subsystems use tracepoints too.

Unfortunately, though, there are a few caveats as well:

    * Linux tracepoints appear to only be available to GPL licensed
      modules due to the way certain kernel functions are exported.
      Thus, to actually make use of the tracepoints introduced by this
      patch, one might have to patch and re-compile the kernel;
      exporting the necessary functions to non-GPL modules.

    * Prior to upstream kernel version v3.14-rc6-30-g66cc69e, Linux
      tracepoints are not available for unsigned kernel modules
      (tracepoints will get disabled due to the module's 'F' taint).
      Thus, one either has to sign the zfs kernel module prior to
      loading it, or use a kernel versioned v3.14-rc6-30-g66cc69e or
      newer.

Assuming the above two requirements are satisfied, lets look at an
example of how this patch can be used and what information it exposes
(all commands run as 'root'):

    # list all zfs tracepoints available

    $ ls /sys/kernel/debug/tracing/events/zfs
    enable              filter              zfs_arc__delete
    zfs_arc__evict      zfs_arc__hit        zfs_arc__miss
    zfs_l2arc__evict    zfs_l2arc__hit      zfs_l2arc__iodone
    zfs_l2arc__miss     zfs_l2arc__read     zfs_l2arc__write
    zfs_new_state__mfu  zfs_new_state__mru

    # enable all zfs tracepoints, clear the tracepoint ring buffer

    $ echo 1 > /sys/kernel/debug/tracing/events/zfs/enable
    $ echo 0 > /sys/kernel/debug/tracing/trace

    # import zpool called 'tank', inspect tracepoint data (each line was
    # truncated, they're too long for a commit message otherwise)

    $ zpool import tank
    $ cat /sys/kernel/debug/tracing/trace | head -n35
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 1219/1219   #P:8
    #
    #                              _-----=> irqs-off
    #                             / _----=> need-resched
    #                            | / _---=> hardirq/softirq
    #                            || / _--=> preempt-depth
    #                            ||| /     delay
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #              | |       |   ||||       |         |
            lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr...
          z_rd_int/0-30156 [003] .... 91344.200611: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.201173: zfs_arc__miss: hdr...
          z_rd_int/1-30157 [003] .... 91344.201756: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.201795: zfs_arc__miss: hdr...
          z_rd_int/2-30158 [003] .... 91344.202099: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.202126: zfs_arc__hit: hdr ...
            lt-zpool-30132 [003] .... 91344.202130: zfs_arc__hit: hdr ...
            lt-zpool-30132 [003] .... 91344.202134: zfs_arc__hit: hdr ...
            lt-zpool-30132 [003] .... 91344.202146: zfs_arc__miss: hdr...
          z_rd_int/3-30159 [003] .... 91344.202457: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.202484: zfs_arc__miss: hdr...
          z_rd_int/4-30160 [003] .... 91344.202866: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.202891: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.203034: zfs_arc__miss: hdr...
          z_rd_iss/1-30149 [001] .... 91344.203749: zfs_new_state__mru...
            lt-zpool-30132 [001] .... 91344.203789: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.203878: zfs_arc__miss: hdr...
          z_rd_iss/3-30151 [001] .... 91344.204315: zfs_new_state__mru...
            lt-zpool-30132 [001] .... 91344.204332: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.204337: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.204352: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.204356: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.204360: zfs_arc__hit: hdr ...

To highlight the kind of detailed information that is being exported
using this infrastructure, I've taken the first tracepoint line from the
output above and reformatted it such that it fits in 80 columns:

    lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss:
        hdr {
            dva 0x1:0x40082
            birth 15491
            cksum0 0x163edbff3a
            flags 0x640
            datacnt 1
            type 1
            size 2048
            spa 3133524293419867460
            state_type 0
            access 0
            mru_hits 0
            mru_ghost_hits 0
            mfu_hits 0
            mfu_ghost_hits 0
            l2_hits 0
            refcount 1
        } bp {
            dva0 0x1:0x40082
            dva1 0x1:0x3000e5
            dva2 0x1:0x5a006e
            cksum 0x163edbff3a:0x75af30b3dd6:0x1499263ff5f2b:0x288bd118815e00
            lsize 2048
        } zb {
            objset 0
            object 0
            level -1
            blkid 0
        }

For the specific tracepoint shown here, 'zfs_arc__miss', data is
exported detailing the arc_buf_hdr_t (hdr), blkptr_t (bp), and
zbookmark_t (zb) that caused the ARC miss (down to the exact DVA!).
This kind of precise and detailed information can be extremely valuable
when trying to answer certain kinds of questions.

For anybody unfamiliar but looking to build on this, I found the XFS
source code along with the following three web links to be extremely
helpful:

    * http://lwn.net/Articles/379903/
    * http://lwn.net/Articles/381064/
    * http://lwn.net/Articles/383362/

I should also node the more "boring" aspects of this patch:

    * The ZFS_LINUX_COMPILE_IFELSE autoconf macro was modified to
       support a sixth paramter. This parameter is used to populate the
       contents of the new conftest.h file. If no sixth parameter is
       provided, conftest.h will be empty.

    * The ZFS_LINUX_TRY_COMPILE_HEADER autoconf macro was introduced.
      This macro is nearly identical to the ZFS_LINUX_TRY_COMPILE macro,
      except it has support for a fifth option that is then passed as
      the sixth parameter to ZFS_LINUX_COMPILE_IFELSE.

These autoconf changes were needed to test the availability of the Linux
tracepoint macros. Due to the odd nature of the Linux tracepoint macro
API, a separate ".h" must be created (the path and filename is used
internally by the kernel's define_trace.h file).

    * The HAVE_DECLARE_EVENT_CLASS autoconf macro was introduced. This
      is to determine if we can safely enable the Linux tracepoint
      functionality. We need to selectively disable the tracepoint code
      due to the kernel exporting certain functions as GPL only. Without
      this check, the build process will fail at link time.

In addition, the SET_ERROR macro was modified into a tracepoint as well.
To do this, the 'sdt.h' file was moved into the 'include/sys' directory
and now contains a userspace portion and a kernel space portion. The
dprintf and zfs_dbgmsg* interfaces are now implemented as tracepoint as
well.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-11-17 11:13:55 -08:00
Ned Bass
5024046763 cstyle: allow right paren on its own line
Make the style checker script accept right parentheses on their own
lines. This is motivated by the Linux tracepoints macro
DECLARE_EVENT_CLASS.

The code within TP_fast_assign() (a parameter of DECLARE_EVENT_CLASS)
is normal C assignments terminated by semicolons.  But the style
checker forbids us from following a semicolon with a non-blank and
from preceding a right parenthesis with white space.  Therefore the
closing parenthesis must go on the next line, yet the style checker
foribs us from indenting it for readability.  Relaxing the
no-non-blank-after-semicolon rule would open the door to too many bad
style practices. So instead we relax the
no-white-space-before-right-paren rule if the parenthesis is on its
own line.  The relaxation is overriden with the -p option so we still
have a way to catch misuse of this style.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-11-17 11:13:50 -08:00
Brian Behlendorf
bf2850de82 Fix source_tree variable in dkms build
The source_tree variable in the previous commit had an extra $.
Remove it so that source_tree is expanded properly.  An identical
fix has been applied in the original patch to the stable branch.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2776
2014-10-13 10:38:41 -07:00
Tom Prince
f178adea81 Point dkms build at installed source tree, rather than build directory.
Signed-off-by: Tom Prince <tom.prince@clusterhq.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2776
2014-10-09 12:05:43 -07:00
Tom Prince
fee48fd22c Install header during post-build rather than post-install.
New versions of dkms clean up the build directory after installing.

It appears that this was always intended, but had rm -rf "/path/to/build/*"
(note the quotes), which prevented it from working.

Also, the build step is already installing stuff into the directory where
these files go, so installing our stuff there as part of build rather than
install makes sense.

Signed-off-by: Tom Prince <tom.prince@clusterhq.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2776
2014-10-09 12:03:50 -07:00
Brian Behlendorf
093219a6b3 zpool-create.sh: allow features to be disabled
The zimport.sh script makes use of the zpool-create.sh script
to construct test pools for importing with older versions of
ZoL.  It is desirable to have a way to disable all the features
so new pools can be imported with older code.

The simplest and most flexible way to achieve this was to merge
the VERBOSE_FLAG and FORCE_FLAG in to a single ZPOOL_FLAGS
variable.  The contents of this variable will be used in the
'zpool create' allowing us to easily pass arbitrary flags.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Closes #2524
2014-07-25 11:58:31 -07:00
Turbo Fredriksson
0f629346bb Set LANG to a reasonable default (C)
Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on
the translated date string in the changelog.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: zfsonlinux/spl#306
2014-06-10 16:46:21 -07:00
Brian Behlendorf
e0b8f62902 Various zimport.sh fixes
1) $SPLSRC and $SRCDIR should be changed to $SRC_DIR.  These are
   vestiges of an earlier version of the script and were missed when
   it was updated.  Additionally ensure the directory is created.

2) The 'fail' function should take an integer argument for the
   error code to return.  Otherwise 0 (success) will be mistakenly
   returned and errors will we incorrectly suppressed.  The error
   code should be meaningful enough to determine where the script
   failed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-04-17 09:30:55 -07:00
Brian Behlendorf
888f7141a3 Make zimport.sh bash dependency explicit
Unfortunately, the zimport.sh test script really does depend on
bash.  Moving to /bin/sh should be possible once the shared
infrastructure scripts it depends on is made portable.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-04-10 16:07:59 -07:00
Brian Behlendorf
443c3f7332 Improve zfs.sh error messages
Ensure an error message is logged when the 'zfs.sh' script fails
to either load a module or if udev fails to create the /dev/zfs
device.  Error messages for missing KERNEL_MODULES are suppressed
because that functionality may just be built-in to the kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-04-10 14:27:00 -07:00
Brian Behlendorf
cc9ee13e1a Dynamically create loop devices
Several of the in-tree regression tests depend on the availability
of loop devices.  If for some reason no loop devices are available
the tests will fail.

Normally this isn't an issue because most Linux distributions create
8 loop devices by default.  This is enough for our purposes.  However,
recent Fedora releases have only been creating a single loop device
and this leads to failures.  Alternately, if something else of the
system is using the loop devices we may see failures.

The fix for this is to update the support scripts to dynamically
create loop devices as needed.  The scripts need only create a node
under /dev/ and the loop driver with create the minor.  This behavior
has been supported by the loop driver for ages.

Additionally this patch updates cleanup_loop_devices() to cleanup
loop devices which have already had their file store deleted.  This
helps prevent stale loop devices from accumulating on the system due
to test failures.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Closes #2249
2014-04-09 13:29:32 -07:00
Chris Dunlap
9e246ac3d8 Initial implementation of zed (ZFS Event Daemon)
zed monitors ZFS events.  When a zevent is posted, zed will run any
scripts that have been enabled for the corresponding zevent class.
Multiple scripts may be invoked for a given zevent.  The zevent
nvpairs are passed to the scripts as environment variables.

Events are processed synchronously by the single thread, and there is
no maximum timeout for script execution.  Consequently, a misbehaving
script can delay (or forever block) the processing of subsequent
zevents.  Plans are to address this in future commits.

Initial scripts have been developed to log events to syslog
and send email in response to checksum/data/io errors and
resilver.finish/scrub.finish events.  By default, email will only
be sent if the ZED_EMAIL variable is configured in zed.rc (which is
serving as a config file of sorts until a proper configuration file
is implemented).

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
2014-04-02 13:10:03 -07:00
Brian Behlendorf
a16bc6bdd9 Add zimport.sh compatibility test script
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.1 zfs-0.6.2 master installed" \
     -p "all master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.1       zfs-0.6.2       master          0.6.2-180
-----------------------------------------------------------------
Clone SPL       Skip		Skip		Skip		Skip
Clone ZFS       Skip		Skip		Skip		Skip
Build SPL       Skip		Skip		Skip		Skip
Build ZFS       Skip		Skip		Skip		Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass		Pass		Pass		Pass
zol-0.6.1       Pass		Pass		Pass		Pass
zol-0.6.2-173   Fail		Fail		Pass		Pass
zol-0.6.2       Pass		Pass		Pass		Pass
master          Fail		Fail		Pass		Pass
installed       Pass		Pass		Pass		Pass

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #2094
2014-02-21 12:10:31 -08:00
Brian Behlendorf
99d3ece847 Add default FILEDIR path to zpool-config scripts
Allow the caller of the zpool-create.sh script to override
the default path where file vdevs are created.  This allows
for greated flexibilty when scripting.

Additionally, update the default path from /tmp/ to /var/tmp/
because these days /tmp/ is likely a ramdisk.  Even though
these files are sparse they may grow large in which case they
should be backed by a physical device.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2120
2014-02-12 09:37:46 -08:00
Brian Behlendorf
0820ec65c4 Fix zconfig.sh test 9
Commit ba6a240 adjusted the behavior of 'zfs create -V'.  The
caller is no longer guaranteed that udev will have finished
creating the /dev/ entries by the time to command exits.  It
is therefore required that we explicitly block waiting for
udev to settle for this test to run reliably.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-02-10 15:48:32 -08:00
Brian Behlendorf
8ffef572ed cstyle: Allow spaces in all comments
Update the cstyle.pl script to allow pictures in all comments not
just header comments.  Recent changes from Illumos such as d3cc8b1
have relocated various pictures in the standard block comments to
make the code more readable.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1821
2013-12-18 16:46:35 -08:00
Brian Behlendorf
c2d439dffd Silence e2fsck warning in zconfig.sh
When running zconfig.sh test 7 and 8 cause the following warning to
be printed to the console.  It's caused because we're snapshoting a
mounted ext2 filesystem which is not in a 'clean' state.  This is
to be expected since we have no guarentees about the on-disk
consistency of the filesystem.

EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended

To silence the warning and preserve the intent of these test cases
they have been updated to unmount the filesystem prior to snapshoting
them.  This ensures the ext2 filesystem is in a consistent state
when the snapshot is taken.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #1972
2013-12-16 09:46:10 -08:00
Brian Behlendorf
ba6a24026c Remove ZFC_IOC_*_MINOR ioctl()s
Early versions of ZFS coordinated the creation and destruction
of device minors from userspace.  This was inherently racy and
in late 2009 these ioctl()s were removed leaving everything up
to the kernel.  This significantly simplified the code.

However, we never picked up these changes in ZoL since we'd
already significantly adjusted this code for Linux.  This patch
aims to rectify that by finally removing ZFC_IOC_*_MINOR ioctl()s
and moving all the functionality down in to the kernel.  Since
this cleanup will change the kernel/user ABI it's being done
in the same tag as the previous libzfs_core ABI changes.  This
will minimize, but not eliminate, the disruption to end users.

Once merged ZoL, Illumos, and FreeBSD will basically be back
in sync in regards to handling ZVOLs in the common code.  While
each platform must have its own custom zvol.c implemenation the
interfaces provided are consistent.

NOTES:

1) This patch introduces one subtle change in behavior which
   could not be easily avoided.  Prior to this change callers
   of 'zfs create -V ...' were guaranteed that upon exit the
   /dev/zvol/ block device link would be created or an error
   returned.  That's no longer the case.  The utilities will no
   longer block waiting for the symlink to be created.  Callers
   are now responsible for blocking, this is why a 'udev_wait'
   call was added to the 'label' function in scripts/common.sh.

2) The read-only behavior of a ZVOL now solely depends on if
   the ZVOL_RDONLY bit is set in zv->zv_flags.  The redundant
   policy setting in the gendisk structure was removed.  This
   both simplifies the code and allows us to safely leverage
   set_disk_ro() to issue a KOBJ_CHANGE uevent.  See the
   comment in the code for futher details on this.

3) Because __zvol_create_minor() and zvol_alloc() may now be
   called in a sync task they must use KM_PUSHPAGE.

References:
  illumos/illumos-gate@681d9761e8

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #1969
2013-12-16 09:15:57 -08:00
Brian Behlendorf
a35beedfb3 Add cstyle.pl utility and cstyle.1 man page
Cstyle is the C source style checker used by Illumos.  Since the
original ZFS source was written using these style guidelines they
must also be followed by ZoL for consistency.

The checker has been added to the scripts directory and may be
run on a per file basis.  New patches should be careful to avoid
introducing new style warnings.

Additionally, the 'checkstyle' target has been added to the top
level Makefile and can be used to check the entire source tree.
While Zol has historically attempted to follow the SunOS style
guide the lack of a rigorous style checker has allowed various
warning to be introduced.  Currently there are 2211 reported
style violations and we want to gradually eliminate these from
the tree.

Note the cstyle.1 man page is provided under man/man1/cstyle.1
but since it is a developer utility it is not installed along
with the other man pages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-10-30 11:36:30 -07:00
Prakash Surya
37fd6e00a6 Add script to fix file names in upstream patches
Added a simple sed script to do a search and replace on the Illumos
ZFS file names and replace them with the ZFS on Linux equivalent.

Example usage:

    # Replace Illumos paths with Linux paths
    $ ./scripts/zfs2zol-patch.sed arc.c.patch > arc.c.patch.linux

    # Ensure the script worked as expected
    $ diff arc.c.patch arc.c.patch.linux

    # Apply the patch using Linux paths
    $ patch -p1 < arc.c.patch.linux

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1679
2013-10-29 10:30:43 -07:00
Brian Behlendorf
cb79a4e8bb Add kmod repo integration
When the kmod packaging infrastructure was originally added the
dependency on the rpmfusion yum repositories was disabled.  This
was done at the time in favour of getting local builds working.

Now the time has come to conditionally re-enable that functionality
so we can properly provide binary kmod packages.

  ./configure --with-config=srpm
  make SRPM_DEFINE_KMOD='--define="repo rpmfusion"' srpm-kmod
  mock rebuild zfs-kmod-x.y.z-r.el6.src.rpm

One nice benefit of finishing this work is that the generic and
fedora spl-kmod spec files can be merged again.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-08-01 09:48:07 -07:00
Brian Behlendorf
91604b298c Open pools asynchronously after module load
One of the side effects of calling zvol_create_minors() in
zvol_init() is that all pools listed in the cache file will
be opened.  Depending on the state and contents of your pool
this operation can take a considerable length of time.

Doing this at load time is undesirable because the kernel
is holding a global module lock.  This prevents other modules
from loading and can serialize an otherwise parallel boot
process.  Doing this after module inititialization also
reduces the chances of accidentally introducing a race
during module init.

To ensure that /dev/zvol/<pool>/<dataset> devices are
still automatically created after the module load completes
a udev rules has been added.  When udev notices that the
/dev/zfs device has been create the 'zpool list' command
will be run.  This then will cause all the pools listed
in the zpool.cache file to be opened.

Because this process in now driven asynchronously by udev
there is the risk of problems in downstream distributions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #756
Issue #1020
Issue #1234
2013-07-03 09:24:38 -07:00
Nathaniel Clark
389cf730ce Make spl directory setable when building rpms and add --buildroot
This adds ability to set the location of spl via defines when
building from the spec files.  This is useful for build systems
that build spl and zfs together without installing the actual rpms.

Signed-off-by: Nathaniel Clark <Nathaniel.Clark@misrule.us>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1486
2013-06-21 16:00:45 -07:00
Jan Engelhardt
4e95cc99b0 build: resolve orthographic and other grammatical errors
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-04-02 10:44:52 -07:00
Brian Behlendorf
0df23ca9a1 Provide ${kmodname}-devel-kmod for yum-builddep
In order to ensure that yum-builddep pulls in all the build
requirements a generic ${kmodname}-devel-kmod provides line is
added.  This allows a version of the development headers to be
included without requiring knowledge of the kernel version.

This is important because unlike rpmbuild which does correctly
expand the source rpm spec file, yum-builddep does not.  Without
this generic provides line mock which relies on yum-builddep is
unable to automatically satisfy the dependency.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-25 13:28:19 -07:00
Brian Behlendorf
352b557bd6 Use requested kernel for dkms builds
The --with-linux and --with-linux-obj options must be specified
as part of the dkms build otherwise the package will be built
against the running kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-21 12:51:06 -07:00
Brian Behlendorf
ae0f0ba950 Use BUILD_DEPENDS option for dkms builds
Support was added to dkms so build dependencies can be specified.
This allows us to ensure that the spl package will always be built
before the zfs package.  Those patches have not yet been merged
upstream but they are available in the zfsonlinux/dkms repository.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-21 12:51:06 -07:00
Brian Behlendorf
f3757573a6 Refresh RPM packaging
Refresh the existing RPM packaging to conform to the 'Fedora
Packaging Guidelines'.  This includes adopting the kmods2
packaging standard which is used fod kmods distributed by
rpmfusion for Fedora/RHEL.

  http://fedoraproject.org/wiki/Packaging:Guidelines
  http://rpmfusion.org/Packaging/KernelModules/Kmods2

While the spec files have been entirely rewritten from a
user perspective the only major changes are:

* The Fedora packages now have a build dependency on the
  rpmfusion repositories.  The generic kmod packages also
  have a new dependency on kmodtool-1.22 but it is bundled
  with the source rpm so no additional packages are needed.

* The kernel binary module packages have been renamed from
  zfs-modules-* to kmod-zfs-* as specificed by kmods2.

* The is now a common kmod-zfs-devel-* package in addition
  to the per-kernel devel packages.  The common package
  contains the development headers while the per-kernel
  package contains kernel specific build products.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1341
2013-03-18 15:33:17 -07:00
Brian Behlendorf
48c028f5a5 Replace libexecdir with datadir
According to the FHS.  Testing scripts and examples which are all
architecture independent should be installed in a subdirectory
under /usr/share.

  http://www.pathname.com/fhs/2.2/fhs-4.11.html

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-06 15:46:40 -08:00
Eric Dillmann
0b4d1b5853 Add snapdev=[hidden|visible] dataset property
The new snapdev dataset property may be set to control the
visibility of zvol snapshot devices.  By default this value
is set to 'hidden' which will prevent zvol snapshots from
appearing under /dev/zvol/ and /dev/<dataset>/.  When set to
'visible' all zvol snapshots for the dataset will be visible.

This functionality was largely added because when automatic
snapshoting is enabled large numbers of read-only zvol snapshots
will be created.  When creating these devices the kernel will
attempt to read their partition tables, and blkid will attempt
to identify any filesystems on those partitions.  This leads
to a variety of issues:

1) The zvol partition tables will be read in the context of
   the `modprobe zfs` for automatically imported pools.  This
   is undesirable and should be done asynchronously, but for
   now reducing the number of visible devices helps.

2) Udev expects to be able to complete its work for a new
   block devices fairly quickly.  When many zvol devices are
   added at the same time this is no longer be true.  It can
   lead to udev timeouts and missing /dev/zvol links.

3) Simply having lots of devices in /dev/ can be aukward from
   a management standpoint.  Hidding the devices your unlikely
   to ever use helps with this.  Any snapshot device which is
   needed can be made visible by changing the snapdev property.

NOTE: This patch changes the default behavior for zvols which
      was effectively 'snapdev=visible'.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1235
Closes #945
Issue #956
Issue #756
2013-03-05 12:37:54 -08:00
Brian Behlendorf
dbf763b39b Retire zpool_id infrastructure
In the interest of maintaining only one udev helper to give vdevs
user friendly names, the zpool_id and zpool_layout infrastructure
is being retired.  They are superseded by vdev_id which incorporates
all the previous functionality.

Documentation for the new vdev_id(8) helper and its configuration
file, vdev_id.conf(5), can be found in their respective man pages.
Several useful example files are installed under /etc/zfs/.

  /etc/zfs/vdev_id.conf.alias.example
  /etc/zfs/vdev_id.conf.multipath.example
  /etc/zfs/vdev_id.conf.sas_direct.example
  /etc/zfs/vdev_id.conf.sas_switch.example

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #981
2013-01-29 12:23:17 -08:00
Brian Behlendorf
ff5b1c8065 Quiet mkfs.ext2 output
The -q option should quiet the mkfs.ext2 output but certain
versions of e2fsprogs appear to ignore it.  This can result in
an extra 'done' message in the test output.  To keep this noise
from distracting just direct stdout to /dev/null.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-01-28 15:36:02 -08:00
Brian Behlendorf
930b6fec21 Stop using /bin/ as a source in zconfig.sh
Test 5, 6, 7, and 7 in zconfig.sh use /bin/ as a source of random
directories and files for their test.  This has lead to unexpected
tests failures because the total size of /bin/ on the test system
isn't checked and it is entirely possible for it to be larger than
the target filesystem.

To resolve this issue we create a somewhat random collection of
files and directories in /var/tmp to use.  On average we expect
about 5MB of data with the worst case being 20MB.  This is large
enough to be interesting and small enough to always fit in the
default test datasets.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1113
2013-01-28 14:51:26 -08:00
Brian Behlendorf
e528c9b461 Fix zconfig.sh partitioning error
Parted version 3.0 doesn't allow us to specify the start and end
percentages as 50% and 100% respectively.  This results in:

  Error: The location 100% is outside the device /dev/zd0

Therefore we change the syntax to 51% and -1 for end of device.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-01-24 13:57:16 -08:00
Brian Behlendorf
563103decd Fix test script error codes
The 'exit $?' command in the INT TERM EXIT trap was overwritting
the expected error code with the error code from mv.  Fix the
issue by removing the 'exit $?'.  It's important the we preserve
the original error code so failures are easily noticed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-01-24 13:53:12 -08:00
Turbo Fredriksson
645fb9cc21 Implemented sharing datasets via SMB using libshare
Add the initial support for the 'smbshare' option using the
existing libshare infrastructure.  Because this implementation
relies on usershares samba version 3.0.23 is required.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #493
2012-12-03 09:42:15 -08:00
Andrew Reid
6cb7ab069d Do not return /dev/loop-control in unused_loop_device
The function unused_loop_device in /usr/libexec/zfs/common.sh
returns /dev/loop-control on the first call. This device is NOT
a loop device (https://github.com/torvalds/linux/commit/770fe30)
it is a control device. This in turn causes the script zconfig.sh
to fail with:

  zpool-create.sh: Error 1 creating /tmp/zpool-vdev0 ->
  /dev/loop-control loopback

The patch makes the function return /dev/loop[0-9]* which are
loop devices.

Signed-off-by: Andrew Reid <ColdCanuck@nailedtotheperch.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #797
2012-10-15 10:02:42 -07:00
Etienne Dechamps
2f342404c1 Force 4K blocksize when testing ext2 on zvol.
Currently, mkfs.ext2 on zconfig.sh zvols tries to use a 8K blocksize,
probably because by default zvol exposes an optimal I/O size of 8K.

Unfortunately, a ext2 blocksize of 8K is not supported by the kernel,
so the resulting filesystem is unmountable.

This patch fixes the issue by making sure the blocksize is 4K. We have
to use -F to force it else mkfs.ext2 won't allow us to use a blocksize
smaller than the optimal I/O size.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #979
2012-10-03 10:52:51 -07:00
Brian Behlendorf
ca8b5af89d Remove autotools products
Remove all of the generated autotools products from the repository
and update the .gitignore files accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #718
2012-08-27 11:47:44 -07:00
Etienne Dechamps
ee5fd0bb80 Set zvol discard_granularity to the volblocksize.
Currently, zvols have a discard granularity set to 0, which suggests to
the upper layer that discard requests of arbirarily small size and
alignment can be made efficiently.

In practice however, ZFS does not handle unaligned discard requests
efficiently: indeed, it is unable to free a part of a block. It will
write zeros to the specified range instead, which is both useless and
inefficient (see dnode_free_range).

With this patch, zvol block devices expose volblocksize as their discard
granularity, so the upper layer is aware that it's not supposed to send
discard requests smaller than volblocksize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #862
2012-08-07 14:55:31 -07:00
Richard Yao
739a1a82e0 Linux 3.5 compat, end_writeback() changed to clear_inode()
The end_writeback() function was changed by moving the call to
inode_sync_wait() earlier in to evict().   This effecitvely changes
the ordering of the sync but it does not impact the details of
the zfs implementation.

However, as part of this change end_writeback() was renamed to
clear_inode() to reflect the new semantics.  This change does
impact us and clear_inode() now maps to end_writeback() for
kernels prior to 3.5.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #784
2012-07-23 12:29:36 -07:00
Richard Yao
ea1fdf46e2 Linux 3.5 compat, iops->truncate_range() removed
The vmtruncate_range() support has been removed from the kernel in
favor of using the fallocate method in the file_operations table.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
2012-07-23 12:29:32 -07:00
Richard Yao
756c3e5a9c Linux 3.5 compat, eops->encode_fh() takes inodes
The export_operations member ->encode_fh() has been updated to
take both the child and parent inodes.  This interface used to
take the child dentry and a bool describing if the parent is needed.

NOTE: While updating this code I noticed that we do not currently
cleanly handle the case where we're passed a connectable parent.
This code should be audited to make sure we're doing the right thing.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
2012-07-23 12:29:23 -07:00
Etienne Dechamps
b5a28807cd Move partition scanning from userspace to module.
Currently, zpool online -e (dynamic vdev expansion) doesn't work on
whole disks because we're invoking ioctl(BLKRRPART) from userspace
while ZFS still has a partition open on the disk, which results in
EBUSY.

This patch moves the BLKRRPART invocation from the zpool utility to the
module. Specifically, this is done just before opening the device in
vdev_disk_open() which is called inside vdev_reopen(). This requires
jumping through some hoops to get to the disk device from the partition
device, and to make sure we can still open the partition after the
BLKRRPART call.

Note that this new code path is triggered on dynamic vdev expansion
only; other actions, like creating a new pool, are unchanged and still
call BLKRRPART from userspace.

This change also depends on API changes which are available in 2.6.37
and latter kernels.  The build system has been updated to detect this,
but there is no compatibility mode for older kernels.  This means that
online expansion will NOT be available in older kernels.  However, it
will still be possible to expand the vdev offline.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #808
2012-07-17 09:17:31 -07:00
Richard Yao
6a0936babc Linux 3.4 compat, d_make_root() replaces d_alloc_root()
torvalds/linux@adc0e91ab1 introduced
introduced d_make_root() as a replacement for d_alloc_root(). Further
commits appear to have removed d_alloc_root() from the Linux source
tree. This causes the following failure:

  error: implicit declaration of function 'd_alloc_root'
  [-Werror=implicit-function-declaration]

To correct this we update the code to use the current d_make_root()
interface for readability.  Then we introduce an autotools check
to determine if d_make_root() is available.  If it isn't then we
define some compatibility logic which used the older d_alloc_root()
interface.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #776
2012-06-11 10:04:49 -07:00
Brian Behlendorf
b39d3b9f7b Linux 3.3 compat, iops->create()/mkdir()/mknod()
The mode argument of iops->create()/mkdir()/mknod() was changed from
an 'int' to a 'umode_t'.  To prevent a compiler warning an autoconf
check was added to detect the API change and then correctly set a
zpl_umode_t typedef.  There is no functional change.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #701
2012-04-30 12:52:38 -07:00
Brian Behlendorf
1c5de20ae2 Add --enable-debug-dmu-tx configure option
Allow rigorous (and expensive) tx validation to be enabled/disabled
indepentantly from the standard zfs debugging.  When enabled these
checks ensure that all txs are constructed properly and that a dbuf
is never dirtied without taking the correct tx hold.

This checking is particularly helpful when adding new dmu consumers
like Lustre.  However, for established consumers such as the zpl
with no known outstanding tx construction problems this is just
overhead.

--enable-debug-dmu-tx  - Enable/disable validation of each tx as
--disable-debug-dmu-tx   it is constructed.  By default validation
                         is disabled due to performance concerns.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-03-23 12:25:17 -07:00
Brian Behlendorf
ebe7e575ea Add .zfs control directory
Add support for the .zfs control directory.  This was accomplished
by leveraging as much of the existing ZFS infrastructure as posible
and updating it for Linux as required.  The bulk of the core
functionality is now all there with the following limitations.

*) The .zfs/snapshot directory automount support requires a 2.6.37
   or newer kernel.  The exception is RHEL6.2 which has backported
   the d_automount patches.

*) Creating/destroying/renaming snapshots with mkdir/rmdir/mv
   in the .zfs/snapshot directory works as expected.  However,
   this functionality is only available to root until zfs
   delegations are finished.

      * mkdir - create a snapshot
      * rmdir - destroy a snapshot
      * mv    - rename a snapshot

The following issues are known defeciences, but we expect them to
be addressed by future commits.

*) Add automount support for kernels older the 2.6.37.  This should
   be possible using follow_link() which is what Linux did before.

*) Accessing the .zfs/snapshot directory via NFS is not yet possible.
   The majority of the ground work for this is complete.  However,
   finishing this work will require resolving some lingering
   integration issues with the Linux NFS kernel server.

*) The .zfs/shares directory exists but no futher smb functionality
   has yet been implemented.

Contributions-by: Rohan Puri <rohan.puri15@gmail.com>
Contributiobs-by: Andrew Barnes <barnes333@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #173
2012-03-22 13:03:47 -07:00
Brian Behlendorf
4b787d75c8 Cleanly support debug packages
Allow a source rpm to be rebuilt with debugging enabled.  This
avoids the need to have to manually modify the spec file.  By
default debugging is still largely disabled.  To enable specific
debugging features use the following options with rpmbuild.

  '--with debug'               - Enables ASSERTs

  # For example:
  $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm

Additionally, ZFS_CONFIG has been added to zfs_config.h for
packages which build against these headers.  This is critical
to ensure both zfs and the dependant package are using the same
prototype and structure definitions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-27 14:08:17 -08:00
Etienne Dechamps
30930fba21 Add support for DISCARD to ZVOLs.
DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning.
It allows ZVOL clients to discard (unmap, trim) block ranges from
a ZVOL, thus optimizing disk space usage by allowing a ZVOL to
shrink instead of just grow.

We can't use zfs_space() or zfs_freesp() here, since these functions
only work on regular files, not volumes. Fortunately we can use the
low-level function dmu_free_long_range() which does exactly what we
want.

Currently the discard operation is not added to the log. That's not
a big deal since losing discard requests cannot result in data
corruption. It would however result in disk space usage higher than
it should be. Thus adding log support to zvol_discard() is probably
a good idea for a future improvement.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-09 16:19:38 -08:00