* Supports booting of a ZFS snapshot.
Do this by cloning the snapshot into a dataset. If this, the resulting
dataset, already exists, destroy it. Then mount it on root.
* If snapshot does not exist, use base dataset (the part before '@')
as boot filesystem instead.
* If no snapshot is specified on the 'root=' kernel command line, but there
is an '@', then get a list of snapshots below that filesystem and ask the
user which to use.
* Clone with 'mountpoint=none' and 'canmount=noauto' - we mount manually
and explicitly.
* For sub-filesystems, that doesn't have a mountpoint property set, we use
the 'org.zol:mountpoint' to keep track of it's mountpoint.
* Allow rollback of snapshots instead of clone it and boot from the clone.
* Allow mounting a root- and subfs with mountpoint=legacy set
* Allow mounting a filesystem which is using nativ encryption.
* Support all currently used kernel command line arguments
All the different distributions have their own standard on what to specify
on the kernel command line to boot of a ZFS filesystem.
* Extra options:
* zfsdebug=(on,yes,1) Show extra debugging information
* zfsforce=(on,yes,1) Force import the pool
* rollback=(on,yes,1) Rollback (instead of clone) the snapshot
* Only try to import pool if it haven't already been imported
* This will negate the need to force import a pool that have not been exported cleanly.
* Support exclusion of pools to import by setting ZFS_POOL_EXCEPTIONS in /etc/default/zfs.
* Support additional configuration variable ZFS_INITRD_ADDITIONAL_DATASETS
to mount additional filesystems not located under your root dataset.
* Include /etc/modprobe.d/{zfs,spl}.conf in the initrd if it/they exist.
* Include the udev rule to use by-vdev for pool imports.
* Include the /etc/default/zfs file to the initrd.
* Only try /dev/disk/by-* in the initrd if USE_DISK_BY_ID is set.
* Use /dev/disk/by-vdev before anything.
* Add /dev as a last ditch attempt.
* Fallback to using the cache file if that exist if nothing else worked.
* Use /sbin/modprobe instead of built-in (BusyBox) modprobe.
This gets rid of the message "modprobe: can't load module zcommon".
Thanx to pcoultha for finding this.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2116Closes#2114
For kernels which do not implement a per-suberblock shrinker,
those older than Linux 3.1, the shrink_dcache_parent() function
was used to attempt to reclaim dentries. This was found not be
entirely reliable and could lead to performance issues on older
kernels running meta-data heavy workloads.
To address this issue a zfs_sb_prune_aliases() function has been
added to implement this functionality. It relies on traversing
the list of znodes for a filesystem and adding them to a private
list with a reference held. The private list can then be safely
walked outside the z_znodes_lock to prune dentires and drop the
last reference so the inode can be freed.
This provides the same synchronous behavior as the per-filesystem
shrinker and has the advantage of depending on only long standing
interfaces.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#3501
Linux 3.15 commit torvalds/linux@293bc98 introduced two new methods.
The ->read_iter() and ->write_iter() methods were designed to replace
the ->aio_read() and ->aio_write() interfaces. Both interfaces were
preserved for several kernel releases in order to migrate all existing
consumers to the new interfaces. But as of Linux 4.1 the legacy
interface has been retired and the ZFS code must be updated to use
the new interfaces.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3352
Kernels >= 3.12 have a NUMA-aware superblock shrinker which is used in
ZoL by zfs_sb_prune(). This patch calls the shrinker for each on-line
NUMA node in order that memory be freed for each one.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3495
* Based on the init scripts included with Debian GNU/Linux, then take code
from the already existing ones, trying to merge them into one set of
scripts that will work for 'everyone' for better maintainability.
* Add configurable variables to control the workings of the init scripts:
* ZFS_INITRD_PRE_MOUNTROOT_SLEEP
Set a sleep time before we load the module (used primarily by initrd
scripts to allow for slower media (such as USB devices etc) to be
availible before we load the zfs module).
* ZFS_INITRD_POST_MODPROBE_SLEEP
Set a timed sleep in the initrd to after the load of the zfs module.
* ZFS_INITRD_ADDITIONAL_DATASETS
To allow for mounting additional datasets in the initrd. Primarily used
in initrd scripts to allow for when filesystem needed to boot (such as
/usr, /opt, /var etc) isn't directly under the root dataset.
* ZFS_POOL_EXCEPTIONS
Exclude pools from being imported (in the initrd and/or init scripts).
* ZFS_DKMS_ENABLE_DEBUG, ZFS_DKMS_ENABLE_DEBUG_DMU_TX, ZFS_DKMS_DISABLE_STRIP
Set to control how dkms should build the dkms packages.
* ZPOOL_IMPORT_PATH
Set path(s) where "zpool import" should import pools from.
This was previously the job of "USE_DISK_BY_ID" (which is still used
for backwards compatibility) but was renamed to allow for better
control of import path(s).
* If old USE_DISK_BY_ID is set, but not new ZPOOL_IMPORT_PATH, then we
set ZPOOL_IMPORT_PATH to sane defaults just to be on the safe side.
* ZED_ARGS
To allow for local options to zed without having to change the init script.
* The import function, do_import(), imports pools by name instead of '-a'
for better control of pools to import and from where.
* If USE_DISK_BY_ID is set (for backwards compatibility), but isn't 'yes'
then ignore it.
* If pool(s) isn't found with a simple "zpool import" (seen it happen),
try looking for them in /dev/disk/by-id (if it exists). Any duplicates
(pools found with both commands) is filtered out.
* IF we have found extra pool(s) this way, we must force USE_DISK_BY_ID
so that the first, simple "zpool import $pool" is able to find it.
* Fallback on importing the pool using the cache file (if it exists) only
if 'simple' import (either with ZPOOL_IMPORT_PATH or the 'built in'
defaults) didn't work.
* The export function, do_export(), will export all pools imported, EXCEPT
the root pool (if there is one).
* ZED script from the Debian GNU/Linux packages added.
* Refreshed ZED init script from behlendorf@5e7a660 to be portable so it
may be used on both LSB and Redhat style systems.
* If there is no pool(s) imported and zed successfully shut down, we will
unload the zfs modules.
* The function library file for the ZoL init script is installed as
/etc/init.d/zfs-functions.
* The four init scripts, the /etc/{defaults,sysconfig,conf.d}/zfs config file
as well as the common function library is tagged as '%config(noreplace)' in
the rpm rules file to make sure they are not replaced automatically if locally
modifed.
* Pitfals and workarounds:
* If we're running from init, remove stale /etc/dfs/sharetab before importing
pools in the zfs-import init script.
* On Debian GNU/Linux, there's a 'sendsigs' script that will kill basically
everything quite early in the shutdown phase and zed is/should be stopped
much later than that. We don't want zed to be among the ones killed, so add
the zed pid to list of pids for 'sendsigs' to ignore.
* CentOS uses echo_success() and echo_failure() to print out status of
command. These in turn uses "echo -n \0xx[etc]" to move cursor and choose
colour etc. This doesn't work with the modified IFS variable we need to
use in zfs-import for some reason, so work around that when we define
zfs_log_{end,failure}_msg() for RedHat and derivative distributions.
* All scripts passes ShellCheck (with one false positive in do_mount()).
Signed-off-by: Turbo Fredriksson turbo@bayour.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Chris Dunlap <cdunlap@llnl.gov>
Closes#2974Closes#2107
Commit 60e9f69 added the --with-mounthelperdir option for Gentoo
and in the process accidentally modified the default installation
location. For security reasons mount(8) expects it to only be
installed under /sbin.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3426
Commit f4af6bb783 which added support
for REQ_FAILFAST_MASK but the new autoconf test didn't use the same
preprocessor macro name as the code did.
The effect is that FAILFAST mode has not been enabled for ZoL in any
post-2.6.35 kernel.
Retire the HAVE_BIO_RW_FAILFAST interface used in pre-2.6.28 kernels.
Raise an error condition if the FAILFAST interface can't be detected.
Signed-off-by: Tim Chase <tim@onlight.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3386
Provide a Redhat specific zfs-kmod.spec file which uses the old style
kmods (not kmods2) packaging. By using the provided kmodtool script
packages can be built which support weak modules. This allows for the
kernel to be updated without having to rebuild the ZFS kernel modules.
Packages for RHEL/Centos/SL/TOSS which use this spec file can by built
as follows:
$ ./configure --with-spec=redhat
$ make rpms
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Originally it was thought that custom spec files might be required
for Fedora. Happily that has turns out not to be the case. Since
this directory just contains symlinks to the generic spec files it
can be removed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Explicitly disable the unused by variable warnings by setting
__attribute__((unused)) for bdi_setup_and_register(). This is
required because the function is defined with the __must_check
attribute.
Signed-off-by: Bill McGonigle <bill-github.com-public1@bfccomputing.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3141
The 'capabilities' argument which was passed to bdi_setup_and_register()
has been removed. File systems should no longer pass BDI_CAP_MAP_COPY.
For our purposes this means there are now three different interfaces
which must be handled. A zpl_bdi_setup_and_register() wrapper function
has been introduced to provide a single interface to the ZPL code.
* 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported.
* 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
* 4.0 - x.y, bdi_setup_and_register() takes 2 arguments.
I've also taken this opportunity to remove HAVE_BDI because kernels
older then 2.6.32 are no longer supported. All kernels newer than
this will have one of the above interfaces.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes#3128
struct access f->f_dentry->d_inode was replaced by accessor function
file_inode(f)
Signed-off-by: Joerg Thalheim <joerg@higgsboson.tk>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3084
Using AC_LANG_SOURCE with some versions of autoconf is problematic if
the given source is to be written to a header file. Such versions assume
the contents are to be written to conftest.c and generate shell code to
that effect. The contents of the test program to detect support for
Linux tracepoints were consequently malformed (containing the source for
conftest.h) so the build system incorrectly disabled tracepoints
support. Fix this in ZFS_LINUX_TRY_COMPILE_HEADER by passing the header
source directly to ZFS_LINUX_COMPILE_IFELSE.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2953
This patch leverages Linux tracepoints from within the ZFS on Linux
code base. It also refactors the debug code to bring it back in sync
with Illumos.
The information exported via tracepoints can be used for a variety of
reasons (e.g. debugging, tuning, general exploration/understanding,
etc). It is advantageous to use Linux tracepoints as the mechanism to
export this kind of information (as opposed to something else) for a
number of reasons:
* A number of external tools can make use of our tracepoints
"automatically" (e.g. perf, systemtap)
* Tracepoints are designed to be extremely cheap when disabled
* It's one of the "accepted" ways to export this kind of
information; many other kernel subsystems use tracepoints too.
Unfortunately, though, there are a few caveats as well:
* Linux tracepoints appear to only be available to GPL licensed
modules due to the way certain kernel functions are exported.
Thus, to actually make use of the tracepoints introduced by this
patch, one might have to patch and re-compile the kernel;
exporting the necessary functions to non-GPL modules.
* Prior to upstream kernel version v3.14-rc6-30-g66cc69e, Linux
tracepoints are not available for unsigned kernel modules
(tracepoints will get disabled due to the module's 'F' taint).
Thus, one either has to sign the zfs kernel module prior to
loading it, or use a kernel versioned v3.14-rc6-30-g66cc69e or
newer.
Assuming the above two requirements are satisfied, lets look at an
example of how this patch can be used and what information it exposes
(all commands run as 'root'):
# list all zfs tracepoints available
$ ls /sys/kernel/debug/tracing/events/zfs
enable filter zfs_arc__delete
zfs_arc__evict zfs_arc__hit zfs_arc__miss
zfs_l2arc__evict zfs_l2arc__hit zfs_l2arc__iodone
zfs_l2arc__miss zfs_l2arc__read zfs_l2arc__write
zfs_new_state__mfu zfs_new_state__mru
# enable all zfs tracepoints, clear the tracepoint ring buffer
$ echo 1 > /sys/kernel/debug/tracing/events/zfs/enable
$ echo 0 > /sys/kernel/debug/tracing/trace
# import zpool called 'tank', inspect tracepoint data (each line was
# truncated, they're too long for a commit message otherwise)
$ zpool import tank
$ cat /sys/kernel/debug/tracing/trace | head -n35
# tracer: nop
#
# entries-in-buffer/entries-written: 1219/1219 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr...
z_rd_int/0-30156 [003] .... 91344.200611: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.201173: zfs_arc__miss: hdr...
z_rd_int/1-30157 [003] .... 91344.201756: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.201795: zfs_arc__miss: hdr...
z_rd_int/2-30158 [003] .... 91344.202099: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.202126: zfs_arc__hit: hdr ...
lt-zpool-30132 [003] .... 91344.202130: zfs_arc__hit: hdr ...
lt-zpool-30132 [003] .... 91344.202134: zfs_arc__hit: hdr ...
lt-zpool-30132 [003] .... 91344.202146: zfs_arc__miss: hdr...
z_rd_int/3-30159 [003] .... 91344.202457: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.202484: zfs_arc__miss: hdr...
z_rd_int/4-30160 [003] .... 91344.202866: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.202891: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.203034: zfs_arc__miss: hdr...
z_rd_iss/1-30149 [001] .... 91344.203749: zfs_new_state__mru...
lt-zpool-30132 [001] .... 91344.203789: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.203878: zfs_arc__miss: hdr...
z_rd_iss/3-30151 [001] .... 91344.204315: zfs_new_state__mru...
lt-zpool-30132 [001] .... 91344.204332: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.204337: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.204352: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.204356: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.204360: zfs_arc__hit: hdr ...
To highlight the kind of detailed information that is being exported
using this infrastructure, I've taken the first tracepoint line from the
output above and reformatted it such that it fits in 80 columns:
lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss:
hdr {
dva 0x1:0x40082
birth 15491
cksum0 0x163edbff3a
flags 0x640
datacnt 1
type 1
size 2048
spa 3133524293419867460
state_type 0
access 0
mru_hits 0
mru_ghost_hits 0
mfu_hits 0
mfu_ghost_hits 0
l2_hits 0
refcount 1
} bp {
dva0 0x1:0x40082
dva1 0x1:0x3000e5
dva2 0x1:0x5a006e
cksum 0x163edbff3a:0x75af30b3dd6:0x1499263ff5f2b:0x288bd118815e00
lsize 2048
} zb {
objset 0
object 0
level -1
blkid 0
}
For the specific tracepoint shown here, 'zfs_arc__miss', data is
exported detailing the arc_buf_hdr_t (hdr), blkptr_t (bp), and
zbookmark_t (zb) that caused the ARC miss (down to the exact DVA!).
This kind of precise and detailed information can be extremely valuable
when trying to answer certain kinds of questions.
For anybody unfamiliar but looking to build on this, I found the XFS
source code along with the following three web links to be extremely
helpful:
* http://lwn.net/Articles/379903/
* http://lwn.net/Articles/381064/
* http://lwn.net/Articles/383362/
I should also node the more "boring" aspects of this patch:
* The ZFS_LINUX_COMPILE_IFELSE autoconf macro was modified to
support a sixth paramter. This parameter is used to populate the
contents of the new conftest.h file. If no sixth parameter is
provided, conftest.h will be empty.
* The ZFS_LINUX_TRY_COMPILE_HEADER autoconf macro was introduced.
This macro is nearly identical to the ZFS_LINUX_TRY_COMPILE macro,
except it has support for a fifth option that is then passed as
the sixth parameter to ZFS_LINUX_COMPILE_IFELSE.
These autoconf changes were needed to test the availability of the Linux
tracepoint macros. Due to the odd nature of the Linux tracepoint macro
API, a separate ".h" must be created (the path and filename is used
internally by the kernel's define_trace.h file).
* The HAVE_DECLARE_EVENT_CLASS autoconf macro was introduced. This
is to determine if we can safely enable the Linux tracepoint
functionality. We need to selectively disable the tracepoint code
due to the kernel exporting certain functions as GPL only. Without
this check, the build process will fail at link time.
In addition, the SET_ERROR macro was modified into a tracepoint as well.
To do this, the 'sdt.h' file was moved into the 'include/sys' directory
and now contains a userspace portion and a kernel space portion. The
dprintf and zfs_dbgmsg* interfaces are now implemented as tracepoint as
well.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This file may be added by automake and therefore should be added
to config/.gitignore. For the full list of possible auxiliary
programs see the full automake documentation.
http://www.gnu.org/software/automake/manual/automake.html#Auxiliary-Programs
Signed-off-by: Marcel Wysocki <maci.stgn@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2848
Installing outside of the prefix is not permissible under Gentoo Prefix.
The package manager will cause the installation process to fail if/when
it sees this. We could handle this by disabling systemd support on
prefix because systemd does not check these paths, but the Gentoo
Council decided that small files such as these should be installed.
That means disabling systemd support on prefix is not an acceptable
workaround. As a consequence, we need some way of control the directory
into which these files are installed.
Making this configurable increases our compliance with the
freedesktop.org specification, which allows these files to be installed
into /etc/modules-load.d:
http://www.freedesktop.org/software/systemd/man/modules-load.d.html
Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2641
Installing outside of the prefix is not permissible under Gentoo Prefix.
The package manager will cause the installation process to fail if/when
it sees this. I could script a workaround inside the ebuild, but it
seemed to make more sense to make this more configurable.
Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2641
Since we changed the default location for the kernel headers to respect
--prefix in the SPL, we must search that location to prevent user builds
from breaking.
Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2641
Apply the license specified in the META file to ensure the
compatibility checks are all performed consistently.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2757
The HAVE_IOCTL_* configure checks were originally added for
compatibility with an ancient version of glibc. This support
and additional complexity is no longer needed and is therefore
being removed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes#585
There are two common locations where udev and dracut components are
commonly installed. When building packages using the 'make rpm|deb'
targets check those common locations and pass them to rpmbuild. For
non-standard configurations these values can be provided by the
the following configure options:
--with-udevdir=DIR install udev helpers [default=check]
--with-udevruledir=DIR install udev rules [[UDEVDIR/rules.d]]
--with-dracutdir=DIR install dracut helpers [default=check]
When rebuilding using the source packages the per-distribution
default values specified in the spec file will be used. This is
the preferred way to build packages for a distribution but the
ability to override the defaults is provided as a convenience.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2310Closes#1680
Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on
the translated date string in the changelog.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: zfsonlinux/spl#306
This adds ability to set the location of the kernel via defines
when building from the spec files. This is useful when building
against a kernel installed in a non-standard location.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1874
From day one the various ZFS libraries should have been placed in their
own sub-packages. Primarily this allows for multiple major versions of
the libraries to be concurrently installed. It also facilitates a
smaller build environment by minimizing the required dependencies.
The specific changes required to split the libraries from the utilities
are as follows:
* libzpool2, libnvpair1, libuutil1, and libzfs2 packages were added
and contain the versioned shared libraries. The Fedora packaging
guidelines discourage providing static libraries so they are not
included in the packages.
http://fedoraproject.org/wiki/Packaging:Guidelines#Packaging_Static_Libraries
* The zfs-devel package was renamed libzfs2-devel and the new package
obsoletes the old zfs-devel package. This package includes all
the required headers for the libzpool2, libnvpair1, libuutil1, and
libzfs2 libraries and their respective unversioned shared libraries.
This package should eventually be split in to individual lib*-devel
packages but it will still take some work to cleanly separate them.
Therefore the libzfs2-devel package provides the expected lib*-devel
packages so the all proper dependencies can still be created.
http://fedoraproject.org/wiki/Packaging:Guidelines#Devel_Packages
* Moved '/sbin/ldconfig' execution from the zfs packge to each of the
new library packages as described by the packaging guidelines.
http://fedoraproject.org/wiki/Packaging:Guidelines#Shared_Libraries
* The /usr/share/doc/ files were moved in to the libzfs2-devel package.
* Updated config/deb.am to be aware of the packaging changes. This
ensures that 'deb-utils' make target converts all the resulting
packages generated by the 'rpm-utils' target.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #2329Closes: #2341
Issue: #2145
When creating packages in a git repository the release number
can be automatically set by 'git describe'. This normally works
well but if your repository has newer tags which match the form
NAME-VERSION* the release may be incorrectly calculated. To
prevent this the match patten has been restricted to the contents
of the META file, NAME-VERSION.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
We need inode_owner_or_capable() for ZFS file attributes in addition to
xattrs, so it should go into its own file. This moves it into its own
file and changes it to be more comprehensive. It will now fail if no
known good API is detected.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1691
rq_for_each_segment changed from taking bio_vec * to taking bio_vec.
We provide rq_for_each_segment4 which takes both.
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2124
bi_sector, bi_size and bi_idx are moved from bio to bio->bi_iter.
This patch creates BIO_BI_*(bio) macros to hide the differences.
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2124
posix_acl_{create,chmod} is changed to __posix_acl_{create_chmod}
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2124
zed supports a '-M' cmdline opt to lock all pages in memory via
mlockall(). The _POSIX_MEMLOCK define is checked to determine whether
this function is supported. The current test assumes mlockall()
is supported if _POSIX_MEMLOCK is non-zero. However, this test is
insufficient according to mlock(2) and sysconf(3). If _POSIX_MEMLOCK
is -1, mlockall() is not supported; but if _POSIX_MEMLOCK is 0,
availability must be checked at runtime.
This commit adds an autoconf check for mlockall() to user.m4. The zed
code block for mlockall() is now guarded with a test for HAVE_MLOCKALL.
If defined, mlockall() will be called and its runtime availability
checked via its return value.
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
Add macro definitions to AM_CPPFLAGS to propagate makefile installation
directory variables for libexecdir, runstatedir, sbindir, and
sysconfdir.
https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Installation-Directory-Variables.html
A corollary is that you should not use these variables except
in makefiles. For instance, instead of trying to evaluate
datadir in configure and hard-coding it in makefiles using e.g.,
'AC_DEFINE_UNQUOTED([DATADIR], ["$datadir"], [Data directory.])',
you should add -DDATADIR='$(datadir)' to your makefile's definition
of CPPFLAGS (AM_CPPFLAGS if you are also using Automake).
The runstatedir directory is for "installing data files which the
programs modify while they run, that pertain to one specific machine,
and which need not persist longer than the execution of the program".
https://www.gnu.org/prep/standards/html_node/Directory-Variables.html
It will be defined by autoconf 2.70 or later, and default to
"$(localstatedir)/run".
http://git.savannah.gnu.org/gitweb/?p=autoconf.git;a=commit;h=a197431414088a417b407b9b20583b2e8f7363bd
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
torvalds/linux@8077c0d983 added a
__must_check to the bdi_setup_and_register(), which caused our autotools
check to break. zfsonlinux/zfs@729210564a
was intended to correct that, but it depended on -Wno-unused-result,
which is unrecognized in older GCC versions. That commit has been
reverted in favor of a solution that does not require
-Wno-unused-result.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2102Closes#2135
Older GCC versions do not obey -Wno-unused-result. This reverts commit
729210564a in favor of a solution that
does not require -Wno-unused-result.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1906
When testing compiler flags, we only need to do compile test. Otherwise,
configure will fail with "configure: error: cannot run test program while
cross compiling" when cross compiling.
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2191
This adds systemd unit files replacing the functionality offered by
the SysV init script found in etc/init.d.
It has been developed and tested on Fedora 19, Fedora 20
and openSuSE 13.1.
Four unit files and one target are offered.
zfs-import-cache.service:
Import pools from /etc/zfs/zpool.cache. This unit will wait for
udev to settle.
zfs-import-scan.service:
Import pools by scanning /dev/disk/by-id for zvols. This unit will
only run if /etc/zfs/zpool.cache is not present. This unit will wait
for udev to settle
zfs-mount.service:
Mount ZFS native filesystems. It contains a dependency to be loaded
before local-fs.target.
zfs-share.service:
Share NFS/SMB filesystems. This unit contains a dependency that
will cause it to be restarted whenever the smb or nfs-server unit
is restarted, restoring the shares added.
zfs.target:
This target pulls in the other units in order to start ZFS. It's
the only unit that can be enabled/disabled, all other services
are static and pulled in by dependencies. It will honour zfs=off
and zfs=no options on the kernel command line.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2108
GCC >+ 4.8's aggressive loop optimization breaks some of the iterators
over the dn_blkptr[] pseudo-array in dnode_phys. Since dn_blkptr[] is
defined as a single-element array, GCC believes an iterator can only
access index 0 and will unroll the loop into a single iteration.
One way to resolve the issue would be to cast the array to a pointer
and fix all the iterators that might break. The only loop where it
is known to cause a problem is this loop in dmu_objset_write_ready():
for (i = 0; i < dnp->dn_nblkptr; i++)
bp->blk_fill += dnp->dn_blkptr[i].blk_fill;
In the common case where dn_nblkptr is 3, the loop is only executed a
single time and "i" is equal to 1 following the loop.
The specific breakage caused by this problem is that the blk_fill of
root block pointers wouldn't be set properly when more than one blkptr
is in use (when no indrect blocks are needed).
The simple reproducing sequence is:
zpool create tank /tank.img
zdb -ddddd tank 0
Notice that "fill=31", however, there are two L0 indirect blocks with
"F=31" and "F=5". The fill count should be 36 rather than 31. This
problem causes an assert to be hit in a simple "zdb tank" when built
with --enable-debug.
However, this approach was not taken because we need to be absolutely
sure we catch all instances of this unwanted optimization. Therefore,
the build system has been updated to detect if GCC supports the
aggressive loop optimization. If it does the optimization will be
explicitly disabled using the -fno-aggressive-loop-optimization option.
Original-fix-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2010Closes#2051
Four new dataset properties have been added to support SELinux. They
are 'context', 'fscontext', 'defcontext' and 'rootcontext' which map
directly to the context options described in mount(8). When one of
these properties is set to something other than 'none'. That string
will be passed verbatim as a mount option for the given context when
the filesystem is mounted.
For example, if you wanted the rootcontext for a filesystem to be set
to 'system_u:object_r:fs_t' you would set the property as follows:
$ zfs set rootcontext="system_u:object_r:fs_t" storage-pool/media
This will ensure the filesystem is automatically mounted with that
rootcontext. It is equivalent to manually specifying the rootcontext
with the -o option like this:
$ zfs mount -o rootcontext=system_u:object_r:fs_t storage-pool/media
By default all four contexts are set to 'none'. Further information
on SELinux contexts is detailed in mount(8) and selinux(8) man pages.
Signed-off-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#1504
This broke compilation against Linux 3.13 and GCC 4.7.3.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1906
There's a missing semicolon and equals sign in the first hunk of this
commit in config/kernel-bdi.m4. This results in the test always
failing. The effects were noticed when rrdtool, a tool which modifies
files by mmap() and msync(), would have data never get saved to disk
in spite of the files working while the mounted filesystem remains
mounted.
Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#1889
This change adds support for Posix ACLs by storing them as an xattr
which is common practice for many Linux file systems. Since the
Posix ACL is stored as an xattr it will not overwrite any existing
ZFS/NFSv4 ACLs which may have been set. The Posix ACL will also
be non-functional on other platforms although it may be visible
as an xattr if that platform understands SA based xattrs.
By default Posix ACLs are disabled but they may be enabled with
the new 'aclmode=noacl|posixacl' property. Set the property to
'posixacl' to enable them. If ZFS/NFSv4 ACL support is ever added
an appropriate acltype will be added.
This change passes the POSIX Test Suite cleanly with the exception
of xacl/00.t test 45 which is incorrect for Linux (Ext4 fails too).
http://www.tuxera.com/community/posix-test-suite/
Signed-off-by: Massimo Maggi <me@massimo-maggi.eu>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#170
libblkid support is dormant because the autotools check is broken and
liblkid identifies ZFS vdevs as "zfs_member", not "zfs". We fix that
with a few changes:
First, we fix the libblkid autotools check to do a few things:
1. Make a 64MB file, which is the minimum size ZFS permits.
2. Make 4 fake uberblock entries to make libblkid's check succeed.
3. Return 0 upon success to make autotools use the success case.
4. Include stdlib.h to avoid implicit declration of free().
5. Check for "zfs_member", not "zfs"
6. Make --with-blkid disable autotools check (avoids Gentoo sandbox violation)
7. Pass '-lblkid' correctly using LIBS not LDFLAGS.
Second, we change the libblkid support to scan for "zfs_member", not
"zfs".
This makes --with-blkid work on Gentoo.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1751
c38367c73f was meant to eliminate runtime
function pointer modifications in autotools checks because they were
prone to false negatives on kernels hardened by the PaX project.
Unfortunately, I missed the xattr_handler and super_block->s_bdi
autotools checks. Recent changes to PaX constified
xattr_handler->get/set, which lead me to discover this oversight.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1433
Commit torvalds/linux@2233f31aad
replaced ->readdir() with ->iterate() in struct file_operations.
All filesystems must now use the new ->iterate method.
To handle this the code was reworked to use the new ->iterate
interface. Care was taken to keep the majority of changes
confined to the ZPL layer which is already Linux specific.
However, minor changes were required to the common zfs_readdir()
function.
Compatibility with older kernels was accomplished by adding
versions of the trivial dir_emit* helper functions. Also the
various *_readdir() functions were reworked in to wrappers
which create a dir_context structure to pass to the new
*_iterate() functions.
Unfortunately, the new dir_emit* functions prevent us from
passing a private pointer to the filldir function. The xattr
directory code leveraged this ability through zfs_readdir()
to generate the list of xattr names. Since we can no longer
use zfs_readdir() a simplified zpl_xattr_readdir() function
was added to perform the same task.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1653
Issue #1591
The iterate_supers_type() function which was introduced in the
3.0 kernel was supposed to provide a safe way to call an arbitrary
function on all super blocks of a specific type. Unfortunately,
because a list_head was used a bug was introduced which made it
possible for iterate_supers_type() to get stuck spinning on a
super block which was just deactivated.
This can occur because when the list head is removed from the
fs_supers list it is reinitialized to point to itself. If the
iterate_supers_type() function happened to be processing the
removed list_head it will get stuck spinning on that list_head.
The bug was fixed in the 3.3 kernel by converting the list_head
to an hlist_node. However, to resolve the issue for existing
3.0 - 3.2 kernels we detect when a list_head is used. Then to
prevent the spinning from occurring the .next pointer is set to
the fs_supers list_head which ensures the iterate_supers_type()
function will always terminate.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1045Closes#861Closes#790
Linux kernel commit torvalds/linux@db2a144 changed the return type
of block_device_operations->release() to void. Detect the expected
prototype and defined our callout accordingly.
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1494
The approach taken was the rework zfs_holey() as little as
possible and then just wrap the code as needed to ensure
correct locking and error handling.
Tested with xfstests 285 and 286. All tests pass except for
7-9 of 285 which try to reserve blocks first via fallocate(2)
and fail because fallocate(2) is not yet supported.
Note that the filp->f_lock spinlock did not exist prior to
Linux 2.6.30, but we avoid the need for autotools check by
virtue of the fact that SEEK_DATA/SEEK_HOLE support was not
added until Linux 3.1.
An autoconf check was added for lseek_execute() which is
currently a private function but the expectation is that it
will be exported perhaps as early as Linux 3.11.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1384
The previous code was only waiting for the symver file. But the
postinst target of the DKMS script for SPL will not only create
the symvers file, but also the header spl_config.h.
If we are waiting in the configure script of ZFS for the SPL
symvers file, then we also need to wait for spl_config.h.
Otherwise the configure script will abort because the spl_config.h
is not yet available.
On top of that, the function ZFS_AC_SPL_MODULE_SYMVERS is moved
to the end of the function ZFS_AC_SPL to allow both checks share
the with-spl-timeout parameter.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1431
When the kmod packaging was introduced the ability to pass the
--enable-debug and --enable-dmu-tx options from configure all
the way through to `make rpm|deb` was accidenally lost. Update
ZFS_AC_RPM to explicitlu set RPM_DEFINE_COMMON with these
rpmbuild defines.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1402
Preserve the release field when creating Debian packages. The
--keep-version option was not used because it results in a failure
when the git '<commit>_<hash>' syntax is used for the release.
The '_' is a valid character for RPM packages but not for DEBs.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue #1402
Issue #928