= Motivation
While dealing with another performance issue (see 126118f) we noticed
that we spend a lot of time in various places in the kernel when
constructing long nvlists. The problem is that when an nvlist is created
with the NV_UNIQUE_NAME set (which is the case most of the time), we do
a linear search through the whole list to ensure uniqueness for every
entry we add.
An example of the above scenario can be seen in the following
flamegraph, where more than have the time of the zfsdev_ioctl() is spent
on constructing nvlists. Flamegraph:
https://sdimitro.github.io/img/flame/sdimitro_snap_unmount3.svg
Adding a table to speed up lookups will help situations where we just
construct an nvlist (like the scenario above), in addition to regular
lookups and removals.
= What this patch does
In this diff we've implemented a hash-table on top of the nvlist code
that converts most nvlist operations from O(# number of entries) to
O(1)* (the start is for amortized time as the hash-table grows and
shrinks depending on the # of entries - plain lookup is strictly O(1)).
= Performance Analysis
To analyze the performance improvement I just used the setup from the
snapshot deletion issue mentioned above in the Motivation section.
Basically I created 10K filesystems with one snapshot each and then I
just used the API of libZFS_Core to pass down an nvlist of all the
snapshots to have them deleted. The reason I used my own driver program
was to have clean performance results of what actually happens in the
kernel. The flamegraphs and wall clock times mentioned below were
gathered from the start to the end of the driver program's run. Between
trials the testpool used was completely destroyed, the system was
rebooted and the testpool was completely recreated. The reason for this
dance was to get consistent results.
== Results (before patch):
=== Sampling Flamegraphs
[Trial 1] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A.svg
[Trial 2] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A2.svg
[Trial 3] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A3.svg
=== Wall clock times (in seconds)
```
[Trial 4]
real 5.3
user 0.4
sys 2.3
[Trial 5]
real 8.2
user 0.4
sys 2.4
[Trial 6]
real 6.0
user 0.5
sys 2.3
```
== Results (after patch):
=== Sampling Flamegraphs
[Trial 1] https://sdimitro.github.io/img/flame/DLPX-53417/trial-Ae.svg
[Trial 2] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A2e.svg
[Trial 3] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A3e.svg
=== Wall clock times (in seconds)
```
[Trial 4]
real 4.9
user 0.0
sys 0.9
[Trial 5]
real 3.8
user 0.0
sys 0.9
[Trial 6]
real 3.6
user 0.0
sys 0.9
```
== Analysis
The results between the trials are consistent so in this sections I will
only talk about the flamegraph results from trial-1 and the wall-clock
results from trial-4.
From trial-1 we can see that zfs_dev_ioctl() goes from 2,331 to 996
samples counts. Specifically, the samples from fnvlist_add_nvlist() and
spa_history_log_nvl() are almost gone (~500 & ~800 to 5 & 5 samples),
leaving zfs_ioc_destroy_snaps() to dominate most samples from
zfs_dev_ioctl().
From trial-4 we see that the user time dropped to 0 secods. I believe
the consistent 0.4 seconds before my patch was applied was due to my
driver program constructing the long nvlist of snapshots so it can pass
it to the kernel. As for the system time, the effect there is more clear
(2.3 down to 0.9 seconds).
Porting Notes:
* DATA_TYPE_DONTCARE case added to switch in fm_nvprintr() and
zpool_do_events_nvprint().
Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/9580
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b5eca7b1Closes#7748
Minimal changes required to integrate the SPL sources in to the
ZFS repository build infrastructure and packaging.
Build system and packaging:
* Renamed SPL_* autoconf m4 macros to ZFS_*.
* Removed redundant SPL_* autoconf m4 macros.
* Updated the RPM spec files to remove SPL package dependency.
* The zfs package obsoletes the spl package, and the zfs-kmod
package obsoletes the spl-kmod package.
* The zfs-kmod-devel* packages were updated to add compatibility
symlinks under /usr/src/spl-x.y.z until all dependent packages
can be updated. They will be removed in a future release.
* Updated copy-builtin script for in-kernel builds.
* Updated DKMS package to include the spl.ko.
* Updated stale AUTHORS file to include all contributors.
* Updated stale COPYRIGHT and included the SPL as an exception.
* Renamed README.markdown to README.md
* Renamed OPENSOLARIS.LICENSE to LICENSE.
* Renamed DISCLAIMER to NOTICE.
Required code changes:
* Removed redundant HAVE_SPL macro.
* Removed _BOOT from nvpairs since it doesn't apply for Linux.
* Initial header cleanup (removal of empty headers, refactoring).
* Remove SPL repository clone/build from zimport.sh.
* Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due
to build issues when forcing C99 compilation.
* Replaced legacy ACCESS_ONCE with READ_ONCE.
* Include needed headers for `current` and `EXPORT_SYMBOL`.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
TEST_ZIMPORT_SKIP="yes"
Closes#7556
When --enable-asan is provided to configure then build all user
space components with fsanitize=address. For kernel support
use the Linux KASAN feature instead.
https://github.com/google/sanitizers/wiki/AddressSanitizer
When using gcc version 4.8 any test case which intentionally
generates a core dump will fail when using --enable-asan.
The default behavior is to disable core dumps and only newer
versions allow this behavior to be controled at run time with
the ASAN_OPTIONS environment variable.
Additionally, this patch includes some build system cleanup.
* Rules.am updated to set the minimum AM_CFLAGS, AM_CPPFLAGS,
and AM_LDFLAGS. Any additional flags should be added on a
per-Makefile basic. The --enable-debug and --enable-asan
options apply to all user space binaries and libraries.
* Compiler checks consolidated in always-compiler-options.m4
and renamed for consistency.
* -fstack-check compiler flag was removed, this functionality
is provided by asan when configured with --enable-asan.
* Split DEBUG_CFLAGS in to DEBUG_CFLAGS, DEBUG_CPPFLAGS, and
DEBUG_LDFLAGS.
* Moved default kernel build flags in to module/Makefile.in and
split in to ZFS_MODULE_CFLAGS and ZFS_MODULE_CPPFLAGS. These
flags are set with the standard ccflags-y kbuild mechanism.
* -Wframe-larger-than checks applied only to binaries or
libraries which include source files which are built in
both user space and kernel space. This restriction is
relaxed for user space only utilities.
* -Wno-unused-but-set-variable applied only to libzfs and
libzpool. The remaining warnings are the result of an
ASSERT using a variable when is always declared.
* -D_POSIX_PTHREAD_SEMANTICS and -D__EXTENSIONS__ dropped
because they are Solaris specific and thus not needed.
* Ensure $GDB is defined as gdb by default in zloop.sh.
Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7027
Fix build errors with gcc 7.2.0 on Gentoo with kernel 4.14
built with CONFIG_GCC_PLUGIN_RANDSTRUCT=y such as:
module/nvpair/nvpair.c:2810:2:error:
positional initialization of field in ?struct? declared with
'designated_init' attribute [-Werror=designated-init]
nvs_native_nvlist,
^~~~~~~~~~~~~~~~~
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Wright <gienah@gentoo.org>
Closes#5390Closes#6903
Authored by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Andy Stormont <andyjstormont@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/2976
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5c5f137Closes#6582
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7261
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/48dd5e6Closes#6579
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8375
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/843c211Closes#6578
Authored by: Daniel Hoffman <dj.hoffman@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
This change removes all gendered language that did not refer specifically
to an individual person or pet. The convention taken was to use
variations on "they" when referring to users and/or human beings, while
using "it" when referring to code, functions, and/or libraries.
Additionally, we took the liberty to fix up any whitespace issues that
were found in any files that were already being modified.
OpenZFS-issue: https://www.illumos.org/issues/7812
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ad626dbCloses#5822
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes#5547Closes#5543
The following new test cases need to have execute permissions set:
userquota/groupspace_003_pos.ksh
userquota/userquota_013_pos.ksh
userquota/userspace_003_pos.ksh
upgrade/upgrade_userobj_001_pos.ksh
upgrade/setup.ksh
upgrade/cleanup.ksh
The following source files accidentally were marked executable:
lib/libzpool/kernel.c
lib/libshare/nfs.c
lib/libzfs/libzfs_dataset.c
lib/libzfs/libzfs_util.c
tests/zfs-tests/cmd/rm_lnkcnt_zero_file/rm_lnkcnt_zero_file.c
tests/zfs-tests/cmd/dir_rd_update/dir_rd_update.c
cmd/zed/zed_exec.c
module/icp/core/kcf_sched.c
module/zfs/dsl_pool.c
module/zfs/arc.c
module/nvpair/nvpair.c
man/man5/zfs-module-parameters.5
Reviewed-by: GeLiXin <ge.lixin@zte.com.cn>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5241
coverity scan CID:147531,type: Argument cannot be negative
- may copy data with negative size
coverity scan CID:147532,type: resource leaks
- may close a fd which is negative
coverity scan CID:147533,type: resource leaks
- may call pwrite64 with a negative size
coverity scan CID:147535,type: resource leaks
- may call fdopen with a negative fd
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes#5176
nvlist_pack() and nvlist_unpack are implemented recursively, which can
cause the stack to overflow with a deeply nested nvlist; i.e. an nvlist
which contains an nvlist, which contains an nvlist, which...
Unprivileged users can pass an nvlist to the kernel via certain ioctls
on /dev/zfs, which the kernel will unpack without additional permission
checking or validation. Therefore, an unprivileged user can cause the
kernel's stack to overflow and panic.
Ideally, these functions would be implemented non-recursively. As a
quick fix, this patch limits the depth of the recursion and returns an
error when attempting to pack and unpack a deeply-nested nvlist.
Signed-off-by: Adam Leventhal <ahl@delphix.com>
Signed-off-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Prakash Surya <prakash.surya@delphix.com>
OpenZFS-issue: https://www.illumos.org/issues/7263
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0511d6d
-
The indefinite article before nvlist should be "an", not "a".
We have 27 "an nvlist" and 7 "a nvlist" in our comment, they should
stay the same as we are such a strict filesystem.
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4941
Build products from an out of tree build should be written
relative to the build directory. Sources should be referred
to by their locations in the source directory.
This is accomplished by adding the 'src' and 'obj' variables
for the module Makefile.am, using relative paths to reference
source files, and by setting VPATH when source files are not
co-located with the Makefile. This enables the following:
$ mkdir build
$ cd build
$ ../configure \
--with-spl=$HOME/src/git/spl/ \
--with-spl-obj=$HOME/src/git/spl/build
$ make -s
This change also has the advantage of resolving the following
warning which is generated by modern versions of automake.
Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
Makefile.am:00: but option 'subdir-objects' is disabled
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1082
In the original implementation of the SPL wrappers were provided
for module initialization and cleanup. This was done to abstract
away any compatibility code which might be needed for the SPL.
As it turned out the only significant compatibility issue was that
the default pwd during module load differed under Illumos and Linux.
Since this is such as minor thing and the wrappers complicate the
code they are being retired.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2985
Several of the nvlist functions may perform allocations larger than
the 32k warning threshold. Convert them to use vmem_alloc() so the
best allocator is used.
Commit efcd79a retired KM_NODEBUG which was used to suppress large
allocation warnings. Concurrently the large allocation warning threshold
was increased from 8k to 32k. The goal was to identify the remaining
locations, such as this one, where the allocation can be larger than
32k. This patch is expected fine tuning resulting for the kmem-rework
changes, see commit 6e9710f.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3057Closes#3079Closes#3081
Callers of kmem_alloc() which passed the KM_NODEBUG flag to suppress
the large allocation warning have been replaced by vmem_alloc() as
appropriate. The updated vmem_alloc() call will not print a warning
regardless of the size of the allocation.
A careful reader will notice that not all callers have been changed
to vmem_alloc(). Some have only had the KM_NODEBUG flag removed.
This was possible because the default warning threshold has been
increased to 32k. This is desirable because it minimizes the need
for Linux specific code changes.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Inclusion of SPL compatibility headers was moved out of the public
header sys/types.h to avoid conflicts with external packages. Include a
few compatiblity headers explicitly to cope with that change. Also,
sort some linux-specific inclusions alphabetically.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2898
The mis-aligned memory accesses in nvpair_native_embedded() and
nvpair_native_embedded_array() will cause a 'Bus Error' for
architectures such as Sparc which not fully byte addressible.
To avoid this issue care is taken to avoid dereferencing the
potentially mis-aligned packed nvlist_t.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: marku89 <mar42@kola.li>
Issue #1700
This change is related to commit 81eaf15 which ensured the correct
allocation handlers were installed for nvlist_alloc(). The nvlist
functions nvlist_dup(), nvlist_pack(), and nvlist_unpack() suffer
from the same issue and have been updated accordingly.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1937
The vast majority of these changes are in Linux specific code.
They are the result of not having an automated style checker to
validate the code when it was originally written. Others were
caused when the common code was slightly adjusted for Linux.
This patch contains no functional changes. It only refreshes
the code to conform to style guide.
Everyone submitting patches for inclusion upstream should now
run 'make checkstyle' and resolve any warning prior to opening
a pull request. The automated builders have been updated to
fail a build if when 'make checkstyle' detects an issue.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1821
Use the standard Linux MODULE_VERSION macro to expose the installed
zavl, znvpair, zunicode, zcommon, zfs, and zpios module versions.
This will also automatically add a checksum of the .c files and
headers in "srcversion". See:
/sys/module/zavl/version
/sys/module/zavl/srcversion
/sys/module/znvpair/version
/sys/module/znvpair/srcversion
/sys/module/zunicode/version
/sys/module/zunicode/srcversion
/sys/module/zcommon/version
/sys/module/zcommon/srcversion
/sys/module/zfs/version
/sys/module/zfs/srcversion
/sys/module/zpios/version
/sys/module/zpios/srcversion
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1923
2882 implement libzfs_core
2883 changing "canmount" property to "on" should not always remount dataset
2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Bill Pijewski <wdp@joyent.com>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>
References:
https://www.illumos.org/issues/2882https://www.illumos.org/issues/2883https://www.illumos.org/issues/2900illumos/illumos-gate@4445fffbbb
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1293
Porting notes:
WARNING: This patch changes the user/kernel ABI. That means that
the zfs/zpool utilities built from master are NOT compatible with
the 0.6.2 kernel modules. Ensure you load the matching kernel
modules from master after updating the utilities. Otherwise the
zfs/zpool commands will be unable to interact with your pool and
you will see errors similar to the following:
$ zpool list
failed to read pool configuration: bad address
no pools available
$ zfs list
no datasets available
Add zvol minor device creation to the new zfs_snapshot_nvl function.
Remove the logging of the "release" operation in
dsl_dataset_user_release_sync(). The logging caused a null dereference
because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the
logging functions try to get the ds name via the dsl_dataset_name()
function. I've got no idea why this particular code would have worked
in Illumos. This code has subsequently been completely reworked in
Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring).
Squash some "may be used uninitialized" warning/erorrs.
Fix some printf format warnings for %lld and %llu.
Apply a few spa_writeable() changes that were made to Illumos in
illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and
3115 fixes.
Add a missing call to fnvlist_free(nvl) in log_internal() that was added
in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time
(zfsonlinux/zfs@9e11c73) because it depended on future work.
The non-blocking allocation handlers in nvlist_alloc() would be
mistakenly assigned if any flags other than KM_SLEEP were passed.
This meant that nvlists allocated with KM_PUSHPUSH or other KM_*
debug flags were effectively always using atomic allocations.
While these failures were unlikely it could lead to assertions
because KM_PUSHPAGE allocations in particular are guaranteed to
succeed or block. They must never fail.
Since the existing API does not allow us to pass allocation
flags to the private allocators the cleanest thing to do is to
add a KM_PUSHPAGE allocator.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closeszfsonlinux/spl#249
3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first
argument is zero
Reviewed by Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by George Wilson <george.wilson@delphix.com>
Approved by Eric Schrock <eric.schrock@delphix.com>
References:
illumos/illumos-gate@fb09f5aad4https://illumos.org/issues/3006
Requires:
zfsonlinux/spl@1c6d149feb
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1509
2619 asynchronous destruction of ZFS file systems
2747 SPA versioning with zfs feature flags
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>
References:
illumos/illumos-gate@53089ab7c8illumos/illumos-gate@ad135b5d64
illumos changeset: 13700:2889e2596bd6
https://www.illumos.org/issues/2619https://www.illumos.org/issues/2747
NOTE: The grub specific changes were not ported. This change
must be made to the Linux grub packages.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
This commit introduces a "copy-builtin" script designed to prepare a
kernel source tree for building ZFS as a builtin module. The script
makes a full copy of all needed files, thus making the kernel source
tree fully independent of the zfs source package.
To achieve that, some compilation flags (-include, -I) have been moved
to module/Makefile. This Makefile is only used when compiling external
modules; when compiling builtin modules, a Kbuild file generated by the
configure-builtin script is used instead. This makes sure Makefiles
inside the kernel source tree does not contain references to the zfs
source package.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
The current ZFS implementation stores xattrs on disk using a hidden
directory. In this directory a file name represents the xattr name
and the file contexts are the xattr binary data. This approach is
very flexible and allows for arbitrarily large xattrs. However,
it also suffers from a significant performance penalty. Accessing
a single xattr can requires up to three disk seeks.
1) Lookup the dnode object.
2) Lookup the dnodes's xattr directory object.
3) Lookup the xattr object in the directory.
To avoid this performance penalty Linux filesystems such as ext3
and xfs try to store the xattr as part of the inode on disk. When
the xattr is to large to store in the inode then a single external
block is allocated for them. In practice most xattrs are small
and this approach works well.
The addition of System Attributes (SA) to zfs provides us a clean
way to make this optimization. When the dataset property 'xattr=sa'
is set then xattrs will be preferentially stored as System Attributes.
This allows tiny xattrs (~100 bytes) to be stored with the dnode and
up to 64k of xattrs to be stored in the spill block. If additional
xattr space is required, which is unlikely under Linux, they will be
stored using the traditional directory approach.
This optimization results in roughly a 3x performance improvement
when accessing xattrs which brings zfs roughly to parity with ext4
and xfs (see table below). When multiple xattrs are stored per-file
the performance improvements are even greater because all of the
xattrs stored in the spill block will be cached.
However, by default SA based xattrs are disabled in the Linux port
to maximize compatibility with other implementations. If you do
enable SA based xattrs then they will not be visible on platforms
which do not support this feature.
----------------------------------------------------------------------
Time in seconds to get/set one xattr of N bytes on 100,000 files
------+--------------------------------+------------------------------
| setxattr | getxattr
bytes | ext4 xfs zfs-dir zfs-sa | ext4 xfs zfs-dir zfs-sa
------+--------------------------------+------------------------------
1 | 2.33 31.88 21.50 4.57 | 2.35 2.64 6.29 2.43
32 | 2.79 30.68 21.98 4.60 | 2.44 2.59 6.78 2.48
256 | 3.25 31.99 21.36 5.92 | 2.32 2.71 6.22 3.14
1024 | 3.30 32.61 22.83 8.45 | 2.40 2.79 6.24 3.27
4096 | 3.57 317.46 22.52 10.73 | 2.78 28.62 6.90 3.94
16384 | n/a 2342.39 34.30 19.20 | n/a 45.44 145.90 7.55
65536 | n/a 2941.39 128.15 131.32* | n/a 141.92 256.85 262.12*
Legend:
* ext4 - Stock RHEL6.1 ext4 mounted with '-o user_xattr'.
* xfs - Stock RHEL6.1 xfs mounted with default options.
* zfs-dir - Directory based xattrs only.
* zfs-sa - Prefer SAs but spill in to directories as needed, a
trailing * indicates overflow in to directories occured.
NOTE: Ext4 supports 4096 bytes of xattr name/value pairs per file.
NOTE: XFS and ZFS have no limit on xattr name/value pairs per file.
NOTE: Linux limits individual name/value pairs to 65536 bytes.
NOTE: All setattr/getattr's were done after dropping the cache.
NOTE: All tests were run against a single hard drive.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #443
One of the neat tricks an autoconf style project is capable of
is allow configurion/building in a directory other than the
source directory. The major advantage to this is that you can
build the project various different ways while making changes
in a single source tree.
For example, this project is designed to work on various different
Linux distributions each of which work slightly differently. This
means that changes need to verified on each of those supported
distributions perferably before the change is committed to the
public git repo.
Using nfs and custom build directories makes this much easier.
I now have a single source tree in nfs mounted on several different
systems each running a supported distribution. When I make a
change to the source base I suspect may break things I can
concurrently build from the same source on all the systems each
in their own subdirectory.
wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz
tar -xzf zfs-x.y.z.tar.gz
cd zfs-x-y-z
------------------------- run concurrently ----------------------
<ubuntu system> <fedora system> <debian system> <rhel6 system>
mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6
cd ubuntu cd fedora cd debian cd rhel6
../configure ../configure ../configure ../configure
make make make make
make check make check make check make check
This change also moves many of the include headers from individual
incude/sys directories under the modules directory in to a single
top level include directory. This has the advantage of making
the build rules cleaner and logically it makes a bit more sense.
Add autoconf style build infrastructure to the ZFS tree. This
includes autogen.sh, configure.ac, m4 macros, some scripts/*,
and makefiles for all the core ZFS components.