freebsd-dev

Author	SHA1	Message	Date
Matus Kral	57ae840077	Linux 4.1 compat: use read_iter() / write_iter() Linux 3.15 commit torvalds/linux@293bc98 introduced two new methods. The ->read_iter() and ->write_iter() methods were designed to replace the ->aio_read() and ->aio_write() interfaces. Both interfaces were preserved for several kernel releases in order to migrate all existing consumers to the new interfaces. But as of Linux 4.1 the legacy interface has been retired and the ZFS code must be updated to use the new interfaces. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3352	2015-06-18 12:06:59 -07:00
Tim Chase	90947b2357	3.12 compat, NUMA-aware per-superblock shrinker Kernels >= 3.12 have a NUMA-aware superblock shrinker which is used in ZoL by zfs_sb_prune(). This patch calls the shrinker for each on-line NUMA node in order that memory be freed for each one. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3495	2015-06-17 10:43:13 -07:00
Tim Chase	e48533383b	Linux 2.6.36 compat, use REQ_FAILFAST_MASK and remove pre-2.6.36 support Commit `f4af6bb783` added support for REQ_FAILFAST_MASK but the new autoconf test didn't use the same preprocessor macro name as the code did. The effect is that FAILFAST mode has not been enabled for ZoL in any post-2.6.35 kernel. Retire the HAVE_BIO_RW_FAILFAST interface used in pre-2.6.28 kernels. Raise an error condition if the FAILFAST interface can't be detected. Signed-off-by: Tim Chase <tim@onlight.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3386	2015-05-11 15:07:00 -07:00
Brian Behlendorf	8c45def24a	Linux 4.0 compat: bdi_setup_and_register() The 'capabilities' argument which was passed to bdi_setup_and_register() has been removed. File systems should no longer pass BDI_CAP_MAP_COPY. For our purposes this means there are now three different interfaces which must be handled. A zpl_bdi_setup_and_register() wrapper function has been introduced to provide a single interface to the ZPL code. * 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported. * 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments. * 4.0 - x.y, bdi_setup_and_register() takes 2 arguments. I've also taken this opportunity to remove HAVE_BDI because kernels older than 2.6.32 are no longer supported. All kernels newer than this will have one of the above interfaces. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #3128	2015-03-03 10:49:45 -08:00
Jörg Thalheim	534759fad3	Linux 3.19 compat: file_inode was added struct access f->f_dentry->d_inode was replaced by accessor function file_inode(f) Signed-off-by: Joerg Thalheim <joerg@higgsboson.tk> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3084	2015-02-10 11:24:51 -08:00
Ned Bass	4e30e68caf	Don't use AC_LANG_SOURCE for conftest.h source Using AC_LANG_SOURCE with some versions of autoconf is problematic if the given source is to be written to a header file. Such versions assume the contents are to be written to conftest.c and generate shell code to that effect. The contents of the test program to detect support for Linux tracepoints were consequently malformed (containing the source for conftest.h) so the build system incorrectly disabled tracepoints support. Fix this in ZFS_LINUX_TRY_COMPILE_HEADER by passing the header source directly to ZFS_LINUX_COMPILE_IFELSE. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2953	2015-01-06 16:53:30 -08:00
Prakash Surya	0b39b9f96f	Swap DTRACE_PROBE* with Linux tracepoints This patch leverages Linux tracepoints from within the ZFS on Linux code base. It also refactors the debug code to bring it back in sync with Illumos. The information exported via tracepoints can be used for a variety of reasons (e.g. debugging, tuning, general exploration/understanding, etc). It is advantageous to use Linux tracepoints as the mechanism to export this kind of information (as opposed to something else) for a number of reasons: * A number of external tools can make use of our tracepoints "automatically" (e.g. perf, systemtap) * Tracepoints are designed to be extremely cheap when disabled * It's one of the "accepted" ways to export this kind of information; many other kernel subsystems use tracepoints too. Unfortunately, though, there are a few caveats as well: * Linux tracepoints appear to only be available to GPL licensed modules due to the way certain kernel functions are exported. Thus, to actually make use of the tracepoints introduced by this patch, one might have to patch and re-compile the kernel; exporting the necessary functions to non-GPL modules. * Prior to upstream kernel version v3.14-rc6-30-g66cc69e, Linux tracepoints are not available for unsigned kernel modules (tracepoints will get disabled due to the module's 'F' taint). Thus, one either has to sign the zfs kernel module prior to loading it, or use a kernel versioned v3.14-rc6-30-g66cc69e or newer. 
Assuming the above two requirements are satisfied, let's look at an example of how this patch can be used and what information it exposes (all commands run as 'root'): # list all zfs tracepoints available $ ls /sys/kernel/debug/tracing/events/zfs enable filter zfs_arc__delete zfs_arc__evict zfs_arc__hit zfs_arc__miss zfs_l2arc__evict zfs_l2arc__hit zfs_l2arc__iodone zfs_l2arc__miss zfs_l2arc__read zfs_l2arc__write zfs_new_state__mfu zfs_new_state__mru # enable all zfs tracepoints, clear the tracepoint ring buffer $ echo 1 > /sys/kernel/debug/tracing/events/zfs/enable $ echo 0 > /sys/kernel/debug/tracing/trace # import zpool called 'tank', inspect tracepoint data (each line was # truncated, they're too long for a commit message otherwise) $ zpool import tank $ cat /sys/kernel/debug/tracing/trace | head -n35 # tracer: nop # # entries-in-buffer/entries-written: 1219/1219 #P:8 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr... z_rd_int/0-30156 [003] .... 91344.200611: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.201173: zfs_arc__miss: hdr... z_rd_int/1-30157 [003] .... 91344.201756: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.201795: zfs_arc__miss: hdr... z_rd_int/2-30158 [003] .... 91344.202099: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202126: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202130: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202134: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202146: zfs_arc__miss: hdr... z_rd_int/3-30159 [003] .... 91344.202457: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202484: zfs_arc__miss: hdr... z_rd_int/4-30160 [003] .... 91344.202866: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202891: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 
91344.203034: zfs_arc__miss: hdr... z_rd_iss/1-30149 [001] .... 91344.203749: zfs_new_state__mru... lt-zpool-30132 [001] .... 91344.203789: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.203878: zfs_arc__miss: hdr... z_rd_iss/3-30151 [001] .... 91344.204315: zfs_new_state__mru... lt-zpool-30132 [001] .... 91344.204332: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204337: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204352: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204356: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204360: zfs_arc__hit: hdr ... To highlight the kind of detailed information that is being exported using this infrastructure, I've taken the first tracepoint line from the output above and reformatted it such that it fits in 80 columns: lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr { dva 0x1:0x40082 birth 15491 cksum0 0x163edbff3a flags 0x640 datacnt 1 type 1 size 2048 spa 3133524293419867460 state_type 0 access 0 mru_hits 0 mru_ghost_hits 0 mfu_hits 0 mfu_ghost_hits 0 l2_hits 0 refcount 1 } bp { dva0 0x1:0x40082 dva1 0x1:0x3000e5 dva2 0x1:0x5a006e cksum 0x163edbff3a:0x75af30b3dd6:0x1499263ff5f2b:0x288bd118815e00 lsize 2048 } zb { objset 0 object 0 level -1 blkid 0 } For the specific tracepoint shown here, 'zfs_arc__miss', data is exported detailing the arc_buf_hdr_t (hdr), blkptr_t (bp), and zbookmark_t (zb) that caused the ARC miss (down to the exact DVA!). This kind of precise and detailed information can be extremely valuable when trying to answer certain kinds of questions. For anybody unfamiliar but looking to build on this, I found the XFS source code along with the following three web links to be extremely helpful: * http://lwn.net/Articles/379903/ * http://lwn.net/Articles/381064/ * http://lwn.net/Articles/383362/ I should also note the more "boring" aspects of this patch: * The ZFS_LINUX_COMPILE_IFELSE autoconf macro was modified to support a sixth parameter. 
This parameter is used to populate the contents of the new conftest.h file. If no sixth parameter is provided, conftest.h will be empty. * The ZFS_LINUX_TRY_COMPILE_HEADER autoconf macro was introduced. This macro is nearly identical to the ZFS_LINUX_TRY_COMPILE macro, except it has support for a fifth option that is then passed as the sixth parameter to ZFS_LINUX_COMPILE_IFELSE. These autoconf changes were needed to test the availability of the Linux tracepoint macros. Due to the odd nature of the Linux tracepoint macro API, a separate ".h" must be created (the path and filename is used internally by the kernel's define_trace.h file). * The HAVE_DECLARE_EVENT_CLASS autoconf macro was introduced. This is to determine if we can safely enable the Linux tracepoint functionality. We need to selectively disable the tracepoint code due to the kernel exporting certain functions as GPL only. Without this check, the build process will fail at link time. In addition, the SET_ERROR macro was modified into a tracepoint as well. To do this, the 'sdt.h' file was moved into the 'include/sys' directory and now contains a userspace portion and a kernel space portion. The dprintf and zfs_dbgmsg* interfaces are now implemented as tracepoints as well. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-11-17 11:13:55 -08:00
Richard Yao	d8d7826721	Search /usr/local/src for SPL Object Directory Since we changed the default location for the kernel headers to respect --prefix in the SPL, we must search that location to prevent user builds from breaking. Signed-off-by: Richard Yao <richard.yao@clusterhq.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2641	2014-10-28 09:37:23 -07:00
Brian Behlendorf	e33045ee98	Make license compatibility checks consistent Apply the license specified in the META file to ensure the compatibility checks are all performed consistently. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2757	2014-10-17 14:58:38 -07:00
Brian Behlendorf	1139491da7	Revert "Disable GCCs aggressive loop optimization" This reverts commit `0f62f3f9ab`. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2010	2014-07-22 09:56:55 -07:00
Chunwei Chen	d4541210f3	Linux 3.14 compat: Immutable biovec changes in vdev_disk.c bi_sector, bi_size and bi_idx are moved from bio to bio->bi_iter. This patch creates BIO_BI_*(bio) macros to hide the differences. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:28:38 -07:00
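The macro approach described in the entry above can be illustrated in user space. The following is only a sketch: `struct demo_bio`, `struct demo_bvec_iter`, `DEMO_HAVE_BIO_BVEC_ITER` and `demo_bio_end_sector()` are hypothetical stand-ins rather than kernel or ZFS definitions, and the `_SECTOR`/`_SIZE`/`_IDX` suffixes are assumed from the field names the message lists.

```c
#include <stdint.h>

/* In a real build a configure check would set this; here it is set by hand. */
#define	DEMO_HAVE_BIO_BVEC_ITER	1

#ifdef DEMO_HAVE_BIO_BVEC_ITER
/* Mock of the newer layout: position fields live in bio->bi_iter. */
struct demo_bvec_iter {
	uint64_t	bi_sector;
	unsigned int	bi_size;
	unsigned int	bi_idx;
};
struct demo_bio {
	struct demo_bvec_iter	bi_iter;
};
#define	BIO_BI_SECTOR(bio)	(bio)->bi_iter.bi_sector
#define	BIO_BI_SIZE(bio)	(bio)->bi_iter.bi_size
#define	BIO_BI_IDX(bio)		(bio)->bi_iter.bi_idx
#else
/* Mock of the older layout: the same fields sit directly in the bio. */
struct demo_bio {
	uint64_t	bi_sector;
	unsigned int	bi_size;
	unsigned int	bi_idx;
};
#define	BIO_BI_SECTOR(bio)	(bio)->bi_sector
#define	BIO_BI_SIZE(bio)	(bio)->bi_size
#define	BIO_BI_IDX(bio)		(bio)->bi_idx
#endif

/* Caller code is written once and compiles against either layout. */
uint64_t
demo_bio_end_sector(const struct demo_bio *bio)
{
	/* bi_size is in bytes; shift by 9 to convert to 512-byte sectors. */
	return (BIO_BI_SECTOR(bio) + (BIO_BI_SIZE(bio) >> 9));
}
```

Because the macros expand to plain member accesses, they can be used on either side of an assignment, which keeps callers free of `#ifdef` blocks.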
Brian Behlendorf	0f62f3f9ab	Disable GCCs aggressive loop optimization GCC >= 4.8's aggressive loop optimization breaks some of the iterators over the dn_blkptr[] pseudo-array in dnode_phys. Since dn_blkptr[] is defined as a single-element array, GCC believes an iterator can only access index 0 and will unroll the loop into a single iteration. One way to resolve the issue would be to cast the array to a pointer and fix all the iterators that might break. The only loop where it is known to cause a problem is this loop in dmu_objset_write_ready(): for (i = 0; i < dnp->dn_nblkptr; i++) bp->blk_fill += dnp->dn_blkptr[i].blk_fill; In the common case where dn_nblkptr is 3, the loop is only executed a single time and "i" is equal to 1 following the loop. The specific breakage caused by this problem is that the blk_fill of root block pointers wouldn't be set properly when more than one blkptr is in use (when no indirect blocks are needed). The simple reproducing sequence is: zpool create tank /tank.img zdb -ddddd tank 0 Notice that "fill=31", however, there are two L0 indirect blocks with "F=31" and "F=5". The fill count should be 36 rather than 31. This problem causes an assert to be hit in a simple "zdb tank" when built with --enable-debug. However, this approach was not taken because we need to be absolutely sure we catch all instances of this unwanted optimization. Therefore, the build system has been updated to detect if GCC supports the aggressive loop optimization. If it does the optimization will be explicitly disabled using the -fno-aggressive-loop-optimizations option. Original-fix-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2010 Closes #2051	2014-01-14 13:55:58 -08:00
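For readers unfamiliar with the single-element trailing array idiom, the sketch below shows the same summing loop over a structure whose trailing array is a C99 flexible array member, so the compiler has no declared bound to reason from. This is neither the fix the commit applied (it disabled the optimization at build time) nor the pointer cast it mentions; it is a different, swapped-in technique, and `demo_blkptr_t`, `demo_dnode_t`, `demo_dnode_alloc()` and `demo_sum_fill()` are hypothetical names, not ZFS definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for blkptr_t; only the field the loop reads. */
typedef struct demo_blkptr {
	uint64_t	blk_fill;
} demo_blkptr_t;

/*
 * Hypothetical stand-in for dnode_phys. The trailing array is declared as a
 * flexible array member rather than dn_blkptr[1], so no declared bound
 * suggests that only index 0 is valid.
 */
typedef struct demo_dnode {
	uint8_t		dn_nblkptr;
	demo_blkptr_t	dn_blkptr[];
} demo_dnode_t;

/* Allocate a zeroed dnode with room for nblkptr block pointers. */
demo_dnode_t *
demo_dnode_alloc(uint8_t nblkptr)
{
	demo_dnode_t *dnp;

	dnp = calloc(1, sizeof (*dnp) + nblkptr * sizeof (demo_blkptr_t));
	if (dnp != NULL)
		dnp->dn_nblkptr = nblkptr;
	return (dnp);
}

/* Sum blk_fill over every block pointer, mirroring the loop quoted above. */
uint64_t
demo_sum_fill(const demo_dnode_t *dnp)
{
	uint64_t fill = 0;
	int i;

	for (i = 0; i < dnp->dn_nblkptr; i++)
		fill += dnp->dn_blkptr[i].blk_fill;
	return (fill);
}
```

With fill values of 31 and 5 in the first two slots, as in the zdb output the message describes, `demo_sum_fill()` yields the expected 36.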
Massimo Maggi	023699cd62	Posix ACL Support This change adds support for Posix ACLs by storing them as an xattr which is common practice for many Linux file systems. Since the Posix ACL is stored as an xattr it will not overwrite any existing ZFS/NFSv4 ACLs which may have been set. The Posix ACL will also be non-functional on other platforms although it may be visible as an xattr if that platform understands SA based xattrs. By default Posix ACLs are disabled but they may be enabled with the new 'aclmode=noacl|posixacl' property. Set the property to 'posixacl' to enable them. If ZFS/NFSv4 ACL support is ever added an appropriate acltype will be added. This change passes the POSIX Test Suite cleanly with the exception of xacl/00.t test 45 which is incorrect for Linux (Ext4 fails too). http://www.tuxera.com/community/posix-test-suite/ Signed-off-by: Massimo Maggi <me@massimo-maggi.eu> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #170	2013-10-29 14:54:26 -07:00
Richard Yao	0f37d0c8be	Linux 3.11 compat: fops->iterate() Commit torvalds/linux@2233f31aad replaced ->readdir() with ->iterate() in struct file_operations. All filesystems must now use the new ->iterate method. To handle this the code was reworked to use the new ->iterate interface. Care was taken to keep the majority of changes confined to the ZPL layer which is already Linux specific. However, minor changes were required to the common zfs_readdir() function. Compatibility with older kernels was accomplished by adding versions of the trivial dir_emit* helper functions. Also the various _readdir() functions were reworked into wrappers which create a dir_context structure to pass to the new _iterate() functions. Unfortunately, the new dir_emit* functions prevent us from passing a private pointer to the filldir function. The xattr directory code leveraged this ability through zfs_readdir() to generate the list of xattr names. Since we can no longer use zfs_readdir() a simplified zpl_xattr_readdir() function was added to perform the same task. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1653 Issue #1591	2013-08-15 16:19:07 -07:00
Brian Behlendorf	dba1d70566	Fix arc_adapt() spinning in iterate_supers_type() The iterate_supers_type() function which was introduced in the 3.0 kernel was supposed to provide a safe way to call an arbitrary function on all super blocks of a specific type. Unfortunately, because a list_head was used a bug was introduced which made it possible for iterate_supers_type() to get stuck spinning on a super block which was just deactivated. This can occur because when the list head is removed from the fs_supers list it is reinitialized to point to itself. If the iterate_supers_type() function happened to be processing the removed list_head it will get stuck spinning on that list_head. The bug was fixed in the 3.3 kernel by converting the list_head to an hlist_node. However, to resolve the issue for existing 3.0 - 3.2 kernels we detect when a list_head is used. Then to prevent the spinning from occurring the .next pointer is set to the fs_supers list_head which ensures the iterate_supers_type() function will always terminate. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1045 Closes #861 Closes #790	2013-07-17 09:28:06 -07:00
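The spinning behaviour described in the entry above can be reproduced with a small user-space model of a circular doubly linked list. This is a sketch under stated assumptions: `struct demo_list_head` and the `demo_*` helpers are hypothetical re-implementations written for illustration, not the kernel's list API, and the walk is bounded by a step limit so the "spin" can be observed without hanging.

```c
#include <stddef.h>

/* Hypothetical user-space model of a circular doubly linked list node. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* An initialized (or removed and reinitialized) node points at itself. */
void
demo_list_init(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Append node n just before the list head, i.e. at the tail. */
void
demo_list_add_tail(struct demo_list_head *n, struct demo_list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n from its list, then reinitialize it to point at itself. */
void
demo_list_del_init(struct demo_list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	demo_list_init(n);
}

/*
 * Follow .next from pos until the list head is reached. Returns the number
 * of steps taken, or -1 if limit steps pass without reaching the head,
 * which stands in for an iterator that would otherwise spin forever.
 */
int
demo_walk_to_head(const struct demo_list_head *pos,
    const struct demo_list_head *head, int limit)
{
	int steps = 0;

	while (pos != head) {
		if (steps++ >= limit)
			return (-1);
		pos = pos->next;
	}
	return (steps);
}
```

An iterator positioned on a node that is then removed with `demo_list_del_init()` never reaches the head, because the node's `.next` now points at itself. Pointing the removed node's `.next` at the list head instead, as the message describes for 3.0 - 3.2 kernels, lets the walk terminate after one step.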
Chris Dunlop	a1d9543a39	3.10 API change: block_device_operations->release() returns void Linux kernel commit torvalds/linux@db2a144 changed the return type of block_device_operations->release() to void. Detect the expected prototype and define our callout accordingly. Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1494	2013-07-08 15:41:57 -07:00
Li Dongyang	802e7b5feb	Add SEEK_DATA/SEEK_HOLE to lseek()/llseek() The approach taken was to rework zfs_holey() as little as possible and then just wrap the code as needed to ensure correct locking and error handling. Tested with xfstests 285 and 286. All tests pass except for 7-9 of 285 which try to reserve blocks first via fallocate(2) and fail because fallocate(2) is not yet supported. Note that the filp->f_lock spinlock did not exist prior to Linux 2.6.30, but we avoid the need for an autotools check by virtue of the fact that SEEK_DATA/SEEK_HOLE support was not added until Linux 3.1. An autoconf check was added for lseek_execute() which is currently a private function but the expectation is that it will be exported perhaps as early as Linux 3.11. Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1384	2013-07-02 09:24:43 -07:00
Carlos Alberto Lopez Perez	5165473737	Ensure --with-spl-timeout waits for spl_config.h and symvers The previous code was only waiting for the symvers file. But the postinst target of the DKMS script for SPL will not only create the symvers file, but also the header spl_config.h. If we are waiting in the configure script of ZFS for the SPL symvers file, then we also need to wait for spl_config.h. Otherwise the configure script will abort because spl_config.h is not yet available. On top of that, the function ZFS_AC_SPL_MODULE_SYMVERS is moved to the end of the function ZFS_AC_SPL to allow both checks to share the with-spl-timeout parameter. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1431	2013-05-02 15:40:44 -07:00
Jan Engelhardt	4e95cc99b0	build: resolve orthographic and other grammatical errors Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-04-02 10:44:52 -07:00
Brian Behlendorf	9b2af9a097	Configure --with-spl{-obj} auto-detect cleanup Because the install location for the spl/zfs-devel headers was changed we need to refresh the auto-detect code. Note that for packaging which already explicitly calls --with-spl{-obj} nothing has changed. The updated code is now structured like that in ZFS_AC_KERNEL and should be cleaner and easier to maintain. In addition, it's stricter about detecting a valid source and object directory. It requires: * The source directory contains the file 'spl.release' * The object directory contains the file 'spl_config.h' * The following paths will be checked. Notice the /var/lib/ and /usr/src paths require that the spl and zfs version be matched. This is done to prevent accidentally mixing releases. dnl # 1) /var/lib/dkms/spl/<version>/build dnl # 2) /usr/src/spl-<version>/<kernel-version> dnl # 3) /usr/src/spl-<version> dnl # 4) ../spl dnl # 5) /usr/src/kernels/<kernel-version> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-13 13:42:16 -07:00
Etienne Dechamps	d9b0ebbe82	Remove the bio_empty_barrier() check. To determine whether the kernel is capable of handling empty barrier BIOs, we check for the presence of the bio_empty_barrier() macro, which was introduced in 2.6.24. If this macro is defined, then we can flush disk vdevs; if it isn't, then flushing is disabled. Unfortunately, the bio_empty_barrier() macro was removed in 2.6.37, even though the kernel is still capable of handling empty barrier BIOs. As a result, flushing is effectively disabled on kernels >= 2.6.37, meaning that starting from this kernel version, zfs doesn't use barriers to guarantee on-disk data consistency. This is quite bad and can lead to potential data corruption on power failures. This patch fixes the issue by removing the configure check for bio_empty_barrier(), as we don't support kernels <= 2.6.24 anymore. Thanks to Richard Kojedzinszky for catching this nasty bug. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1318	2013-02-24 10:22:34 -08:00
Etienne Dechamps	d75af3c0eb	Use -Werror for all kernel configure tests. As a matter of fact, we're already using -Werror for most tests because of a bug in kernel-bio-empty-barrier.m4 which sets -Werror without reverting it afterwards. This meant that all tests which ran after this one were using -Werror. This patch simply makes it clear that we're using -Werror and makes the code more readable and more predictable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1317	2013-02-24 10:20:28 -08:00
Brian Behlendorf	2b7ab9d4d9	Linux 2.6.26 compat, lookup_bdev() It's doubtful many people were impacted by this but commit `6c28567` accidentally broke ZFS builds for 2.6.26 and earlier kernels. This commit depends on the lookup_bdev() function which exists in 2.6.26 but wasn't exported until 2.6.27. The availability of the function isn't critical so a wrapper is introduced which returns ERR_PTR(-ENOTSUP) when the function isn't defined. This will have the effect of causing zvol_is_zvol() to always fail for 2.6.26 kernels. This in turn means vdevs will always get opened concurrently which is good for normal usage. This will only become an issue if you're using a zvol as a vdev in another pool, in which case you really should be using a newer kernel anyway. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1205	2013-01-28 15:35:00 -08:00
Brian Behlendorf	ee93035378	Use sb->s_d_op default dentry operations As of Linux 2.6.37 the right way to register custom dentry operations is to use the super block's ->s_d_op field. For older kernels they should be registered as part of the lookup operation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1223	2013-01-18 15:04:23 -08:00
Ned Bass	f1a05fa114	Fix false ENOENT on snapshot control dentries Lookups in the snapshot control directory for an existing snapshot fail with ENOENT if an earlier lookup failed before the snapshot was created. This is because the earlier lookup causes a negative dentry to be cached which is never invalidated. The bug can be reproduced as follows (the second ls should succeed): $ ls /tank/.zfs/snapshot/s ls: cannot access /tank/.zfs/snapshot/s: No such file or directory $ zfs snap tank@s $ ls /tank/.zfs/snapshot/s ls: cannot access /tank/.zfs/snapshot/s: No such file or directory To remedy this, always invalidate cached dentries in the snapshot control directory. Since these entries never exist on disk there is no significant performance penalty for the extra lookups. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1192	2013-01-16 16:28:54 -08:00
Brian Behlendorf	8780c53961	Update SAs when an inode is dirtied Revert the portion of commit `d3aa3ea` which always resulted in the SAs being updated when an mmap()'ed file was closed. That change accidentally resulted in unexpected ctime updates which upset tools like git. That was always a horrible hack and I'm happy it will never make it into a tagged release. The right fix is something I initially resisted doing because I was worried about the additional overhead. However, in hindsight the overhead isn't as bad as I feared. This patch implements the sops->dirty_inode() callback which is unsurprisingly called when an inode is dirtied. We leverage this callback to keep the znode SAs strictly in sync with the inode. However, for now we're going to go slowly to avoid introducing any new unexpected issues by only updating the atime, mtime, and ctime. This will cover the callpath of most concern to us. ->filemap_page_mkwrite->file_update_time->update_time-> mark_inode_dirty_sync->__mark_inode_dirty->dirty_inode Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #764 Closes #1140	2012-12-14 12:18:54 -08:00
Brian Behlendorf	56a517ae3a	Verify --with-linux source directory exists Previously this check was only performed when ./configure was attempting to autodetect your kernel source directory. But we should also handle the case where --with-linux was provided and is obviously wrong. This way we catch the error before invoking make and compiling the source with incorrect autoconf results. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/spl#162	2012-11-29 15:08:35 -08:00
Brian Behlendorf	2404b01499	Improve AF hard disk detection Use the bdev_physical_block_size() interface to determine the minimum write size which can be issued without incurring a read-modify-write operation. This is used to set the ashift correctly to prevent a performance penalty when using AF hard disks. Unfortunately, this interface isn't entirely reliable because it's not uncommon for disks to misreport this value. For this reason you may still need to manually set your ashift with: zpool create -o ashift=12 ... The solution to this in the upstream Illumos source was to add a white list of known offending drives. Maintaining such a list will be a burden, but it still may be worth doing if we can detect a large number of these drives. This should be considered as future work. Reported-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #916	2012-11-15 11:06:14 -08:00
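The relationship between a reported physical block size and the ashift value mentioned above (ashift=12 for 4096-byte sectors) is a base-2 logarithm. A minimal sketch follows, assuming a floor of 9 (512-byte sectors); `demo_ashift_from_block_size()` and `DEMO_MIN_ASHIFT` are hypothetical names, not the ZFS implementation.

```c
#include <stdint.h>

/* Assumed floor for this sketch: 512-byte sectors, i.e. 1 << 9. */
#define	DEMO_MIN_ASHIFT	9

/*
 * Return the smallest ashift, no lower than DEMO_MIN_ASHIFT, such that
 * (1 << ashift) is at least block_size bytes.
 */
int
demo_ashift_from_block_size(uint32_t block_size)
{
	int ashift = DEMO_MIN_ASHIFT;

	while (((uint64_t)1 << ashift) < block_size)
		ashift++;
	return (ashift);
}
```

A drive that misreports 512 bytes while using 4096-byte physical sectors would produce 9 here, which is why the message still recommends setting ashift by hand in that case.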
Richard Yao	95f5c63b47	Linux 3.6 compat, iops->mkdir() Use .mkdir instead of .create in 3.3 compatibility check. Linux 3.6 modifies inode_operations->create's function prototype. This causes an autotools Linux 3.3 compatibility check for a function prototype change in create, mkdir and mknod to fail. Since mkdir and mknod are unchanged, we modify the check to examine mkdir instead. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #873	2012-10-14 15:29:26 -07:00
Yuxuan Shui	558ef6d080	Linux 3.6 compat, iops->create() As of Linux commit ebfc3b49a7ac25920cb5be5445f602e51d2ea559 the struct nameidata is no longer passed to iops->create. Instead only the result of (nameidata->flags & LOOKUP_EXCL) is passed. ZFS like almost all Linux filesystems never made use of this so only the prototype needs to be wrapped for compatibility. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #873	2012-10-14 14:42:25 -07:00
Yuxuan Shui	8f195a908f	Linux 3.6 compat, iops->lookup() As of Linux commit 00cd8dd3bf95f2cc8435b4cac01d9995635c6d0b the struct nameidata is no longer passed to iops->lookup. Instead only the nameidata->flags are passed. ZFS like almost all Linux filesystems never made use of this so only the prototype needs to be wrapped for compatibility. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #873	2012-10-14 13:06:54 -07:00
Yuxuan Shui	3c20361075	Linux 3.6 compat, sget() As of Linux commit 9249e17fe094d853d1ef7475dd559a2cc7e23d42 the mount flags are now passed to sget() so they can be used when initializing a new superblock. ZFS never uses sget() in this fashion so we can simply pass a zero and add a zpl_sget() compatibility wrapper. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #873	2012-10-14 13:06:48 -07:00
Brian Behlendorf	6d1d976b2c	Modify vdev_elevator_switch() to use elevator_change() As of Linux 2.6.36 an elevator_change() interface was added. This commit updates vdev_elevator_switch() to use this interface when available, otherwise it falls back to the usermodehelper method. Original-patch-by: foobarz <sysop@xeon.(none)> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #906	2012-10-03 13:31:44 -07:00
Cyril Plisko	393b44c711	Implement .commit_metadata hook for NFS export In order to implement synchronous NFS metadata semantics ZFS needs to provide the .commit_metadata hook. All it takes there is to make sure changes are committed to ZIL. Fortunately zfs_fsync() does just that, so simply calling it from zpl_commit_metadata() does the trick. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #969	2012-10-03 10:49:45 -07:00
Brian Behlendorf	cda4db408c	Revert "Improve AF hard disk detection" This reverts commit `395350c85d` which accidentally introduced issue #955. Pools using AF drives which were originally created with a sector size of 512 bytes will now be correctly detected to have physical sector size of 4096. This is desirable for a new pool, however for an existing pool abruptly changing the sector size causes problems. For this reason, this change is being reverted until the additional logic can be added to detect the existing pool case. Existing pools must use the ashift size stored in the label regardless of what the disk reports. This is critical for compatibility. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #955	2012-09-11 16:33:49 -07:00
Brian Behlendorf	395350c85d	Improve AF hard disk detection Use the bdev_physical_block_size() interface to determine the minimum write size which can be issued without incurring a read-modify-write operation. This is used to set the ashift correctly to prevent a performance penalty when using AF hard disks. Unfortunately, this interface isn't entirely reliable because it's not uncommon for disks to misreport this value. For this reason you may still need to manually set your ashift with: zpool create -o ashift=12 ... The solution to this in the upstream Illumos source was to add a white list of known offending drives. Maintaining such a list will be a burden, but it still may be worth doing if we can detect a large number of these drives. This should be considered as future work. Reported-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #916	2012-09-04 15:35:32 -07:00
Prakash Surya	f86373f5b2	Remove autoconf check for CONFIG_PREEMPT The autoconf macro which failed if CONFIG_PREEMPT was set in the kernel config was removed. With the inclusion of a few previous patches targeting support for preempt enabled kernels, it is now safe to run with this kernel config option enabled. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #83	2012-08-27 11:54:41 -07:00
Richard Yao	074e72953c	Check kernel source directory for SPL ZFS fails to build when SPL is built into the kernel unless --with-spl=/path/to/kernel/sources is specified. We fall back to the kernel sources directory when SPL is not found elsewhere to resolve that. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closed #896	2012-08-26 13:49:09 -07:00
Prakash Surya	26e08952e6	Support building a zfs-modules-dkms sub package This commit adds support for building a zfs-modules-dkms sub package built around Dynamic Kernel Module Support. This is to allow building packages using the DKMS infrastructure which is intended to ease the burden of kernel version changes, upgrades, etc. By default the zfs-modules-dkms-* sub package will be built as part of the 'make rpm' target. Alternately, you can build only the DKMS module package using the 'make rpm-dkms' target. Examples: # To build packaged binaries as well as dkms packages $ ./configure && make rpm # To build only the packaged binary utilities and dkms packages $ ./configure && make rpm-utils rpm-dkms Note: Only the RHEL 5/6, CHAOS 5, and Fedora distributions are supported for building the dkms sub package. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #535	2012-08-08 15:21:01 -07:00
Prakash Surya	5085d55817	Add '--with-spl-timeout' option When checking for the SPL Module.symvers file, a timeout can now be passed in which will pause the configure step while it waits for this file to be generated. By default, the configure behavior is unchanged as a timeout of 0 is used. If a positive number of seconds is passed, configure will wait that number of seconds for the Module.symvers file before moving on. The main motivation for this change was to support parallel execution of './configure && make' for the SPL and ZFS packages in preparation of supporting DKMS based packages. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-08 15:20:55 -07:00
Etienne Dechamps	ee5fd0bb80	Set zvol discard_granularity to the volblocksize. Currently, zvols have a discard granularity set to 0, which suggests to the upper layer that discard requests of arbitrarily small size and alignment can be made efficiently. In practice however, ZFS does not handle unaligned discard requests efficiently: indeed, it is unable to free a part of a block. It will write zeros to the specified range instead, which is both useless and inefficient (see dnode_free_range). With this patch, zvol block devices expose volblocksize as their discard granularity, so the upper layer is aware that it's not supposed to send discard requests smaller than volblocksize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #862	2012-08-07 14:55:31 -07:00
Etienne Dechamps	705741827a	When checking for symbol exports, try compiling. This patch adds a new autoconf function: ZFS_LINUX_TRY_COMPILE_SYMBOL. This new function does the following: - Call LINUX_TRY_COMPILE with the specified parameters. - If unsuccessful, return false. - If successful and we're configuring with --enable-linux-builtin, return true. - Else, call CHECK_SYMBOL_EXPORT with the specified parameters and return the result. All calls to CHECK_SYMBOL_EXPORT are converted to LINUX_TRY_COMPILE_SYMBOL so that the tests work even when configuring for builtin on a kernel which doesn't have loadable module support, or hasn't been built yet. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #851	2012-07-26 13:42:57 -07:00
Etienne Dechamps	fc88a6dda9	Fake modpost stage for LINUX_COMPILE. Currently, when building a test case, we're compiling an entire Linux module from beginning to end. This includes the MODPOST stage, which generates a "conftest.mod.c" file with some boilerplate module declaration code. This poses a problem when configuring for built-in on kernels which have loadable module support disabled. In this case conftest.mod.c is referencing disabled code, resulting in a compilation failure, thus breaking the tests. This patch fixes the issue by faking the modpost stage when the --enable-linux-builtin option is provided. It does so by forcing the modpost command to be /bin/true, and using an empty conftest.mod.c file. The test module still compiles fine, although the result isn't loadable, but we don't really care at this point. Note it is important to preserve the modpost stage when building out of tree. The ZFS_AC_KERNEL_BLK_END_REQUEST, ZFS_AC_KERNEL_BLK_QUEUE_FLUSH, and ZFS_AC_KERNEL_BLK_RQ_BYTES configure checks all depend on it to identify GPL-only symbols. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #851	2012-07-26 13:41:02 -07:00
Etienne Dechamps	319a99a3d4	Make configure builtin-aware. This patch adds a new option to configure: --enable-linux-builtin. When this option is used, the following happens: - Compilation of kernel modules is disabled. - A failure to find UTS_RELEASE is followed by a suggestion to run "make prepare" on the kernel source tree. This patch also adds a new test which tries to compile an empty module as a basic toolchain sanity test. If it fails and the option was specified, the error is followed by a suggestion to run "make scripts" on the kernel source tree. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #851	2012-07-26 13:40:18 -07:00
Richard Yao	739a1a82e0	Linux 3.5 compat, end_writeback() changed to clear_inode() The end_writeback() function was changed by moving the call to inode_sync_wait() earlier into evict(). This effectively changes the ordering of the sync but it does not impact the details of the zfs implementation. However, as part of this change end_writeback() was renamed to clear_inode() to reflect the new semantics. This change does impact us and clear_inode() now maps to end_writeback() for kernels prior to 3.5. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #784	2012-07-23 12:29:36 -07:00
Richard Yao	ea1fdf46e2	Linux 3.5 compat, iops->truncate_range() removed The vmtruncate_range() support has been removed from the kernel in favor of using the fallocate method in the file_operations table. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:32 -07:00
Richard Yao	756c3e5a9c	Linux 3.5 compat, eops->encode_fh() takes inodes The export_operations member ->encode_fh() has been updated to take both the child and parent inodes. This interface used to take the child dentry and a bool describing if the parent is needed. NOTE: While updating this code I noticed that we do not currently cleanly handle the case where we're passed a connectable parent. This code should be audited to make sure we're doing the right thing. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:23 -07:00
Etienne Dechamps	b5a28807cd	Move partition scanning from userspace to module. Currently, zpool online -e (dynamic vdev expansion) doesn't work on whole disks because we're invoking ioctl(BLKRRPART) from userspace while ZFS still has a partition open on the disk, which results in EBUSY. This patch moves the BLKRRPART invocation from the zpool utility to the module. Specifically, this is done just before opening the device in vdev_disk_open() which is called inside vdev_reopen(). This requires jumping through some hoops to get to the disk device from the partition device, and to make sure we can still open the partition after the BLKRRPART call. Note that this new code path is triggered on dynamic vdev expansion only; other actions, like creating a new pool, are unchanged and still call BLKRRPART from userspace. This change also depends on API changes which are available in 2.6.37 and later kernels. The build system has been updated to detect this, but there is no compatibility mode for older kernels. This means that online expansion will NOT be available in older kernels. However, it will still be possible to expand the vdev offline. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #808	2012-07-17 09:17:31 -07:00
Richard Yao	6a0936babc	Linux 3.4 compat, d_make_root() replaces d_alloc_root() torvalds/linux@adc0e91ab1 introduced d_make_root() as a replacement for d_alloc_root(). Further commits appear to have removed d_alloc_root() from the Linux source tree. This causes the following failure: error: implicit declaration of function 'd_alloc_root' [-Werror=implicit-function-declaration] To correct this we update the code to use the current d_make_root() interface for readability. Then we introduce an autotools check to determine if d_make_root() is available. If it isn't then we define some compatibility logic which uses the older d_alloc_root() interface. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #776	2012-06-11 10:04:49 -07:00
Ned Bass	cac1f230e0	Improve CONFIG_DEBUG_LOCK_ALLOC error message The configure script error message for kernels built with CONFIG_DEBUG_LOCK_ALLOC may give the impression that the issue is strictly with license compliance. To avoid confusion add some words indicating that the linking stage will fail if the build continues. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #773	2012-06-11 09:28:04 -07:00

93 Commits