freebsd-dev

Author	SHA1	Message	Date
Richard Yao	76c2b24c61	Fix distribution detection Improve the distribution detection by moving the tests for distribution specific files first. The Ubuntu and Debian checks are left for last because they are the least likely to be unique. This is particularly true in the case of Debian since so many distributions are based on Debian. Since this is currently only used to identify the correct packaging method for this system the result in many instances is simply cosmetic. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-03-05 10:38:27 -08:00
Ned Bass	613d88eda8	Align partition end on 1 MiB boundary Some devices have exhibited sensitivity to the ending alignment of partitions. In particular, even if the first partition begins at 1 MiB, we have seen many sd driver task abort errors with certain SSDs if the first partition doesn't end on a 1 MiB boundary. This occurs when the vdev label is read during pool creation or importation and causes a delay of about 30 seconds per device. It can also be simulated with dd when the pool isn't imported: dd if=/dev/sda1 of=/dev/null bs=262144 count=1 For the record, this problem was observed with SMARTMOD SG9XCA2E200GE01 200GB SSDs. Unfortunately I don't have a good explanation for this behavior. It seems to have something to do with highly fragmented single-sector requests being issued to the device, which it may not support. With end-aligned partitions at least page-sized requests were queued and issued to the driver according to blktrace. In any case, aligning the partition end is a fairly innocuous work-around, wasting at most 1 MiB of space. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #574	2012-03-05 09:49:50 -08:00
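The end-alignment work-around described in the commit above comes down to a rounding calculation. The following is a minimal sketch of that arithmetic, not the code from the commit: the helper names `align_partition_end` and `wasted_bytes` are hypothetical, and the 512-byte sector size is an assumption.

```python
MIB = 1024 * 1024


def align_partition_end(end_sector, sector_size=512):
    """Round an exclusive partition end (in sectors) down to a 1 MiB boundary.

    Hypothetical helper for illustration; the default sector size is assumed.
    """
    sectors_per_mib = MIB // sector_size
    return (end_sector // sectors_per_mib) * sectors_per_mib


def wasted_bytes(end_sector, sector_size=512):
    """Bytes given up by aligning the end; always less than 1 MiB."""
    aligned = align_partition_end(end_sector, sector_size)
    return (end_sector - aligned) * sector_size
```

Rounding down rather than up keeps the partition inside the space that was originally available, which is why the cost is bounded by 1 MiB.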
Brian Behlendorf	ec2626ad3f	Use SA_HDL_PRIVATE for SA xattrs A private SA handle must be used to ensure we can drop the dbuf hold on the spill block prior to calling dmu_tx_commit(). If we call dmu_tx_commit() before sa_handle_destroy(), then our hold will trigger a copy of the dbuf to be made. This is done to prevent data from leaking in to the syncing txg. As a result the original dirty spill block will remain cached. Additionally, relying on the shared zp->z_sa_hdl is unsafe in the xattr context because the znode may be asynchronously dropped from the cache. It's far safer and simpler just to use a private handle for xattrs. Plus any additional overhead is offset by the avoidance of the previously mentioned memory copy. These forever dirty buffers can be noticed in the arcstats under the anon_size. On a quiescent system the value should be zero. Without this fix and a SA xattr write workload you will see anon_size increase. Eventually, if enough dirty data builds up your system it will appear to hang. This occurs because the dmu won't allow new txs to be assigned until that dirty data is flushed, and it won't be because it's not part of an assigned tx. As an aside, I typically see anon_size lurk around 16k so I think there is another place in the code which needs a similar fix. However, this value doesn't grow over time so it isn't critical. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #503 Issue #513	2012-03-02 13:20:48 -08:00
Brian Behlendorf	4b787d75c8	Cleanly support debug packages Allow a source rpm to be rebuilt with debugging enabled. This avoids the need to have to manually modify the spec file. By default debugging is still largely disabled. To enable specific debugging features use the following options with rpmbuild. '--with debug' - Enables ASSERTs # For example: $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm Additionally, ZFS_CONFIG has been added to zfs_config.h for packages which build against these headers. This is critical to ensure both zfs and the dependant package are using the same prototype and structure definitions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 14:08:17 -08:00
Brian Behlendorf	570827e129	Add 'dmu_tx' kstats entry Keep counters for the various reasons that a thread may end up in txg_wait_open() waiting on a new txg. This can be useful when attempting to determine why a particular workload is under performing. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 08:59:10 -08:00
Brian Behlendorf	13be560d89	Add arc_state_t stats to arcstats To ensure the arc is behaving properly we need greater visibility into exactly how it's managing the system's memory. This patch takes one step in that direction by adding the current arc_state_t for the anon, mru, mru_ghost, mfu, and mfu_ghost lists. The l2 arc_state_t is already well represented in the arcstats. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 08:58:59 -08:00
Ned Bass	3a4f6caf08	Return success from check_slice() if device doesn't exist When creating a new pool, make_root_vdev() calls check_in_use() to ensure that none of the constituent disks are in use. If the disk contains a valid vdev label it is read to retrieve the list of its child vdevs and these are checked recursively. However, the partitions stored in the vdev label may no longer exist, for example if the partition table has since been altered. In any such case we would want the pool creation to proceed, so this change removes the check from check_slice() that returns an error if the device doesn't exist. As an added assurance, the Solaris implementation also returns success on ENOENT. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 08:52:38 -08:00
Alex Zhuravlev	a473d90cee	Export symbols for zero-copy Export additional symbols to make use of the DMU's zero-copy API. This allows external modules to move data in to and out of the ARC without incurring the cost of a memory copy. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-17 12:43:02 -08:00
Richard Yao	b41c9906dc	Support ashift=13 for 8KB SSD block sizes New SSDs are now available which use an internal 8k block size. To make sure ZFS can get the maximum performance out of these devices we're increasing the maximum ashift to 13 (8KB). This value is still small enough that we can fit 16 uberblocks in the vdev ring label. However, I don't want to increase this any further or it will limit the ability to safely roll back a pool to recover it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #565	2012-02-13 12:25:27 -08:00
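The trade-off between ashift and the number of uberblocks kept for rollback can be worked through numerically. This is a rough sketch under stated assumptions: the 128 KiB ring size and the 1 KiB minimum slot size are taken from the ZFS on-disk layout as commonly documented, not from this commit, and `uberblock_count` is a hypothetical helper. The commit itself only confirms the figure of 16 uberblocks at ashift=13.

```python
UBERBLOCK_RING_BYTES = 128 * 1024  # assumed size of the uberblock ring in a vdev label
MIN_UBERBLOCK_SHIFT = 10           # assumed minimum uberblock slot size of 1 KiB


def uberblock_count(ashift):
    """Number of uberblock slots that fit in the ring for a given ashift."""
    slot_shift = max(ashift, MIN_UBERBLOCK_SHIFT)
    return UBERBLOCK_RING_BYTES >> slot_shift


if __name__ == "__main__":
    for ashift in (9, 12, 13, 14):
        print(f"ashift={ashift} sector={1 << ashift} uberblocks={uberblock_count(ashift)}")
```

Each increment of ashift halves the number of slots, so a larger maximum would leave fewer past transaction groups to roll back to.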
Turbo Fredriksson	d2e032ca9c	Add 'fsid' mount option to allowed options. Resolves nfs-utils-1.0.x compatibility issue which requires that the fsid be set in the export options. exportfs: Warning: /tank/dir requires fsid= for NFS export Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #570	2012-02-13 09:43:57 -08:00
Brian Behlendorf	b10c77f70a	Export symbols for zero-copy Exported the required symbols to make use of the DMU's zero-copy API. This allows external modules to move data in to and out of the ARC without incurring the cost of a memory copy. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-10 11:56:55 -08:00
Brian Behlendorf	a31acb462d	Use spl_debug_* helpers When configuring the spl debug log support use the provided wrapper functions. This ensures that if --disable-debug-log was used when building the spl the functions will have no effect. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-09 16:37:48 -08:00
Etienne Dechamps	30930fba21	Add support for DISCARD to ZVOLs. DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning. It allows ZVOL clients to discard (unmap, trim) block ranges from a ZVOL, thus optimizing disk space usage by allowing a ZVOL to shrink instead of just grow. We can't use zfs_space() or zfs_freesp() here, since these functions only work on regular files, not volumes. Fortunately we can use the low-level function dmu_free_long_range() which does exactly what we want. Currently the discard operation is not added to the log. That's not a big deal since losing discard requests cannot result in data corruption. It would however result in disk space usage higher than it should be. Thus adding log support to zvol_discard() is probably a good idea for a future improvement. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-09 16:19:38 -08:00
Etienne Dechamps	cb2d19010d	Support the fallocate() file operation. Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is supported, since it's the only one that matches the behavior of zfs_space(). This makes it pretty much useless in its current form, but it's a start. To support other flag combinations we would need to modify zfs_space() to make it more flexible, or emulate the desired functionality in zpl_fallocate(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #334	2012-02-09 16:19:32 -08:00
Etienne Dechamps	aec69371a6	Check permissions in zfs_space(). This isn't done on Solaris because on this OS zfs_space() can only be called with an opened file handle. Since the addition of zpl_truncate_range() this isn't the case anymore, so we need to enforce access rights. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #334	2012-02-09 15:20:37 -08:00
Etienne Dechamps	5cb63a57f8	Implement the truncate_range() inode operation. This operation allows "hole punching" in ZFS files. On Solaris this is done via the vop_space() system call, which maps to the zfs_space() function. So we just need to write zpl_truncate_range() as a wrapper around zfs_space(). Note that this only works for regular files, not ZVOLs. This is currently an insecure implementation without permission checking, although this isn't that big of a deal since truncate_range() isn't even callable from userspace. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #334	2012-02-09 15:20:32 -08:00
Brian Behlendorf	93648f314c	Fix zconfig.sh non-optimal alignment The recent zvol improvements have changed default suggested alignment for zvols from 512b (default) to 8k (zvol blocksize). Because of this the zconfig.sh tests which create partitions are now generating a warning about non-optimal alignments. This change updates the needed zconfig.sh tests such that a partition will be properly aligned. In the process, it shifts from using the sfdisk utility to the parted utility to create partitions. It also moves the creation of labels, partitions, and filesystems in to generic functions in common.sh.in. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-09 13:23:28 -08:00
Etienne Dechamps	dde9380a1b	Use 32 as the default number of zvol threads. Currently, the `zvol_threads` variable, which controls the number of worker threads which process items from the ZVOL queues, is set to the number of available CPUs. This choice seems to be based on the assumption that ZVOL threads are CPU-bound. This is not necessarily true, especially for synchronous writes. Consider the situation described in the comments for `zil_commit()`, which is called inside `zvol_write()` for synchronous writes: > itxs are committed in batches. In a heavily stressed zil there will be a > commit writer thread who is writing out a bunch of itxs to the log for a > set of committing threads (cthreads) in the same batch as the writer. > Those cthreads are all waiting on the same cv for that batch. > > There will also be a different and growing batch of threads that are > waiting to commit (qthreads). When the committing batch completes a > transition occurs such that the cthreads exit and the qthreads become > cthreads. One of the new cthreads becomes the writer thread for the batch. > Any new threads arriving become new qthreads. We can easily deduce that, in the case of ZVOLs, there can be a maximum of `zvol_threads` cthreads and qthreads. The default value for `zvol_threads` is typically between 1 and 8, which is way too low in this case. This means there will be a lot of small commits to the ZIL, which is very inefficient compared to a few big commits, especially since we have to wait for the data to be on stable storage. Increasing the number of threads will increase the amount of data waiting to be committed and thus the size of the individual commits. On my system, in the context of VM disk image storage (lots of small synchronous writes), increasing `zvol_threads` from 8 to 32 results in a 50% increase in sequential synchronous write performance. We should choose a more sensible default for `zvol_threads`. Unfortunately the optimal value is difficult to determine automatically, since it depends on the synchronous write latency of the underlying storage devices. In any case, a hardcoded value of 32 would probably be better than the current situation. Having a lot of ZVOL threads doesn't seem to have any real downside anyway. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Fixes #392	2012-02-08 13:58:10 -08:00
Etienne Dechamps	34037afe24	Improve ZVOL queue behavior. The Linux block device queue subsystem exposes a number of configurable settings described in Linux block/blk-settings.c. The defaults for these settings are tuned for hard drives, and are not optimized for ZVOLs. Proper configuration of these options would allow upper layers (I/O scheduler) to take better decisions about write merging and ordering. Detailed rationale: - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to handle writes of any size, so there's no reason to impose a limit. Let the upper layer decide. - max_segments and max_segment_size are set to unlimited. zvol_write() will copy the requests' contents into a dbuf anyway, so the number and size of the segments are irrelevant. Let the upper layer decide. - physical_block_size and io_opt are set to the ZVOL's block size. This has the potential to somewhat alleviate issue #361 for ZVOLs, by warning the upper layers that writes smaller than the volume's block size will be slow. - The NONROT flag is set to indicate this isn't a rotational device. Although the backing zpool might be composed of rotational devices, the resulting ZVOL often doesn't exhibit the same behavior due to the COW mechanisms used by ZFS. Setting this flag will prevent upper layers from making useless decisions (such as reordering writes) based on incorrect assumptions about the behavior of the ZVOL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
Etienne Dechamps	b18019d2d8	Fix synchronicity for ZVOLs. zvol_write() assumes that the write request must be written to stable storage if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed, "sync" does not mean what we think it means in the context of the Linux block layer. This is well explained in linux/fs.h: WRITE: A normal async write. Device will be plugged. WRITE_SYNC: Synchronous write. Identical to WRITE, but passes down the hint that someone will be waiting on this IO shortly. WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush. WRITE_FUA: Like WRITE_SYNC but data is guaranteed to be on non-volatile media on completion. In other words, SYNC does not mean that the write must be on stable storage on completion. It just means that someone is waiting on us to complete the write request. Thus triggering a ZIL commit for each SYNC write request on a ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL users have no way to express that they actually want data to be written to stable storage, which means the ZIL is broken for ZVOLs. The request for stable storage is expressed by the FUA flag, so we must commit the ZIL after the write if the FUA flag is set. In addition, we must commit the ZIL before the write if the FLUSH flag is set. Also, we must inform the block layer that we actually support FLUSH and FUA. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
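The ordering rules in the commit above (commit the ZIL before the write for FLUSH, after the write for FUA, and not at all for a bare SYNC hint) can be summarised in a few lines. This is an illustrative model only, not the `zvol_write()` implementation; `plan_write` and the step names are hypothetical.

```python
def plan_write(sync, flush, fua):
    """Return the ordered steps for one write request.

    `sync` is accepted but deliberately ignored: per the commit message it is
    only a hint that someone is waiting, not a request for stable storage.
    """
    steps = []
    if flush:
        # FLUSH: previously completed writes must be stable before this one.
        steps.append("zil_commit")
    steps.append("write")
    if fua:
        # FUA: this write must be on stable storage when it completes.
        steps.append("zil_commit")
    return steps


if __name__ == "__main__":
    for flags in [(True, False, False), (True, True, False), (True, False, True)]:
        print(flags, plan_write(*flags))
```

Modelling the request as a list of steps makes the key point visible: a SYNC-only write produces no ZIL commit at all.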
Etienne Dechamps	56c34bac44	Support "sync=always" for ZVOLs. Currently the "sync=always" property works for regular ZFS datasets, but not for ZVOLs. This patch remedies that. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Fixes #374.	2012-02-07 16:23:06 -08:00
Darik Horn	e67329d8e0	Let libnvpair be linked independently of libzfs. Autoconf will fail to detect the ZoL libnvpair on systems that do not implicitly link library runtime dependencies, which is anything that has the GCC 4.5 DCO update. Build libuutil before libnvpair, and put it on the LDADD line of the libnvpair automake template. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #560	2012-02-07 11:37:15 -08:00
Brian Behlendorf	47621f3d76	Linux 3.3 compat, sops->show_options() The second argument of sops->show_options() was changed from a 'struct vfsmount *' to a 'struct dentry *'. Add an autoconf check to detect the API change and then conditionally define the expected interface. In either case we are only interested in the zfs_sb_t. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #549	2012-02-03 10:02:01 -08:00
Brian Behlendorf	d7e398ce1a	Cleanup ZFS debug infrastructure Historically the internal zfs debug infrastructure has been scattered throughout the code. Since we expect to start making more use of this code this patch performs some cleanup. * Consolidate the zfs debug infrastructure in the zfs_debug.[ch] files. This includes moving the zfs_flags and zfs_recover variables, plus moving the zfs_panic_recover() function. * Remove the existing unused functionality in zfs_debug.c and replace it with code which correctly utilizes the spl logging infrastructure. * Remove the __dprintf() function from zfs_ioctl.c. This is dead code, the dprintf() functionality in the kernel relies on the spl log support. * Remove dprintf() from hdr_recl(). This wasn't particularly useful and was missing the required format specifier anyway. * Subsequent patches should unify the dprintf() and zfs_dbgmsg() functions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-02 11:24:30 -08:00
Brian Behlendorf	0c5dde492f	Allow multiple values per directory entry When using zfs to back a Lustre filesystem it's advantageous to store a fid with the object id in the directory zap. The only technical impediment to doing this is that the zpl code expects a single value in the zap per directory entry. This change relaxes that requirement such that multiple entries are allowed provided the first one is the object id. The zpl code will just ignore additional entries. This allows the ZoL count to mount datasets which are being used as Lustre server backends. Once the upstream feature flags support is merged in this change should be updated to a read-only feature. Until this occurs other zfs implementations will not be able to read the zfs filesystems created by Lustre. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-02 11:22:08 -08:00
Brian Behlendorf	e29be02e46	Export symbol zfs_attr_table Export the zfs_attr_table symbol so it may be used by non-zpl consumers which are still interested in writing a zpl compatible dataset (e.g. Lustre). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-27 09:23:36 -08:00
Prakash Surya	ff998d804f	Ignore dataset if the dds_type is DMU_OST_OTHER Since the zpios and potentially other ZFS tests use the DMU_OST_OTHER type to label their datasets, the zpool and zfs commands should gracefully handle this type when it is encountered. This patch modifies the commands' behavior to ignore any datasets with a dds_type of DMU_OST_OTHER. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #536	2012-01-19 09:29:48 -08:00
Brian Behlendorf	b4b599d250	Fix rpm dependencies This change updates the rpm spec files to have strictly correct package dependencies. That means a few things: * The zfs-modules package is now tied to a specific build of the spl-modules packages based on the kernel version. This ensures that the correct spl-modules packages will always get installed and not just the newest. * The zfs package now requires both the zfs-modules and spl packages. Thus a 'yum install zfs' will pull in the minimal set of packages required for a functional system. * The zfs-devel packages now require the zfs package to be installed which is normal behavior for -devel packages. * Remove the redundant distribution release extension. This is already added once because it is part of the kernel package release name. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 12:19:52 -08:00
Brian Behlendorf	b40a77aefc	Add the release component to headers When the original build system code was added the release component was accidentally omitted from the development header install path. This patch adds the missing path component so it's always clear exactly what release you're compiling against. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 12:19:47 -08:00
Darik Horn	f783130a1f	Allow GPT+EFI vdev replacement in boot pools. Commit zfsonlinux/zfs@57a4eddc4d allows the bootfs property to be set on any pool, but does not accommodate subsequent vdev changes. For example: # zpool replace rpool /dev/sda /dev/sdb operation not supported on this type of pool property 'bootfs' is not supported on EFI labeled devices For non-Solaris builds, disable the check that emits this error. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 11:05:24 -08:00
Darik Horn	750562833f	Combine libraries: spl, avl, efi, share, unicode. These libraries, which are an artifact of the ZoL development process, conflict with packages that are already in distribution: * libspl: SPL Programming Language * libavl: AVL for Linux * libefi: GRUB And these libraries are potential conflicts: * libshare: the Linux Mount Manager * libunicode: Perl and Python Recompose these five ZoL components into the four libraries that are conventionally provided by Solaris and FreeBSD systems: + libnvpair + libuutil + libzpool + libzfs This change resolves the name conflict, makes ZoL more compatible with existing software that uses autotools to detect ZFS, and allows pkg-zfs to better reflect the official Debian kFreeBSD packaging. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #430	2012-01-17 15:19:50 -08:00
Richard Laager	57a4eddc4d	Allow setting bootfs on any pool The vdev_is_bootable() restrictions are no longer necessary with recent GRUB2 code. FreeBSD has implemented the same change, except that I moved the Solaris comment to be inside the #ifdef __sun__ block. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #317	2012-01-17 13:49:07 -08:00
Ned Bass	08d08ebba2	Reduce number of zio free threads As described in Issue #458 and #258, unlinking large amounts of data can cause the threads in the zio free wait queue to start spinning. Reducing the number of z_fr_iss threads from a fixed value of 100 to 1 per cpu significantly reduces contention on the taskq spinlock and improves throughput. Instrumenting the taskq code showed that __taskq_dispatch() can spend a long time holding tq->tq_lock if there are a large number of threads in the queue. It turns out the time spent in wake_up() scales linearly with the number of threads in the queue. When a large number of short work items are dispatched, as seems to be the case with unlink, the worker threads drain the queue faster than the dispatcher can fill it. They then all pile into the work wait queue to wait for new work items. So if 100 threads are in the queue, wake_up() takes about 100 times as long, and the woken threads have to spin until the dispatcher releases the lock. Reducing the number of threads helps with the symptoms, but doesn't get to the root of the problem. It would seem that wake_up() shouldn't scale linearly in time with queue depth, particularly if we are only trying to wake up one thread. In that vein, I tried making all of the waiting processes exclusive to prevent the scheduler from iterating over the entire list, but I still saw the linear time scaling. So further investigation is needed, but in the meantime reducing the thread count is an easy workaround. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #258 Issue #458	2012-01-17 08:54:00 -08:00
Brian Behlendorf	a8783adf24	Increase link count limit to 2^31-1 Originally, the per-file link limit was set to 65536 because the exact Linux VFS limit was unclear. Internally ZFS is able to support 64-bit link counts. After a more careful investigation the limit can be safely raised to 2^31-1. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #514	2012-01-13 11:43:59 -08:00
Prakash Surya	58d956b085	Run ZFS_AC_PACMAN only if $VENDOR is "arch" Unfortunately, Arch's package manager `pacman` shares its name with a popular arcade video game. Thus, in order to refrain from executing the video game when we mean to execute the package manager, ZFS_AC_PACMAN is now only run when $VENDOR is determined to be "arch". Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #517	2012-01-13 09:03:11 -08:00
Suman Chakravartula	e18be9a637	Add overlay(-O) mount option support Linux supports mounting over non-empty directories by default. In Solaris this is not the case and the -O option is required for zfs mount to mount a zfs filesystem over a non-empty directory. For compatibility, I've added support for the -O option to mount zfs filesystems over non-empty directories if the user wants to, just like in Solaris. I've defined MS_OVERLAY to record it in the flags variable if the -O option is supplied. The flags variable passes through a few functions and it is checked before performing the empty directory check in the zfs_mount function. If -O is given, the check is not performed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #473	2012-01-12 15:49:38 -08:00
Darik Horn	96b91ef0d6	Apply the ZoL coding standard to zpl_xattr.c Make the indenting in the zpl_xattr.c file consistent with the Sun coding standard by removing soft tabs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-12 15:12:03 -08:00
Brian Behlendorf	166dd49de0	Linux 3.2 compat, security_inode_init_security() The security_inode_init_security() API has been changed to include a filesystem specific callback to write security extended attributes. This was done to support the initialization of multiple LSM xattrs and the EVM xattr. This change updates the code to use the new API when it's available. Otherwise it falls back to the previous implementation. In addition, the ZFS_AC_KERNEL_6ARGS_SECURITY_INODE_INIT_SECURITY autoconf test has been made more rigorous by passing the expected types. This is done to ensure we always properly detect the correct form for the security_inode_init_security() API. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #516	2012-01-12 15:06:39 -08:00
Richard Laager	2932b6a800	Treat /dev/vd* as whole disks Correctly detect /dev/vd devices as whole disks and attempt to create an EFI partition table. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-11 16:44:54 -08:00
Darik Horn	b97f368d04	Avoid using awk in the zpool_id script. Some implementations of `awk` incorrectly parse the \< and \> regex symbols, so use a `while read` loop and regular globbing instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #259	2012-01-11 11:56:56 -08:00
Brian Behlendorf	ab26409db7	Linux 3.1 compat, super_block->s_shrink The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly associated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable. This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems. In the context of making this change several other important issues related to managing the ARC were addressed, they include: * The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the POSIX layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre. arc_add_prune_callback() arc_remove_prune_callback() * The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated. The replacement tunable now represents the number of bytes the prune callback will request when invoked. * Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit however that's not strictly correct since it results in overzealous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache. * More promptly manage exceeding the arc_meta_limit. When reading meta data in to the cache if a buffer was unable to be recycled notify the arc_reclaim thread to invoke the required prune. * Added arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will. * This change is also expected to resolve the unexpected collapses of the ARC cache. This would occur because when just the arc_meta_limit was exceeded, reclaim pressure would be exerted on the arc_c value via arc_shrink(). This effectively shrunk the entire cache when really we just needed to reclaim meta data. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #466 Closes #292	2012-01-11 11:46:02 -08:00
Prakash Surya	8eaa020b46	Move Arch Linux's VENDOR check above Ubuntu's If the lsb-release package is installed on an Arch Linux distribution, the configure step will incorrectly detect the running distribution as Ubuntu. This is a result of both distributions providing an /etc/lsb-release file, and the Ubuntu VENDOR check being performed first. Since the Arch Linux test checks for a file more specific to the Arch Linux distribution, moving Arch Linux's VENDOR check above Ubuntu's check provides a quick and easy solution. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-12-19 12:05:10 -08:00
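The ordering problem described in the commit above, where a shared marker file matches before a distribution-specific one, can be shown with a small model. This is a sketch rather than the project's autoconf macro: `detect_vendor` is a hypothetical function, `/etc/lsb-release` comes from the commit message, and `/etc/arch-release` is an assumed example of an Arch-specific file.

```python
import os

# Most specific checks first; the shared /etc/lsb-release file goes last.
# The Arch path is an assumption chosen for illustration.
VENDOR_CHECKS = [
    ("arch", "/etc/arch-release"),
    ("ubuntu", "/etc/lsb-release"),
]


def detect_vendor(exists=os.path.exists):
    """Return the first vendor whose marker file exists, else "unknown".

    `exists` is injectable so the ordering can be exercised without
    touching the real filesystem.
    """
    for vendor, path in VENDOR_CHECKS:
        if exists(path):
            return vendor
    return "unknown"
```

With both files present, as on an Arch system with lsb-release installed, the result depends purely on the order of `VENDOR_CHECKS`, which is the point the commit makes.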
Darik Horn	afd7da0ce7	Add LIBSELINUX to mount_zfs_LDFLAGS. Regenerating the autotools configuration on Debian and Ubuntu systems causes compilation to fail with this error message: cmd/mount_zfs/../../cmd/mount_zfs/mount_zfs.c:403: undefined reference to `is_selinux_enabled' In the automake template, set "mount_zfs_LDFLAGS = ... $(LIBSELINUX)" so that the /sbin/mount.zfs utility is linked to libselinux. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-12-16 20:04:42 -08:00
Darik Horn	28eb9213d8	Linux 3.2 compat: set_nlink() Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170 Use the new set_nlink() kernel function instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #462	2011-12-16 20:02:52 -08:00
Darik Horn	e6101ea87f	Update the character class in the zpool man page. ZoL and all Solaris derivatives allow pool names to contain the colon and space characters. Update the man page to reflect current behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #438	2011-12-16 14:00:38 -08:00
Prakash Surya	6ba3b44614	Add make rule for building Arch Linux packages Added the necessary build infrastructure for building packages compatible with the Arch Linux distribution. As such, one can now run: $ ./configure $ make pkg # Alternatively, one can run 'make arch' as well on the Arch Linux machine to create two binary packages compatible with the pacman package manager, one for the zfs userland utilities and another for the zfs kernel modules. The new packages can then be installed by running: # pacman -U $package.pkg.tar.xz In addition, source-only packages suitable for an Arch Linux chroot environment or remote builder can also be built using the 'sarch' make rule. NOTE: Since the source dist tarball is created on the fly from the head of the build tree, its MD5 hash signature will be continually in flux. As a result, the md5sum variable was intentionally omitted from the PKGBUILD files, and the '--skipinteg' makepkg option is used. This may or may not have any serious security implications, as the source tarball is not being downloaded from an outside source. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2011-12-14 19:14:23 -08:00
Garrett D'Amore	a38718a63d	Illumos #734 : Use taskq_dispatch_ent() interface It has been observed that some of the hottest locks are those of the zio taskqs. Contention on these locks can limit the rate at which zios are dispatched which limits performance. This upstream change from Illumos uses new interface to the taskqs which allow them to utilize a prealloc'ed taskq_ent_t. This removes the need to perform an allocation at dispatch time while holding the contended lock. This has the effect of improving system performance. Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Alexey Zaytsev <alexey.zaytsev@nexenta.com> Reviewed by: Jason Brian King <jason.brian.king@gmail.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> References to Illumos issue: https://www.illumos.org/issues/734 Ported-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #482	2011-12-14 09:19:30 -08:00
Brian Behlendorf	30a9524e45	Set zvol_major/zvol_threads permissions The zvol_major and zvol_threads module options were being created with 0 permission bits. This prevented them from being listed in the /sys/module/zfs/parameters/ directory, although they were visible in `modinfo zfs`. This patch fixes the issue by updating the permission bits to 0444. For the moment these options must be read-only because they are used during module initialization. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #392	2011-12-07 09:27:50 -08:00
Brian Behlendorf	23bdb07d4e	Update default ARC memory limits In the upstream OpenSolaris ZFS code the maximum ARC usage is limited to 3/4 of memory or all but 1GB, whichever is larger. Because of how Linux's VM subsystem is organized these defaults have proven to be too large which can lead to stability issues. To avoid making everyone manually tune the ARC the defaults are being changed to 1/2 of memory or all but 4GB. The rationale for this is as follows: * Desktop Systems (less than 8GB of memory) Limiting the ARC to 1/2 of memory is desirable for desktop systems which have highly dynamic memory requirements. For example, launching your web browser can suddenly result in a demand for several gigabytes of memory. This memory must be reclaimed from the ARC cache which can take some time. The user will experience this reclaim time as a sluggish system with poor interactive performance. Thus in this case it is preferable to leave the memory as free and available for immediate use. * Server Systems (more than 8GB of memory) Using all but 4GB of memory for the ARC is preferable for server systems. These systems often run with minimal user interaction and have long running daemons with relatively stable memory demands. These systems will benefit most by having as much data cached in memory as possible. These values should work well for most configurations. However, if you have a desktop system with more than 8GB of memory you may wish to further restrict the ARC. This can still be accomplished by setting the 'zfs_arc_max' module option. Additionally, keep in mind these aren't currently hard limits. The ARC is based on a slab implementation which can suffer from memory fragmentation. Because this fragmentation is not visible from the ARC it may believe it is within the specified limits while actually consuming slightly more memory. How much more memory gets consumed will be determined by how badly fragmented the slabs are.
In the long term this can be mitigated by slab defragmentation code, which was the OpenSolaris solution. Preferably, the page cache would be used to back the ARC under Linux. See issue #75 for the benefits of more tightly integrating with the page cache. This change also fixes an issue where the default ARC max was being set incorrectly for machines with less than 2GB of memory. The constant in the arc_c_max comparison must be explicitly cast to a uint64_t type to prevent overflow and the wrong conditional branch being taken. This failure was typically observed in VMs which are commonly created with less than 2GB of memory. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #75	2011-12-05 12:02:12 -08:00
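The default described in 23bdb07d4e (1/2 of memory or all but 4GB, whichever is larger) and the uint64_t cast it mentions can be illustrated with a minimal sketch. This is not the code from the commit: the function name arc_default_max and its single byte-count parameter are hypothetical, chosen only to show the arithmetic and why the 4GB constant needs a 64-bit type.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the actual ZFS implementation: return the
 * default ARC ceiling in bytes for a machine with `physmem` bytes of
 * memory, using the "1/2 of memory or all but 4GB, whichever is
 * larger" rule from the commit message.
 */
uint64_t
arc_default_max(uint64_t physmem)
{
	/*
	 * The cast matters. A plain `4 << 30` is evaluated as a 32-bit
	 * int and overflows before any comparison takes place, which is
	 * the class of bug the commit message describes.
	 */
	const uint64_t reserve = (uint64_t)4 << 30;	/* 4 GiB */
	uint64_t half = physmem / 2;

	/* Guard the subtraction so small machines do not wrap around. */
	uint64_t all_but_reserve = (physmem > reserve) ? physmem - reserve : 0;

	return (half > all_but_reserve ? half : all_but_reserve);
}
```

Under these assumptions the two rules cross over at 8 GiB of memory, matching the desktop/server split in the message: a 4 GiB machine gets a 2 GiB ceiling, while a 16 GiB machine gets 12 GiB.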
Darik Horn	660cbada0f	Quote variables in the zfs.lsb script. For consistency and safety, quote all variables in the zfs.lsb script. This protects in the unlikely case that any of the file names contain whitespace. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #439	2011-12-05 09:51:55 -08:00

793 Commits