freebsd-dev

Author	SHA1	Message	Date
Xin LI	4acaabea05	MFV r251619: ZFS needs better comments. Illumos ZFS issues: 3741 zfs needs better comments MFC after: 2 weeks	2013-06-11 19:02:36 +00:00
Xin LI	9e43a32a5c	MFV r251519: * Illumos ZFS issue #3805 arc shouldn't cache freed blocks Quote from the Illumos issue: ZFS should proactively evict freed blocks from the cache. Even though these freed blocks will never be used again, and thus will eventually be evicted, this causes us to use memory inefficiently for 2 reasons: 1. A block that is freed has no chance of being accessed again, but will be kept in memory preferentially to a block that was accessed before it (and is thus older) but has not been freed and thus has at least some chance of being accessed again. 2. We partition the ARC into several buckets: user data that has been accessed only once (MRU) metadata that has been accessed only once (MRU) user data that has been accessed more than once (MFU) metadata that has been accessed more than once (MFU) The user data vs metadata split is somewhat arbitrary, and the primary control on how much memory is used to cache data vs metadata is to simply try to keep the proportion the same as it has been in the past (each bucket "evicts against" itself). The secondary control is to evict data before evicting metadata. Because of this bucketing, we may end up with one bucket mostly containing freed blocks that are very old, while another bucket has more recently accessed, still-allocated blocks. Data in the useful bucket (with still-allocated blocks) may be evicted in preference to data in the useless bucket (with old, freed blocks). On dcenter, we saw that the MFU metadata bucket was 230MB, while the MFU data bucket was 27GB and the MRU metadata bucket was 256GB. However, the vast majority of data in the MRU metadata bucket (256GB) was freed blocks, and thus useless. Meanwhile, the MFU metadata bucket (230MB) was constantly evicting useful blocks that will be soon needed. The problem of cache segmentation is a larger problem that needs more investigation. However, if we stop caching freed blocks, it should reduce the impact of this more fundamental issue. MFC after: 2 weeks	2013-06-08 09:11:20 +00:00
Xin LI	ca8a27d4b1	MFV r251474: * Illumos zfs issue #3137 L2ARC compression Whether or not to compress buffers entering the L2ARC is controlled by "compression" setting on the dataset, when compression is not "off", L2ARC compression is enabled. The compress method is always LZ4 for L2ARC when enabled because it works best for the scenario. MFC after: 2 weeks	2013-06-06 23:21:41 +00:00
Mark Johnston	f263e440d4	SDT probes can directly pass up to five arguments as arguments to dtrace_probe(). Arguments beyond these five must be obtained in an architecture-specific way; this can be done through the getargval provider method, and through dtrace_getarg() if getargval isn't overridden. This change fixes two off-by-one bugs in the way these arguments are fetched in FreeBSD's DTrace implementation. First, the SDT provider must set the aframes parameter to 1 when creating a probe. The aframes parameter controls the number of frames that dtrace_getarg() will step over in order to find the frame containing the extra arguments. On FreeBSD, dtrace_getarg() is called in SDT probe context via dtrace_probe()->dtrace_dif_emulate()->dtrace_dif_variable->dtrace_getarg() so aframes must be 3 since the arguments are in dtrace_probe()'s frame; it was previously being called with a value of 2 instead. illumos uses a different aframes value for SDT probes, but this is because illumos SDT probes fire by triggering the #UD fault handler rather than calling dtrace_probe() directly. The second bug has to do with the way arguments are grabbed out dtrace_probe()'s frame on amd64. The code currently jumps over the first stack argument and retrieves the rest of them using a pointer into the stack. This works on i386 because all of dtrace_probe()'s arguments will be on the stack and the first argument is the probe ID, which should be ignored. However, it is incorrect to ignore the first stack argument on amd64, so we correct the pointer used to access the arguments. MFC after: 2 weeks	2013-06-02 01:05:36 +00:00
Mark Johnston	18161786c6	Port the SDT test now that it's possible to create SDT probes that take seven arguments. The original test uses Solaris' uadmin system call to trigger the test probe; this change adds a sysctl to the dtrace_test module and gets the test program to trigger the test probe via the sysctl handler. The test is currently failing on amd64 because of some bugs in the way that probe arguments beyond the first five are obtained - these bugs will be fixed in a separate change.	2013-06-02 00:33:36 +00:00
Mark Johnston	427bc75e19	The fasttrap provider cleans up probes asynchronously when a process with USDT probes exits. This was previously done with a callout; however, it is possible to sleep while holding the DTrace mutexes, so a panic will occur on INVARIANTS kernels if the callout handler can't immediately acquire one of these mutexes. This panic will be frequently triggered on systems where a USDT-enabled program (perl, for instance) is often run. This revision changes the fasttrap cleanup mechanism so that a dedicated thread is used instead of a callout. The old behaviour is otherwise preserved. Reviewed by: rpaulo MFC after: 1 month	2013-05-24 03:29:32 +00:00
Mark Johnston	09e6105ff4	Bring back part of r249367 by adding DTrace's temporal option, which allows users to guarantee that the output of DTrace scripts will be time-ordered. This option is enabled by adding the line #pragma D option temporal to the beginning of a script, or by adding '-x temporal' to the arguments of dtrace(1). This change fixes a bug in the original port of the temporal option. This bug was causing some assertions to fail, so they had been disabled; in this revision the assertions are working properly and are enabled. The DTrace version number has been bumped from 1.9.0 to 1.9.1 to reflect the language change that's being introduced. This change corresponds to part of illumos-gate commit e5803b76927480: 3021 option for time-ordered output from dtrace(1M) Reviewed by: pfg Obtained from: illumos MFC after: 1 month	2013-05-12 16:26:33 +00:00
Davide Italiano	a28d25df31	In case ZFS doesn't use UMA for buffers there's no need to waste memory creating zones that will remain empty. Reviewed by: pjd	2013-05-01 17:34:44 +00:00
Steven Hartland	562a9d583b	Changed ZFS TRIM sysctl from vfs.zfs.trim_disable -> vfs.zfs.trim.enabled Enabled ZFS TRIM by default Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-04-26 11:24:20 +00:00
Martin Matuska	95b6497f5e	MFV r249857: Merge vendor bugfix for a possible deadlock related to async destroy and improve write performance by introducing a new lock protecting tx_open_txg. Illumos ZFS issues: 3642 dsl_scan_active() should not issue I/O to determine if async destroying is active 3643 txg_delay should not hold the tc_lock MFC after: 1 week	2013-04-24 21:21:03 +00:00
Martin Matuska	90eafc0bb8	The zfs synctask code restructuring introduced a new bug that makes it impossible to set quota and reservation on pools lower than version 22. Problem has been reported and a solution discussed with vendor. Illumos ZFS issues: 3739 cannot set zfs quota or reservation on pool version < 22 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reported by: Steve Wills <swills@FreeBSD.org> MFC after: 3 days	2013-04-23 06:28:35 +00:00
Pedro F. Giffuni	03836978be	DTrace: Revert r249367 The following change from illumos brought caused DTrace to pause in an interactive environment: 3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider This was not detected during testing because it doesn't affect scripts. We shouldn't be changing the environment, especially since the LD_NOLAZYLOAD option doesn't apply to our (GNU) ld. Unfortunately the change from upstream was made in such a way that it is very difficult to separate this change from the others so, at least for now, it's better to just revert everything. Reference: https://www.illumos.org/issues/3026 Reported by: Navdeep Parhar and Mark Johnston	2013-04-17 02:20:17 +00:00
Pedro F. Giffuni	ddd5b8e9b4	DTrace: option for time-ordered output Merge changes from illumos: 3021 option for time-ordered output from dtrace(1M) 3022 DTrace: keys should not affect the sort order when sorting by value 3023 it should be possible to dereference dynamic variables 3024 D integer narrowing needs some work 3025 register leak in D code generation 3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider This brings yet another feature implemented in upstream DTrace. A complete description is available here: http://dtrace.org/blogs/ahl/2012/07/28/my-new-dtrace-favorite/ This change bumps the DT_VERS_* number to 1.9.1 in accordance to what is done in illumos. This change was somewhat complicated because upstream is mixed many changes in an individual commit and some of the tests don't really apply to us. There are also appear to be differences in timestamping with Solaris so we had to workaround some assertions making sure no regression happened. Special thanks to Fabian Keil for changes and testing. Illumos Revisions: 13758:23432da34147 Reference: https://www.illumos.org/issues/3021 https://www.illumos.org/issues/3022 https://www.illumos.org/issues/3023 https://www.illumos.org/issues/3024 https://www.illumos.org/issues/3025 https://www.illumos.org/issues/1694 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 months	2013-04-11 16:24:36 +00:00
Martin Matuska	2cb0c5e424	MFV r249354: Merge bugfixes accepted and integrated by vendor. Underlying problems have been reported by us and fixed in r240942 and r249196. Illumos ZFS issues: 3645 dmu_send_impl: possibilty of pool hold leak 3692 Panic on zfs receive of a recursive deduplicated stream MFC after: 8 days	2013-04-11 07:40:30 +00:00
Martin Matuska	86161c3eeb	Cast to (void *)(uintptr_t) on copyout and copyin of zfs_iocparm_t.zfs_cmd MFC after: 9 days	2013-04-10 07:01:17 +00:00
Martin Matuska	83b4af1142	ZFS expects a copyout of zfs_cmd_t on an ioctl error. Our sys_ioctl() doesn't copyout in this case. To solve this issue a new struct zfs_iocparm_t is introduced consisting of: - zfs_ioctl_version (future backwards compatibility purposes) - user space pointer to zfs_cmd_t (copyin and copyout) - size of zfs_cmd_t (verification purposes) The copyin and copyout of zfs_cmd_t is now done the illumos (vendor) way what makes porting of new changes easier and ensures correct behavior if returning an error. MFC after: 10 days	2013-04-09 22:27:44 +00:00
Martin Matuska	a548fef5dc	MFV r249186: Do not list read-only pools in zpool.cache Reduce diff against vendor in unused vdev_disk.c Illumos ZFS issues: 3639 zpool.cache should skip over readonly pools 3640 want automatic devid updates MFC after: 1 week	2013-04-06 17:24:00 +00:00
Martin Matuska	95e7edacfe	MFV r248660: Merge vendor change - modify time processing in deadman thread. Illumos ZFS issues: 3618 ::zio dcmd does not show timestamp data MFC after: 3 weeks	2013-04-06 17:15:47 +00:00
Martin Matuska	367437755d	Provide a fix for kernel panic if receiving recursive deduplicated streams. Problem reported to vendor. Illumos ZFS issues: 3692 Panic on zfs receive of a recursive deduplicated stream MFC after: 2 weeks	2013-04-06 11:54:41 +00:00
Martin Matuska	f1b5c26470	MFV r248217: Merge change from vendor to reduce diff only. ZFS dtrace probes are not supported on FreeBSD yet. Illumos ZFS issues: 3598 want to dtrace when errors are generated in zfs MFC after: 3 weeks	2013-04-06 10:39:38 +00:00
Martin Matuska	bae7bccf39	MFV r242816: Import vendor change to reduce diff, no effect on FreeBSD. Illumos ZFS issues: 3517 importing pool with autoreplace=on and "hole" vdevs crashes syseventd	2013-04-06 08:21:37 +00:00
Andriy Gapon	9ff9b984c9	spa_open_common: fix argument to zvol_create_minors Prior to r248571 spa_open was always called with a bare pool name, but now it is called with a dataset name instead (spa_lookup handles that). So, when a ZFS root is mounted spa_open is called with a name of a root dataset, which can very well be different from the pool name. But zvol_create_minors should be called with the pool name, because it performs a recursive traversal of all datasets under the name to find all those that are volumes. MFC after: 7 days	2013-04-03 11:06:26 +00:00
Martin Matuska	03863a70e1	Fix possible pool hold leak in dmu_send_impl() Problem reported to vendor: https://www.illumos.org/issues/3645 Reported by: Andriy Gapon <avg@FreeBSD.org> MFC after: 15 days	2013-04-03 09:52:30 +00:00
Martin Matuska	41451f4a0e	Do not check against uninitialized rc and comment out vendor code MFC after: 16 days	2013-04-02 08:15:39 +00:00
Pedro F. Giffuni	9f4c7ba460	Dtrace: enablings on defunct providers prevent providers from unregistering Merge change from illumos: 1368 enablings on defunct providers prevent providers from unregistering We try to address some underlying differences between the Solaris and FreeBSD implementations: dtrace_attach() / dtrace_detach() are currently unimplemented in FreeBSD but the new code from illumos makes use of taskq so some adaptations were made to dtrace_open() and dtrace_close() to handle them appropriately. Illumos Revision: r13430:8e6add739e38 Reference: https://www.illumos.org/issues/1368 Reviewed by: gnn Tested by: Fabian Keil Obtained from: Illumos MFC after: 3 weeks	2013-04-01 19:13:46 +00:00
Martin Matuska	20547d41f8	Call dmu_snapshot_list_next() in zvol.c with dsl_pool_config lock held Submitted by: Andriy Gapon <avg@FreeBSD.org> MFC after: 17 days	2013-04-01 16:14:57 +00:00
Pedro F. Giffuni	f5678b698a	Dtrace: dtrace.c erroneously checks for memory alignment on amd64. Merge change from illumos: 3511 dtrace.c erroneously checks for memory alignment on amd64 Illumos Revision: c93cc65 Reference: https://www.illumos.org/issues/3511 Obtained from: Illumos MFC after: 3 weeks	2013-03-26 20:17:08 +00:00
Pedro F. Giffuni	5472787377	Dtrace: Add SUN MDB-like type-aware print() action. Merge change from illumos: 1694 Add type-aware print() action This is a very nice feature implemented in upstream Dtrace. A complete description is available here: http://dtrace.org/blogs/eschrock/2011/10/26/your-mdb-fell-into-my-dtrace/ This change bumps the DT_VERS_* number to 1.9.0 in accordance to what is done in illumos. While here also include some minor cleanups to ease further merging and appease clang with a fix by Fabian Keil. Illumos Revisions: 13501:c3a7090dbc16 13483:f413e6c5d297 Reference: https://www.illumos.org/issues/1560 https://www.illumos.org/issues/1694 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-03-25 20:38:09 +00:00
Pedro F. Giffuni	730cecb05a	Dtrace: add toupper()/tolower() and enhancements to lltostr(). Merge changes from illumos: 1451 DTrace needs toupper()/tolower() subroutines 1457 lltostr() D subroutine should take an optional base This change bumps the DT_VERS_* number to 1.8.1 in accordance to what is done in illumos. The test suite we currently include is outdated and doesnt support some updates in tst.subr.d which had to be left out for now. Illumos Revisions: r13458 5e394d8db762 r13459 c3454574dd1a Reference: https://www.illumos.org/issues/1451 https://www.illumos.org/issues/1457 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-03-25 15:40:57 +00:00
Pedro F. Giffuni	f2e66d30b8	Dtrace: add optional size argument to tracemem(). Merge change from illumos: 1455 DTrace tracemem() should take an optional size argument Our local enhancements to dt_print_bytes were equivalent to those in illumos but we made it match the illumos version to ease further code merges. For now leave out tst.smallsize.d and tst.smallsize.d.out since those don't seem to work cleanly on FreeBSD. This change bumps the DT_VERS_* number to 1.7.1 in accordance to what is done in illumos. Illumos Revision: 13457:571b0355c2e3 Reference: https://www.illumos.org/issues/1455 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-03-24 19:12:08 +00:00
Will Andrews	58567a1b4e	ZFS: Fix a panic while unmounting a busy filesystem. This particular scenario was easily reproduced using a NFS export. When the first 'zfs unmount' occurred, it returned EBUSY via this path, while vflush() had flushed references on the filesystem's root vnode, which in turn caused its v_interlock to be destroyed. The next time 'zfs unmount' was called, vflush() tried to obtain this lock, which caused this panic. Since vflush() on FreeBSD is a definitive call, there is no need to check vfsp->vfs_count after it completes. Simply #ifdef sun this check. Submitted by: avg Reviewed by: avg Approved by: ken (mentor) MFC after: 1 month	2013-03-23 16:34:56 +00:00
Andriy Gapon	aaf2546b67	fbt_getargdesc: correctly handle types for return probes MFC after: 6 days	2013-03-23 08:52:50 +00:00
Andriy Gapon	a47016e9a9	fbt_typoff_init: fix an off by one in determining required memory size This issue would be silent most of the time, but if the requested memory is a multiple of a page size, then accessing one element beyond the end would lead to a kernel page fault. Otherwise, the unlucky last type would just be inaccessible. Reported by: glebius Tested by: glebius MFC after: 6 days	2013-03-23 08:48:44 +00:00
Steven Hartland	def84b9736	Fix for building libzpool under i386. Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 23:06:11 +00:00
Steven Hartland	2b114ad2a4	Add missing descriptions for ZFS sysctls Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 11:25:21 +00:00
Steven Hartland	adea827b21	Optimisation of TRIM processing. Previously TRIM processing was very bursty. This was made worse by the fact that TRIM requests on SSD's are typically much slower than reads or writes. This often resulted in stalls while large numbers of TRIM's where processed. In addition due to the way the TRIM thread was only woken by writes, deletes could stall in the queue for extensive periods of time. This patch adds a number of controls to how often the TRIM thread for each SPA processes its outstanding delete requests. vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32) vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing (seconds) Given the most common TRIM implementation is ATA TRIM the current defaults are targeted at that. Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 11:02:08 +00:00
Steven Hartland	6ad46cec23	Names the ZFS TRIM thread Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 10:41:30 +00:00
Steven Hartland	89e5b43079	TRIM cache devices based on time instead of TXGs. Currently, the trim module uses the same algorithm for data and cache devices when deciding to issue TRIM requests, based on how far in the past the TXG is. Unfortunately, this is not ideal for cache devices, because the L2ARC doesn't use the concept of TXGs at all. In fact, when using a pool for reading only, the L2ARC is written but the TXG counter doesn't increase, and so no new TRIM requests are issued to the cache device. This patch fixes the issue by using time instead of the TXG number as the criteria for trimming on cache devices. The basic delay principle stays the same, but parameters are expressed in seconds instead of TXGs. The new parameters are named trim_l2arc_limit and trim_l2arc_batch, and both default to 30 second. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `17122c31ac` MFC after: 2 weeks	2013-03-21 10:29:05 +00:00
Steven Hartland	78ad0c1c80	Improve TXG handling in the TRIM module. This patch adds some improvements to the way the trim module considers TXGs: - Free ZIOs are registered with the TXG from the ZIO itself, not the current SPA syncing TXG (which may be out of date); - L2ARC are registered with a zero TXG number, as L2ARC has no concept of TXGs; - The TXG limit for issuing TRIMs is now computed from the last synced TXG, not the currently syncing TXG. Indeed, under extremely unlikely race conditions, there is a risk we could trim blocks which have been freed in a TXG that has not finished syncing, resulting in potential data corruption in case of a crash. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `5b46ad40d9` MFC after: 2 weeks	2013-03-21 10:16:10 +00:00
Steven Hartland	e07e3a3792	Don't register repair writes in the trim map. The trim map inflight writes tree assumes non-conflicting writes, i.e. that there will never be two simultaneous write I/Os to the same range on the same vdev. This seemed like a sane assumption; however, in actual testing, it appears that repair I/Os can very well conflict with "normal" writes. I'm not quite sure if these conflicting writes are supposed to happen or not, but in the mean time, let's ignore repair writes for now. This should be safe considering that, by definition, we never repair blocks that are freed. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: Source: `6a3cebaf7c`	2013-03-21 10:02:32 +00:00
Steven Hartland	e05aad2d33	Add TRIM support for L2ARC. This adds TRIM support to cache vdevs. When ARC buffers are removed from the L2ARC in arc_hdr_destroy(), arc_release() or l2arc_evict(), the size previously occupied by the buffer gets scheduled for TRIMming. As always, actual TRIMs are only issued to the L2ARC after txg_trim_limit. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `31aae37399` MFC after: 2 weeks	2013-03-21 09:34:41 +00:00
Martin Matuska	192d547574	Release hold on pool before calling zvol_create_minor()	2013-03-20 09:56:20 +00:00
Martin Matuska	a0abc0d302	Run zvol_create_minors() only if in non-error case	2013-03-19 22:27:15 +00:00
Martin Matuska	e56718d734	Run zvol_create_minors() on snapshot creation	2013-03-19 22:14:50 +00:00
Martin Matuska	07091d8f14	MFV r247580: Merge synctask code restructuring from vendor. Modify forward and backward compatibility to support new change. Illumos ZFS issues: 3464 zfs synctask code needs restructuring Sponsored by: Hybrid Logic Ltd.	2013-03-19 12:51:18 +00:00
Martin Matuska	520268fb97	MFC @248493	2013-03-19 11:09:15 +00:00
Martin Matuska	87a5cb4650	Plug memory leak in dsl_check_snap_cb() This was unnoticed because the function is very rarely used. MFC after: 3 days	2013-03-19 07:47:51 +00:00
Martin Matuska	a602517b63	Add missing zvol_create_mirrors() on zfs_ioc_create()	2013-03-18 20:22:40 +00:00
John Baldwin	3cf3b9f097	Partially revert r195702. Deferring stops is now implemented via a set of calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls. - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly. - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.	2013-03-18 17:23:58 +00:00
Martin Matuska	876a84e867	MFC @248461	2013-03-18 09:39:51 +00:00
Martin Matuska	6f4accc2de	Move common zfs ioctl compatibility functions (userland) into libzfs_compat.c Introduce additional constants for zfs ioctl versions	2013-03-18 09:32:29 +00:00
Justin Hibbits	80a5635c8b	Add FBT for PowerPC DTrace. Also, clean up the DTrace assembly code, much of which is not necessary for PowerPC. The FBT module can likely be factored into 3 separate files: common, intel, and powerpc, rather than duplicating most of the code between the x86 and PowerPC flavors. All DTrace modules for PowerPC will be MFC'd together once Fasttrap is completed.	2013-03-18 05:30:18 +00:00
Martin Matuska	af2e40ccd1	Merge libzfs_core part of r239388 Illumos ZFS issues: 3085 zfs diff panics, then panics in a loop on booting References: https://www.illumos.org/issues/3085	2013-03-17 18:49:11 +00:00
Martin Matuska	70b0720877	Fix accidentially changed ioc variable for old v15 compatibility	2013-03-17 17:28:06 +00:00
Martin Matuska	6cf922c88b	Fix typo in sysctl description Reported by: Jeremy Chadwick MFC after: 3 days	2013-03-17 15:53:27 +00:00
Martin Matuska	e2b4467975	libzfs_core: - provide complete backwards compatibility (old utility, new kernel) - add zfs_cmd_t compatibility mapping in both directions - determine ioctl address in zfs_ioctl_compat.c	2013-03-17 10:57:04 +00:00
Martin Matuska	4f33cfb284	Initialize "error" variable where illumos does.	2013-03-16 20:28:38 +00:00
Martin Matuska	a03fbc7ecf	MFC @248093	2013-03-09 11:57:51 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Attilio Rao	c934116100	Merge from vmc-playground: Introduce a new KPI that verifies if the page cache is empty for a specified vm_object. This KPI does not make assumptions about the locking in order to be used also for building assertions at init and destroy time. It is mostly used to hide implementation details of the page cache. Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: alc (vm_radix based version) Tested by: flo, pho, jhb, davide	2013-03-09 02:05:29 +00:00
Martin Matuska	91edf37414	Comment out unfeasible illumos copyin code and restore previous behavior.	2013-03-07 23:45:16 +00:00
Martin Matuska	b3b6851c78	Add missing init functions Reduce diff to illumos	2013-03-06 11:33:25 +00:00
Xin LI	2f79ac7f21	Diff reduction with Illumos	2013-03-06 01:21:56 +00:00
Xin LI	227d24fc59	Use adx2 instead of adx in the second vsprintf, this fixes a panic.	2013-03-05 22:58:53 +00:00
Martin Matuska	400c4069a5	MFV r247845: Import ZFS bpobj bugfix from vendor. Illumos ZFS issues: 3603 panic from bpobj_enqueue_subobj() 3604 zdb should print bpobjs more verbosely References: https://www.illumos.org/issues/3603 https://www.illumos.org/issues/3604 MFC after: 1 week	2013-03-05 18:54:41 +00:00
Martin Matuska	dce1a726f2	WiP merge of libzfs_core (MFV r238590, r238592) not yet working, ioctl handling needs to be changed	2013-03-05 08:09:53 +00:00
Justin T. Gibbs	7e2a739f03	Fix assertion failure when using userland DTrace probes from the pid provider on a kernel compiled with INVARIANTS. sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c: In fasttrap_probe_pid(), attempts to write to the address space of the thread that fired the probe must be performed with the process of the thread held. Use _PHOLD() to ensure this is the case. In fasttrap_probe_pid(), use proc_write_regs() instead of calling set_regs() directly. proc_write_regs() performs invariant checks to verify the calling environment of set_regs(). PROC_LOCK()/UNLOCK() around the call to proc_write_regs() so that it's invariants are satisfied. Sponsored by: Spectra Logic Corporation Reviewed by: gnn, rpaulo MFC after: 1 week	2013-03-04 22:07:36 +00:00
Rui Paulo	e0dffa2de2	Remove the extra parenthesis from the cv_init() macro. They are not necessary because we already use parenthesis in zfs_cv_init(). This fixes a long standing bug where there would be an extra ")" at the end of the string. This extra parenthesis would show up in the WCHAN of the process (top, stty status, etc.).	2013-03-03 06:42:36 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
Xin LI	5c737b11df	MFV r247575: Import a fix tighten assertion on SPA versions from vendor (Illumos). Illumos ZFS issue: 3543 Feature flags causes assertion in spa.c to miss certain cases MFC after: 2 weeks	2013-03-01 22:20:13 +00:00
Martin Matuska	4abd59512a	MFV r247316: Merge new read-only zfs properties from vendor (illumos) Illumos ZFS issues: 3588 provide zfs properties for logical (uncompressed) space used and referenced References: https://www.illumos.org/issues/3588 MFC after: 2 weeks	2013-03-01 21:58:51 +00:00
Martin Matuska	bb508e7732	Fix the zfs_ioctl compat layer to support zfs_cmd size change introduced in r247265 (ZFS deadman thread). Both new utilities now support the old kernel and new kernel properly detects old utilities. For future backwards compatibility, the vfs.zfs.version.ioctl read-only sysctl has been introduced. With this sysctl zfs utilities will be able to detect the ioctl interface version of the currently loaded zfs module. As a side effect, the zfs utilities between r247265 and this revision don't support the old kernel module. If you are using HEAD newer or equal than r247265, install the new kernel module (or whole kernel) first. MFC after: 10 days	2013-03-01 09:42:58 +00:00
Martin Matuska	24245e76ea	MFV 247176, 247178, 247315: Import metaslab_sync() speedup from vendor (illumos). Illumos ZFS issues: 3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread 3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not condensing) 3578 transferring the freed map to the defer map should be constant time 3579 ztest trips assertion in metaslab_weight() References: https://www.illumos.org/issues/3552 https://www.illumos.org/issues/3564 https://www.illumos.org/issues/3578 https://www.illumos.org/issues/3579 MFC after: 2 weeks	2013-02-27 14:45:23 +00:00
Martin Matuska	e4428d63a8	Be more verbose on ZFS deadman I/O panic Patch suggested upstream. Suggested by: Olivier Cinquin MFC after: 12 days	2013-02-26 20:41:27 +00:00
Martin Matuska	e70664bafc	MFV v242732: Merge the ZFS I/O deadman thread from vendor (illumos). This feature panics the system on hanging ZFS I/O, helps debugging and resumes failed service. The panic behavior can be controlled with the loader-only tunables: vfs.zfs.deadman_enabled (enable or disable panic on stalled ZFS I/O) vfs.zfs.deadman_synctime (expiration time for stalled ZFS I/O) By default, ZFS I/O deadman is enabled by default on amd64 and i386 excluding virtual guest machines. Illumos ZFS issues: 3246 ZFS I/O deadman thread References: https://www.illumos.org/issues/3246 MFC after: 2 weeks	2013-02-25 12:33:31 +00:00
Martin Matuska	781c0f87d3	MFV r246653: Import vendor change to avoid "unitialized variable" warnings. Illumos ZFS issues: 3522 zfs module should not allow uninitialized variables References: https://www.illumos.org/issues/3522	2013-02-23 11:21:05 +00:00
Justin T. Gibbs	6a8f90edf5	Avoid panic when tearing down the DTrace pid provider for a process that has crashed. sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c: In fasttrap_pid_disable(), we cannot PHOLD the proc structure for a process that no longer exists, but we still have other, fasttrap specific, state that must be cleaned up for probes that existed in the dead process. Instead of returning early if the process related to our probes isn't found, conditionalize the locking and carry on with a NULL proc pointer. The rest of the fasttrap code already understands that a NULL proc is possible and does the right things in this case. Sponsored by: Spectra Logic Corporation Reviewed by: rpaulo, gnn MFC after: 1 week	2013-02-20 17:55:17 +00:00
Xin LI	e469d5a70f	Eliminate real_LZ4_uncompress. It's unused and does not perform sufficient check against input stream (i.e. it could read beyond specified input buffer).	2013-02-14 21:02:18 +00:00
Martin Matuska	7c695febc9	Change vfs.zfs.write_to_degraded from CTLFLAG_RW to CTLFLAG_RWTUN Suggested by: pjd	2013-02-13 23:11:25 +00:00
Xin LI	314caba11c	Restore De Bruijn algorithm for sparc64 where the compiler rely on a library function for __builtin_c?z. Tested by: Michael Moll <kvedulv kvedulv de>	2013-02-13 17:30:54 +00:00
Martin Matuska	6a33bbc041	Merge zfs_ioctl.c code that should have been merged together with ZFS v28. Fixes several problems if working with read-only pools. Changed code originaly introduced in onnv-gate 13061:bda0decf867b Contains changes up to illumos-gate 13700:4bc0783f6064 PR: kern/175897 Suggested by: avg MFC after: 2 weeks	2013-02-11 21:10:55 +00:00
Martin Matuska	9689178c3f	MFV r246633: Import vendor bugfixes regarding SA rounding, header size and layout. This was already partially fixed by avg. Illumos ZFS issues: 3512 rounding discrepancy in sa_find_sizes() 3513 mismatch between SA header size and layout References: https://www.illumos.org/issues/3512 https://www.illumos.org/issues/3513 MFC after: 2 weeks	2013-02-11 14:29:38 +00:00
Martin Matuska	bb03847418	MFV r246394: Add tunable to allow block allocation on degraded vdevs. Illumos ZFS issues: 3507 Tunable to allow block allocation even on degraded vdevs References: https://www.illumos.org/issues/3507 MFC after: 2 weeks	2013-02-11 13:59:57 +00:00
Martin Matuska	ff20578569	MFV r246392: Import vendor ZFS bugfix fixing a possible deadlock in arc_read(). Illumos ZFS issues: 3498 panic in arc_read(): !refcount_is_zero(&pbuf->b_hdr->b_refcnt) References: https://www.illumos.org/issues/3498 MFC after: 2 weeks	2013-02-11 12:42:11 +00:00
Martin Matuska	8a2dc7faae	MFV r246390: Import minor type change in refcount.h header from vendor (illumos). MFC after: 2 weeks	2013-02-11 07:48:57 +00:00
Martin Matuska	fd9778c236	MFV r246388: Import vendor bugfixes Illumos ZFS issues: 3422 zpool create/syseventd race yield non-importable pool 3425 first write to a new zvol can fail with EFBIG References: https://www.illumos.org/issues/3422 https://www.illumos.org/issues/3425 MFC after: 2 weeks	2013-02-10 19:32:55 +00:00
Xin LI	ef17620fc8	MFV r245512: * Illumos zfs issue #3035 [1] LZ4 compression support in ZFS. LZ4 is a new high-speed BSD-licensed compression algorithm created by Yann Collet that delivers very high compression and decompression performance compared to lzjb (>50% faster on compression, >80% faster on decompression and around 3x faster on compression of incompressible data), while giving better compression ratio [1]. This version of LZ4 corresponds to upstream's [2] revision 85. Please note that for obvious reasons this is not backward read compatible. This means once a pool have LZ4 compressed data, these data can no longer be read by older ZFS implementations. Local changes: - On-stack hash table disabled and using kernel slab allocator instead, at this time. This requires larger kernel thread stack for zio workers. This may change in the future should we adjusted the zio workers' thread stack size. - likely and unlikely will be undefined if they are already defined, this is required for i386 XEN build. - Removed De Bruijn sequence based __builtin_ctz family of builtins in favor of the latter. Both GCC and clang supports these builtins. - Changed the way the LZ4 code detects endianness. - Manual pages modifications to mention the feature based on Illumos counterpart. - Boot loader changes to make it support LZ4 decompression. [1] https://www.illumos.org/issues/3035 [2] http://code.google.com/p/lz4/source/list Obtained from: Illumos (13921:9d721847e469) Tested on: FreeBSD/amd64 MFC after: 1 month	2013-02-09 06:39:28 +00:00
Sergey Kandaurov	3c56b4f165	Fix warning: comparison of unsigned expression < 0 is always false. Reported by: clang	2013-02-08 09:54:53 +00:00
Andriy Gapon	0dcab786b8	zfs_vget, zfs_fhtovp: properly handle the z_shares_dir object A special gfs vnode corresponds to that object. A regular zfs vnode must not be returned. This should be upstreamed. Reported by: pluknet Submitted by: rmacklem Tested by: pluknet MFC after: 10 days	2013-02-08 07:49:54 +00:00
Andriy Gapon	e2bb19dce5	zfs: update comments about zfid_long_t to match the FreeBSD definitions MFC after: 1 week	2013-02-08 07:44:15 +00:00
Andriy Gapon	c7d346f269	zfs: fix, improve and re-organize page_lookup and page_unlock Now they are split into two pairs: page_hold/page_unhold for mappedread and page_busy/page_unbusy for update_pages. For mappedread we simply hold a page that is to be used as a source if it is resident and valid (and not busy). This is sufficient since we are only doing page -> user buffer copying. There is no page <-> backing storage I/O involved. update_pages is now better split to properly handle the putpages case (page -> arc) and the regular write case (arc -> page). For the latter we use complete protocol of marking an object with paging-in-progress and marking a page with io_start (busy count). Also, in this case we remove the write bit from all page mappings and clear dirty bits of the pages, the former is needed to ensure that the latter does the right thing. Additionally we update a page if it is cached instead of just freeing it as was done before. This needs to be verified. A minor detail: ZFS-backed pages should always be either fully valid or fully invalid. Assert this and use simpler API that does not deal with sub-page blocks. Reviewed by: kib MFC after: 26 days	2013-02-03 18:42:20 +00:00
Justin Hibbits	7e7a9efdb5	Fix the PowerPC DTrace copy functions. The kernel doesn't hold the same view to the user map, so use the md copy in/out functions provided by the kernel. MFC with: r242723	2013-02-03 00:19:34 +00:00
Andriy Gapon	5583e07188	solaris compat: remove KM_ZERO - there is no such flag in Solaris and derivatives - the flag was added in an unrelated change - the flag is not used The proper way to allocate zeroed out memory is to use kmem_zalloc. MFC after: 3 days	2013-02-02 11:41:05 +00:00
Andriy Gapon	13235aaa89	zfs: add MODULE_VERSION for zfsctrl This should allow the kernel linker to easily detect a situation when the module is present both in a kernel and in a preloaded file (zfs.ko). Reviewed by: jhb MFC after: 5 days	2013-02-02 11:35:18 +00:00
Andriy Gapon	ea84c62f93	spa_generate_rootconf: add support for old vdev labels It seems that old ZFS versions (v15) completely omit "vdev_children" property when there is a single child. Reported by: jase Tested by: jase MFC after: 1 week	2013-01-26 10:34:17 +00:00
Xin LI	5c74885e99	MFV r245510: improve the comment in txg.c Obtained from: Illumos (13910:f3454e0a097c) MFC after: 2 weeks	2013-01-16 22:59:50 +00:00
Konstantin Belousov	614b9f9130	For zfs vnodes, use the standard inode number based hash algorithm. Reviewed and tested by: peter Sponsored by: The FreeBSD Foundation MFC after: 5 days	2013-01-14 05:45:33 +00:00
Xin LI	290a1ba9a4	The current ZFS code expects ddt_zap_count to always succeed by asserting the underlying zap_count() to return no errors. However, it is possible that the pool reaches to such a state where zap_count would return error, leading to panics when a pool is imported. This commit changes the ddt_zap_count to return error returned from zap_count and handle the error appropriately. With this change, it's now possible to let zpool rollback damaged transaction groups and import the pool. Obtained from: ZFS on Linux github (`e8fd45a0f9`) MFC after: 1 month	2013-01-10 19:26:56 +00:00
Andriy Gapon	f71fbb1d12	zfs: solaris doesn't have KM_ZERO, kmem_zalloc should be used instead To do: remove KM_ZERO declaration Pointyhat to: avg (for mindlessly using the pseudo-flag) MFC after: instantly (to fix stable/8 build)	2012-12-23 19:58:41 +00:00
Ryan Stone	6969ef6656	Correct a series of errors in the hand-rolled locking for drace_debug.c: - Use spinlock_enter()/spinlock_exit() to prevent a thread holding a debug lock from being preempted to prevent other threads waiting on that lock from starvation. - Handle the possibility of CPU migration in between the fetch of curcpu and the call to spinlock_enter() by saving curcpu in a local variable. - Use memory barriers to prevent reordering of loads and stores of the data protected by the lock outside of the critical section - Eliminate false sharing of the locks by moving them into the structures that they protect and aligning them to a cacheline boundary. - Record the owning thread in the lock to make debugging future problems easier. Reviewed by: rpaulo (initial version) MFC after: 2 weeks	2012-12-23 15:50:37 +00:00

1 2 3 4 5 ...

861 Commits